Unit III - Designing and Developing Assessments: Let's Read These

UNIT III – DESIGNING AND DEVELOPING ASSESSMENTS
Quality assessment takes center stage in the learning process. In fact, it is a vital
component of the instructional process. A teacher’s evaluation and judgment of student
performance are based on information obtained from assessment instruments, so the quality of
those instruments is of paramount importance. Every teacher should have the necessary skill to
develop quality test items. It is a teacher’s mandate to provide learners with the optimum
evaluation strategy.
Teachers who create effective tests, develop remedial instruction and allow students
several attempts to succeed can improve their teaching methods and facilitate student
learning. When the instructional process makes effective classroom assessment a central feature
of student learning, both students and teachers benefit.
Let’s read these:

Selecting proper assessment techniques is, among other things, an intimidating endeavor for a beginning
teacher. The method and quality of the assessment instrument are essential, since the evaluation and
judgment that you will render on your students are based on the data you obtain using these instruments.
These qualities matter in assessment methods because the results are taken to indicate the extent of your
students’ learning. If the expected qualities are lacking, evaluation and assessment will be perceived as
doubtful, and instructional goals and objectives will be ambiguous.
The success of the teaching-learning process depends on the accountability that classroom assessment
provides. The results are two-pronged: first, how well the learners studied to hurdle the subject or
course, and second, how effectively the teachers conducted instruction. Teachers resort to summative
tests to objectively measure student performance, a method acceptable to the academe and other
concerned parties. To create effective tests, read the following information, which will guide you in
developing classroom-based tests that validly measure how well a student fared academically.
Features of a Properly Accomplished Test:

Teachers usually receive complaints or comments from students regarding assessments, for
example, about test coverage. Some tests may include content that was not covered in class, or students
may not have wholly studied the subject matter. In such cases, the assessment tools appear complicated
and are not aligned with the action verbs in the learning outcomes.
In order to ensure high quality assessment, validity should be in tandem with reliability. Doubts are
cast on reliability if inconsistencies result as the tests are being administered over varying time periods,
sample questions or sample groups.
The tandem is highlighted when collating information or evidence about student achievement.
Santos (2007), De Guzman (2015) and Balagtas (2020) expound the main characteristics of a good test.
1. Validity
The word "valid" is derived from the Latin validus, meaning strong. In the context of assessment, a test is
deemed valid if it measures what it is supposed to measure. Your assessment is valid if
it measures your learner’s actual knowledge and performance. For instance, a test of reading
comprehension should not require mathematical ability.
Ways to Establish Validity and Their Descriptions
Face Validity - Validity that relies on the physical attributes of a test: the test is presented well,
administered well, and free of errors.
Content Validity - The extent to which an evaluation procedure adequately represents the content of the
assessment domain being tested.
Predictive Validity - A measure should predict a future criterion. An example is an entrance exam
predicting the grades of the students after the first semester.
Construct Validity - The components or factors of the test should contain items that are correlated.
Concurrent Validity - Two or more measures of the same characteristic are present for each examinee.
Convergent Validity - Components or factors of a test are hypothesized to have a positive correlation.
Divergent Validity - Components or factors of a test are hypothesized to have a negative correlation.
Factors which affect the validity of test scores (Ramadan, 2018):

A. Factors in the test:
1. Vague test directions
2. Difficulty of the reading vocabulary and flaw in sentence structure
3. Too easy or too difficult test items
4. Ambiguous statements
5. Inappropriate test items for measuring a particular outcome.
6. Insufficient time provided to take the test
7. The test is too short
8. Test items are not scaled in the level of difficulty from “easy” to “difficult”
B. Factors in test administration and scoring:

1. Unfair responses to students who individually ask for clarification
2. Cheating during testing
3. Unreliable subjective scoring of essay type answers
4. Insufficient time to finish the examination
5. Learner(s) experience malaise before and/or during the test
C. Factors related to students:

1. Test anxiety of the students
2. Physical and psychological state of the student
2. Reliability - The reliability of an assessment method refers to its consistency. It is also a term
synonymous with dependability or stability. It is the extent to which an assessment tool produces a stable
and consistent result.
Types of Reliability: What is it? How do you establish it?
Test-Retest
What it is: A measure to determine the stability of test results.
How to establish it: You have a test, and you need to administer it one time to a group of examinees.
Administer it again at another time to the same group of examinees.
Parallel Forms
What it is: A measure of equivalence or comparative analysis.
How to establish it: There are two versions of a test. The items need to measure exactly the same skill.
Administer one form at one time and the other form at another time to the “same” group of participants.
Inter-Rater
What it is: A measure of agreement.
How to establish it: This procedure is used to determine the consistency of multiple raters when using
rating scales and rubrics to judge performance samples, essays, portfolios, etc. The reliability here refers
to the similar or consistent ratings provided by more than one rater when they use an assessment
instrument.
Internal Consistency
What it is: A measure of how consistently each item measures the same underlying construct.
How to establish it: You correlate the performance on each item with the overall performance across
participants.
Ways to improve reliability of assessment results
a. Use a sufficient number of test items; longer tests are more credible
b. Create tests with the correct level of difficulty
c. Employ impartial raters or observers who can give the same or nearly the same scores on
performance
d. Be sure that there is ample time to finish the assessment task
e. Focus on the careful formulation of test questions
f. Conduct regular item analysis to improve ambiguous or poorly answered questions.
3. Practicality and Efficiency - A practical test is one that is developed and administered within the
available time and with the available resources. Moreover, a test should be easy to design, to
administer, to mark and to interpret as to results. Efficiency, in this context, refers to the development,
administration and grading of an assessment with the least effort and resources.
4. Fairness. The fairness of a test refers to freedom from any biases. Your students must know exactly what
the learning targets are and what method of assessment will be used. They have to be informed how their
progress will be evaluated in order to make strategies and perform optimally.
Other aspects of fairness include:
1. Opportunity to learn further;
2. Pre-requisite knowledge and skills;
3. Avoidance of student stereotyping
4. Avoidance of bias in assessment procedures; and
5. Accommodating special needs and requirements
Learning Target and Assessment Method Match

De Guzman et al. (2015) defined a learning target as a description of performance that includes what
learners should know and be able to do. This definition is similar to that of a learning outcome. In other
words, learning targets provide students with a roadmap that points to their destination and tells them
what to expect upon reaching it.
Learning targets should comply with the standards prescribed by a program or level and should
align with the instructional or learning objectives of a subject or course (Balagtas et al., 2020). Simply put,
teachers must be cognizant of the learning targets of the lesson prior to classroom instruction. Without
alignment between learning targets and learning activities/assessments, students will spend time on
activities, assignments and assessments that stray from the intended goals. If the objective is to “defend” an
idea but the assessment used is a multiple-choice quiz, students have no opportunity to demonstrate the
skill of defending it.
What is taught in classroom instruction and subsequently assessed should be aligned with the
learning targets of a lesson. When assessment is aligned with instruction, both students and teachers
benefit. There is a better chance for learners to learn more because instruction is focused and assessed
appropriately. Teachers are also able to focus, making the best use of their time. Because assessment
involves real learning, they can integrate it into their daily classroom activities.
Types of Learning Targets

Chappuis, Stiggins, Chappuis, & Arter (2012) classify learning targets into five categories:
knowledge, reasoning, skill, product and disposition.
Types of Learning Targets, Descriptions and Examples
Knowledge targets
Description: Refer to factual information, procedural knowledge and conceptual understanding that
strengthen each discipline.
Example: You can identify and describe the elements of design in a work of art.
Reasoning targets
Description: Specify the thought processes students are to learn to do well within a range of subjects.
Example: You can evaluate the quality of your work in order to improve it.
Skill targets
Description: Use of a variety of knowledge and/or reasoning to perform or demonstrate physical skills.
Example: You can use an air brush to create different effects.
Product targets
Description: Use of knowledge, reasoning and skills in creating a fixed or tangible product.
Example: You can create a still life oil painting.
Disposition targets
Description: Refer to attitudes, motivations, and interests that affect students’ approaches to learning.
Example: You question the validity of various positions including your own.
Appropriate Methods of Assessment

Once learning targets are clearly set, you can easily determine the appropriate assessment method.
McMillan (2007) as cited by De Guzman et al (2015) prepared a scorecard as a guide on how well a
particular assessment method measures each level of learning. The table below shows the relative strength
of each assessment method in measuring different learning targets.
Table 1. Learning Targets and Assessment Methods (McMillan 2007)
Assessment methods: SR = Selected-response and brief constructed-response, ES = Essay, PT = Performance
tasks, OQ = Oral questioning, OB = Observation, SA = Student self-assessment
Learning Targets                      SR  ES  PT  OQ  OB  SA
Knowledge and Simple Understanding    5   4   3   4   3   3
Deep Understanding and Reasoning      2   5   4   4   2   3
Skills                                1   3   5   2   5   3
Products                              1   1   5   2   4   4
Affects                               1   2   4   4   4   5
Note: Higher numbers indicate better matches (e.g. 5 = excellent, 1 = poor)
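As a quick illustration of how the scorecard can be read, the sketch below stores the ratings from Table 1 and looks up the strongest method or methods for a given learning target. The names METHODS, SCORES and best_methods are hypothetical helpers introduced only for this example; the ratings themselves are the ones in the table.

```python
# Assessment methods in the column order of Table 1.
METHODS = [
    "Selected-response and brief constructed-response",
    "Essay",
    "Performance tasks",
    "Oral questioning",
    "Observation",
    "Student self-assessment",
]

# Ratings from Table 1 (5 = excellent match, 1 = poor match).
SCORES = {
    "Knowledge and Simple Understanding": [5, 4, 3, 4, 3, 3],
    "Deep Understanding and Reasoning": [2, 5, 4, 4, 2, 3],
    "Skills": [1, 3, 5, 2, 5, 3],
    "Products": [1, 1, 5, 2, 4, 4],
    "Affects": [1, 2, 4, 4, 4, 5],
}


def best_methods(target):
    """Return every method that has the highest rating for the given target."""
    row = SCORES[target]
    top = max(row)
    return [method for method, score in zip(METHODS, row) if score == top]


for target in SCORES:
    print(f"{target}: {', '.join(best_methods(target))}")
```

A highest rating is only a starting point. A teacher may still combine methods, for example pairing observation with performance tasks when assessing skills, since both are rated 5 for that target.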
Other support Materials Available:

For this lesson, the following materials are available from your course professor.
1. A PowerPoint presentation on the topic Learning Target and Assessment Method
2. https://study.com/academy/lesson/matching-assessment-items-to-learning-objectives.html
Preparing a Table of Specifications

A Table of Specifications (TOS) is a test map that guides the teacher in constructing a test. It is a table
that maps out the test objectives; the contents or topics covered by the test; the levels of cognitive
behavior to be measured; the distribution, number, placement and weights of test items; and the test
format. It helps ensure that the course’s intended outcomes, assessments and instruction are aligned.
Importance of TOS:
1. Ensures that the instructional objectives and what the test captures match
2. Ensures that the test developer will not overlook details that are considered essential to a good test
3. Makes developing a test easier and more efficient
4. Ensures that the test will sample all important content areas and processes
5. Is useful in planning and organizing
6. Offers an opportunity for teachers and students to clarify achievement expectations
Steps in Developing a Table of Specifications

1. Determine the objectives of the test. There are 3 types of objectives: cognitive, affective and
psychomotor. When planning for assessment, choose only the objectives that can be best captured
by a written test. There are objectives that are not meant for a written test. For example, if you test
the psychomotor domain, it is better to do a performance-based assessment. Those that require
demonstration or creation of something tangible like projects would also be more appropriately
measured by performance-based assessment. For a written test, you can consider cognitive
objectives that could be measured using common formats for testing.
2. Determine the coverage of the test. Only topics or contents that have been discussed in class should
be included in the test.
3. Calculate the weight for each topic. The weight assigned per topic in the test is based on the time
spent to cover each topic during instruction. The percentage of time for a topic in a test is
determined by dividing the time spent for that topic during instruction by the total amount of time
spent for all topics covered in the test.
4. Determine the number of items for the whole test. As a general rule, students are given 30-60
seconds for each item in test formats with choices. For a one-hour class, this means that the test
should not exceed 60 items. However, since you also need to allow time for distributing test papers
and giving instructions, the number of items should be fewer, perhaps just 50 items.
5. Determine the number of items per topic. To determine the number of items to be included in the
test, the weights per topic are considered.
Simply remember this equation:

Number of items = (no. of hours spent in teaching the topic / total amount of time spent for all topics)
x total number of items of the test
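The equation can be tried out with a short Python sketch. The hours and the 80-item total below are taken from Sample 1 of the one-way TOS later in this section. Because the equation rarely gives whole numbers, some rule is needed so that the counts still add up to the total. The rule used here (round every topic down, then give the leftover items to the topics with the largest fractional parts) is an assumption of this sketch and not a rule stated in the module, although it happens to reproduce the counts shown in Sample 1. The function names are hypothetical.

```python
def allocate_items(hours_per_topic, total_items):
    """Distribute total_items across topics in proportion to teaching hours."""
    total_hours = sum(hours_per_topic.values())
    counts, remainders = {}, {}
    for topic, hours in hours_per_topic.items():
        # Whole-number part and leftover of (hours / total_hours) * total_items.
        counts[topic], remainders[topic] = divmod(hours * total_items, total_hours)
    # Assumed rounding rule: leftover items go to the largest remainders.
    leftover = total_items - sum(counts.values())
    for topic in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[topic] += 1
    return counts


def placements(counts):
    """Assign consecutive item numbers to each topic, in the order given."""
    start, ranges = 1, {}
    for topic, n in counts.items():
        ranges[topic] = (start, start + n - 1)
        start += n
    return ranges


hours = {
    "Selection and Organization of Content": 6,
    "Selection and Use of Teaching Strategies": 3,
    "Different Approaches and Methods": 12,
    "Selection and Use of Instructional Materials": 3,
    "Classroom Management": 3,
}

counts = allocate_items(hours, total_items=80)
ranges = placements(counts)
total_hours = sum(hours.values())
for topic, n in counts.items():
    weight = round(100 * hours[topic] / total_hours, 1)
    first, last = ranges[topic]
    print(f"{topic}: {weight}% of class time, {n} items, placement {first}-{last}")
```

Simple rounding of each topic separately would give 36 items to the 12-hour topic and 81 items overall, which is why an explicit rule for the leftover items is useful.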
Formats of TOS
1. One-way TOS. A one-way TOS maps out the content or topic, test objectives, number of hours spent,
format, and number and placement of items. A one-way TOS cannot ensure that all levels of cognitive
behaviors that should have been developed by the course are covered in the test.
Sample 1. One-Way Table of Specifications
Topics                                        Time Spent on     Percent of Class  Number    Test
                                              Topic (in hours)  Time on Topic     of Items  Placement
Selection and Organization of Content         6                 22.2%             18        1-18
Selection and Use of Teaching Strategies      3                 11.1%             9         19-27
Different Approaches and Methods              12                44.4%             35        28-62
Selection and Use of Instructional Materials  3                 11.1%             9         63-71
Classroom Management                          3                 11.1%             9         72-80
Total                                         27                99.9%             80
2. Two-Way TOS. A two-way TOS reflects not only the content, time spent, and the number of items but
also the levels of cognitive behavior targeted per test content. One advantage of this format is that it allows
one to see the levels of cognitive skills and dimensions of knowledge that are emphasized by the test.
Sample 2. Two-Way Table of Specifications
Topic                                   No. of Hours  %     No. of Items  Item Specification (R, U, Ap, An, E, C)
21st Century Assessment                 2             20%   4             1-3, 4-5, 6-7
Types of Assessment                     3             30%   6             8-10, 11-13
Nature of Performance-Based Assessment  5             50%   10            14-15, 16-18, 19-20
Total                                   10            100%  20
Assessment Tools Development

The previous lesson familiarized you with the initial process in developing classroom tests. Let
us now discuss the different formats through which such assessment can be carried out. Years of
experience in school have introduced you to various types of formal and informal tests. To enhance your
skills in drafting effective test items for a particular test format, you must be familiar with these
common test formats. Let us see how much you remember them.
Categories and Formats of Traditional Tests
As cited by Balagtas, M. (2015), traditional tests fall into two general classifications:
1. Selected-response type - it requires learners to choose the correct answer from several choices.
Selected-response tests include:
a. Multiple Choice Test - it is the most commonly used format in formal testing and typically consists
of a stem (problem), one correct or best alternative (correct answer), and three or more incorrect
alternatives (distractors)
b. True-False or Alternate Response Test - it generally consists of a statement and deciding if the
statement is true (accurate/correct) or false (inaccurate/incorrect)
c. Matching-Type Test - it consists of two sets of items to be matched with each other based on a
specified attribute.
2. Constructed-response type - it requires learners to supply answers to a given question or problem.

a. Short Answer Test - it consists of open-ended questions or incomplete sentences that require
learners to write the correct answer which may consist of a single word or a short phrase. This
includes the following sub-types:
a.1 Completion - it consists of incomplete statements that require the learners to fill in the
blanks with the correct word or phrase.
a.2 Identification - it consists of statements that summon the learners to identify or recall the
terms/concepts, people, places or events being described.
a.3 Enumeration - It directs the learners to list down all possible answers to the question.
b. Essay Test - it consists of problems/questions that require learners to compose or construct
written responses, usually long ones with several paragraphs.
c. Problem-Solving Test - It consists of problems/questions that require learners to solve problems in
quantitative or qualitative settings using knowledge and skills in mathematical concepts and
procedures, and/or other higher-order cognitive skills.
FIGURE 1. Types of Objective tests
General Guidelines in Choosing Appropriate Test Format

How can you design fair, yet challenging tests that accurately gauge student learning? To guide you
on choosing the appropriate test format, you should ask the following questions:
1. What are the objectives or desired learning outcomes of the lesson/unit?
2. What level of thinking is to be assessed (i.e., remembering, understanding, applying, analyzing,
evaluating or creating)?
3. Is the test matched or aligned with the course’s desired learning outcomes and the course contents
or learning activities?
4. Are the tests realistic to the students?
Test Item Formulation
True-False Test
True or false items are used to measure learners’ ability to identify whether a statement or
proposition is correct/true or incorrect/false. A learner who knows nothing of the content of the test would
have a 50% chance of getting the correct answer by sheer guesswork. A modified true-false test can offset
the effect of guessing by requiring learners to explain their answers and by disregarding a correct answer
if the explanation is incorrect.
Here are some rules of thumb in constructing true-false items as cited by Balagtas, et al (2020),
Santos, et al (2007) and de Guzman et al (2015):
1. Include items that are completely true or completely false.
Faulty: The presidential system of government, where the president is only the head of state or
government, is adopted by the United States, Chile, Panama and South Korea.
Good: The presidential system, where the president is only the head of state or government, is adopted
by Chile.
2. Avoid lifting statements from the textbook and other learning materials.
3. Use a single idea in each test item.
Faulty: The true-false item, which is favored by learners, is often referred to as an alternative-response
item.
Improved: The true-false item is also called an alternative-response item.
4. Refrain from using negatives, especially double negatives.
Faulty: There is nothing illegal about buying goods through the internet.
Good: It is legal to buy things or goods through the internet.
5. Avoid using absolutes such as “always” and “never.”
Faulty: The news and information posted on the CNN website is always accurate.
Good: The news and information posted on the CNN website is usually accurate.
6. Avoid the use of unfamiliar words or vocabulary.
Multiple Choice Items

The multiple choice test is the most versatile type of test since it can take several forms such as
completion, question and direct form. Writing multiple choice items requires content mastery, writing
skills, and time. Only good and
effective items should be included in the test. Poorly-written test-items could be confusing and frustrating
to learners. Each item in a multiple choice test consists of 2 parts: a) the stem, and b) the options. In the set
of options or alternatives, there is a “correct” or “best” option while the others are considered “distracters”.
The following are the general guidelines in writing good multiple choice items.
1. Do not lift and use statements from the textbooks or other learning materials.
2. Keep the vocabulary simple and understandable based on the level of learners/examinees.
3. A direct question is preferred over an incomplete sentence.
Faulty : Cement is ordinarily produced by _____________.
Good : How is cement ordinarily produced?
4. Word the stem positively and avoid negatives, such as NOT and EXCEPT, and double negatives in a stem.
If a negative word is necessary, underline or capitalize the word for emphasis.
Faulty: Which of the following is not a measure of variability?
Good: Which of the following is NOT a measure of variability?
5. Write the stem as a clearly described question or task.
Faulty: Validity refers to
a. the consistency of test scores
b. the inference made on the basis of the test scores
c. measurement error as determined by standard deviation
d. the stability of test scores
Good: The inference made on the basis of the test scores refers to
a. Reliability
b. Validity
c. Stability
d. Measurement error
6. Avoid the use of unnecessary words or phrases which are not relevant to the problem at hand.
Faulty: While ironing his formal polo shirt Darwin burned his hand accidentally on the hot iron. This
was due to a transfer of heat because….
Good: Which of the following ways of heat transfer explains why Darwin’s hand was burned after he
touched a hot iron?
7. Write the distracters to be plausible yet clearly wrong.
Faulty : Which of the following is the largest city in the United States?
a. Michigan
b. London
c. New York
d. Berlin
Good : Which of the following is the largest city in the United States?
a. Los Angeles
b. Chicago
c. New York
d. Miami
8. Write options that are parallel or similar in form and length to avoid giving clues about the correct
answer.
9. Place options in logical order (e.g. alphabetical, shortest to longest)
10. Place correct response randomly to avoid a discernible pattern of correct answers.
11. Use “none of the above” carefully and only when there is one absolutely correct answer.
12. Avoid all of the above option, especially if it is intended to be the correct answer.
Short Answer Test Items

A short answer test item requires the learner to answer a question or to finish an incomplete
statement by filling in the blank with the correct word or phrase.
The following are some guidelines in writing good fill in the blank or completion test items.
1. Omit just the appropriate word from the sentence
Faulty: Every atom has a central _____________called a nucleus.
Good: Every atom has a central core called a(n)______________.
2. Avoid having too many blanks in a statement.
Faulty: The __________ is the answer in _____.
Better: The product is the answer in _________.
3. Be sure that there is only one correct response.
Faulty: A four-sided polygon is called _______________.
Good: A quadrilateral with four equal sides is called ______________.
4. Avoid grammatical clues to the response.
Faulty: A group of islands surrounded by waters is called an _______________
Good: A group of islands surrounded by waters is called a(n) _______________
5. Put the blank at the end of a statement rather than at the beginning.
Faulty: __________________is a support system that helps a learner accomplish tasks
Good: A support system that helps a learner accomplish tasks is called __________.
Matching Type Items
The matching test item format requires learners to match a word, sentence or phrase in one column
to a corresponding word, sentence or phrase in a second column. It is most appropriate when you need to
measure the learner’s ability to identify the relationship or association between similar items. However, it
is not suited for gauging the learners’ higher understanding (analysis and synthesis levels). It can only be
used to assess homogeneous knowledge.
The following are some guidelines in writing good and effective matching type tests:
1. Include homogeneous premises and responses in a single matching exercise.
2. Clearly indicate in the directions the basis for matching, where answers should be written, and whether
responses or answer choices can be used more than once.
3. Keep the list relatively short. The ideal number of items is 5 to 10, and a maximum of 15.
4. Arrange premises and responses with maximum clarity. It is desirable to use the longer statements as
premises, numbered and placed at the left of the page. The shorter responses are placed at the right, each
identified with a letter.
5. Have more responses or answer choices than premises. This will reduce guessing and using the process
of elimination in choosing the correct answer.
6. Place all the premises and responses on a single page
Faulty:
Directions: Match the following.
Food A. Primary reinforcer
Psychoanalysis B. Sigmund Freud
B.F. Skinner C. Operant conditioning
Standard deviation D. Measure of variability
Schizophrenia E. Hallucinations
Good:
Directions: Match the theories in Column I with their advocates in Column II. Write the letter of the correct
answer.
Column I Column II
___ 1. Psychodynamic Theory A. Albert Bandura
___ 2. Trait Theory B. B.F. Skinner
___ 3. Behaviorism C. Carl Rogers
___ 4. Humanism D. Gordon Allport
___ 5. Social Learning Theory E. Karen Horney
F. Sigmund Freud
Essay Test
An essay test is the preferred method of evaluation when teachers want to measure learners’ higher-order
thinking skills, particularly their ability to reason, interpret, analyze, synthesize, and evaluate.
Types of Essay Items

Extended response type - requires much longer and complex responses
Restricted response type - the learners are free to organize and expound on their ideas.
Santos, et al (2007) and Balagtas et al (2019) present the following rules of thumb in constructing good
essay questions:
1. Clearly define the intended learning outcomes to be assessed by the essay test.
2. Refrain from using essay test for intended learning outcomes that are better assessed by other kind
of assessment.
3. Phrase the direction in such a way that students are guided on the key concepts to be included.
Example: Write an essay on the topic: “Plant Photosynthesis” using the following key words and
phrases: chlorophyll, sunlight, water, carbon dioxide, oxygen, by-product, stomata.
4. Note that the learners are properly guided in terms of the keywords that the teacher is looking for
in this essay test.
5. Inform the students on the rubrics to be used for grading their essays. This rule allows the learners
to focus on relevant and substantive materials rather than on peripheral and unnecessary facts and
bits of information.
6. Present tasks that are fair, reasonable and realistic to students
7. Be specific in the prompts about the time allotment.
Item Analysis
After drafting objective test items and administering them, how do you determine if the test items are
properly constructed as to degree of difficulty? How do you set apart students who perform well on the
overall test from those who do not? An item analysis is a valuable procedure that can provide
teachers with answers to both questions.
Here are the basic concepts of item analysis:
Item analysis is a technique which evaluates the effectiveness of items in tests. It helps to improve
the test by revising or discarding ineffective items.
An item analysis provides three kinds of important information about the quality of test items.
Item difficulty: A measure of whether an item was too easy or too hard.
Item discrimination: A measure of whether an item discriminated between students who knew the material
well and students who did not.
Effectiveness of alternatives: Determination of whether distractors (incorrect but plausible answers)
tend to be chosen by the less able students and not by the more able students.
How to Determine if an Item Is Easy or Difficult

An item is difficult if the majority of learners are unable to provide the correct answer. The item is easy
if the majority of the learners are able to answer correctly. An item can discriminate if the examinees who
score high in the test answer it correctly more often than examinees who got low scores (Balagtas, et al.,
2015).
Below is a data set of five items on the addition and subtraction of integers. Follow the procedure to
determine the difficulty and discrimination index of each item.
1. Get the scores of each learner and arrange scores from highest to lowest.
            Item 1  Item 2  Item 3  Item 4  Item 5
Student 1   0       0       1       1       1
Student 2   1       1       1       0       1
Student 3   0       0       0       1       1
Student 4   0       0       0       0       1
Student 5   0       1       1       1       1
Student 6   1       0       1       1       0
Student 7   0       0       1       1       0
Student 8   0       1       1       0       0
Student 9   1       0       1       1       1
Student 10  1       0       1       1       0
Obtain the upper and lower 27% of the group. Multiply 0.27 by the total number of students (10), which
gives a value of 2.7. Rounded to a whole number, this is 3. Get the top 3 students and the bottom 3
students based on their total scores. The top 3 students are Students 2, 5 and 9, while the bottom 3 are
Students 7, 8 and 4. The rest of the students are not included in the item analysis.
            Item 1  Item 2  Item 3  Item 4  Item 5  Total score
Student 2   1       1       1       0       1       4
Student 5   0       1       1       1       1       4
Student 9   1       0       1       1       1       4
Student 1   0       0       1       1       1       3
Student 6   1       0       1       1       0       3
Student 10  1       0       1       1       0       3
Student 3   0       0       0       1       1       2
Student 7   0       0       1       1       0       2
Student 8   0       1       1       0       0       2
Student 4   0       0       0       0       1       1
2. Obtain the proportion correct for each item. This is computed for the upper 27% group and the lower
27% group. This is done by summing the correct answers per item and dividing by the number of
students in the group.
                                   Item 1  Item 2  Item 3  Item 4  Item 5  Total score
Student 2                          1       1       1       0       1       4
Student 5                          0       1       1       1       1       4
Student 9                          1       0       1       1       1       4
Total                              2       2       3       2       3
Proportion of the high group (pH)  0.67    0.67    1.00    0.67    1.00
Student 7                          0       0       1       1       0       2
Student 8                          0       1       1       0       0       2
Student 4                          0       0       0       0       1       1
Total                              0       1       2       1       1
Proportion of the low group (pL)   0.00    0.33    0.67    0.33    0.33
3. The item difficulty is obtained using the formula:

Item difficulty = (pH + pL) / 2
The difficulty is interpreted using the table:
Difficulty Index  Remark
0.76 or higher    Easy item
0.25 to 0.75      Average item
0.24 or lower     Difficult item
                     Item 1        Item 2           Item 3           Item 4           Item 5
Computation          (0.67 + 0)/2  (0.67 + 0.33)/2  (1.00 + 0.67)/2  (0.67 + 0.33)/2  (1.00 + 0.33)/2
Index of difficulty  0.33          0.50             0.83             0.50             0.67
Item difficulty      Average       Average          Easy             Average          Average
4. The index of discrimination is obtained using the formula:
Item discrimination = pH - pL
The value is interpreted using the table:
Index discrimination  Remark
0.40 and above        Very good item
0.30 - 0.39           Good item
0.20 - 0.29           Reasonably good item
0.10 - 0.19           Marginal item
Below 0.10            Poor item

                      Item 1          Item 2       Item 3       Item 4       Item 5
Computation           0.67 - 0        0.67 - 0.33  1.00 - 0.67  0.67 - 0.33  1.00 - 0.33
Discrimination index  0.67            0.33         0.33         0.33         0.67
Discrimination        Very good item  Good item    Good item    Good item    Very good item
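The four steps of the item analysis can also be traced with a short Python sketch using the same ten students and five items. It is an illustration only. The function names are hypothetical, students with equal total scores are kept in their original order (an assumption that happens to give the same upper and lower groups as the worked example), and the cut-offs follow the two interpretation tables, with values that fall between table rows assigned to the lower band.

```python
# Responses per student (1 = correct, 0 = incorrect) for Items 1 to 5.
responses = {
    1: [0, 0, 1, 1, 1],
    2: [1, 1, 1, 0, 1],
    3: [0, 0, 0, 1, 1],
    4: [0, 0, 0, 0, 1],
    5: [0, 1, 1, 1, 1],
    6: [1, 0, 1, 1, 0],
    7: [0, 0, 1, 1, 0],
    8: [0, 1, 1, 0, 0],
    9: [1, 0, 1, 1, 1],
    10: [1, 0, 1, 1, 0],
}


def item_analysis(responses, fraction=0.27):
    """Return the upper group, the lower group, and (difficulty, discrimination) per item."""
    # Step 1: rank students by total score, highest first (ties keep their order).
    ranked = sorted(responses, key=lambda s: sum(responses[s]), reverse=True)
    # Size of the upper and lower groups (27% of the class, rounded).
    k = round(fraction * len(ranked))
    upper, lower = ranked[:k], ranked[-k:]
    n_items = len(next(iter(responses.values())))
    results = []
    for i in range(n_items):
        # Step 2: proportion correct in each group.
        p_high = sum(responses[s][i] for s in upper) / k
        p_low = sum(responses[s][i] for s in lower) / k
        # Steps 3 and 4: difficulty and discrimination indices.
        results.append((round((p_high + p_low) / 2, 2), round(p_high - p_low, 2)))
    return upper, lower, results


def difficulty_remark(index):
    if index > 0.75:
        return "Easy item"
    if index >= 0.25:
        return "Average item"
    return "Difficult item"


def discrimination_remark(index):
    if index >= 0.40:
        return "Very good item"
    if index >= 0.30:
        return "Good item"
    if index >= 0.20:
        return "Reasonably good item"
    if index >= 0.10:
        return "Marginal item"
    return "Poor item"


upper, lower, results = item_analysis(responses)
print("Upper group:", upper, "Lower group:", lower)
for number, (difficulty, discrimination) in enumerate(results, start=1):
    print(f"Item {number}: difficulty {difficulty:.2f} ({difficulty_remark(difficulty)}), "
          f"discrimination {discrimination:.2f} ({discrimination_remark(discrimination)})")
```

With a class this small, moving a single student between groups can change an index noticeably, so the remarks are best treated as prompts for reviewing an item rather than as final verdicts.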
UNIT IV – DESIGNING ANALYSIS AND INTERPRETATION OF ASSESSMENT RESULTS
Statistics plays a vital role in the complexities of life. It aids in decision making, summarizes or
describes data, helps to forecast or predict future outcomes, aids in making inferences, and helps in
making comparisons or establishing relationships. In education, statistics give information about the
school's population change (statistics on enrolment and dropout rates), assist in processing specific
evaluations and surveys given to improve the school system, and help in evaluating achievements and
grades and in preparing tests (proficiency level).
EXPANDING YOUR KNOWLEDGE

Statistics is the process of collecting, organizing, summarizing, presenting, analyzing, and
interpreting data to arrive at valid conclusions and rational decisions.
Stages of Statistical Enquiry

a. Collection of Data – the process of data gathering through interviews, questionnaires, tests,
observations, registrations, and experiments.
b. Presentation of Data – organizing of data through tabular, graphical, or textual presentation.
c. Analysis of Data – the process of extracting relevant and noteworthy information from the given data
using statistical techniques and methods.
d. Interpretation of Data – drawing of conclusions or inferences from the analyzed data.
There are two divisions in statistics, which are descriptive statistics and inferential statistics.
Descriptive Statistics is a statistical procedure concerned with describing the characteristics and
properties of a group of persons, places, or things based on confirmable facts. It organizes the
description, presentation, and interpretation of the data gathered.
Inferential Statistics is a statistical procedure used to draw inferences about the population from
information obtained from a sample, using the techniques of descriptive statistics.
Classification of Variables
Qualitative vs. Quantitative

Qualitative variable – contains categorical or qualitative responses. It refers to the characteristics or
attributes of the sample, such as civil status, religious affiliation, and gender.
Quantitative variable – contains numerical responses representing an amount or quantity, such as height,
weight, and number of children.
a. Discrete – values obtained by counting, e.g., births, students in the class
b. Continuous – values obtained by measurement, e.g., age, height
Dependent vs. Independent

Dependent – a variable which is affected by another variable, e.g., test scores
Independent – a variable which affects the other variable, e.g., number of hours spent studying
Levels of Measurements of Variables

Normally, when you hear the term measurement, you may think of measuring a length (e.g., the length
of a book) or a quantity (e.g., a cup of sugar). In statistics, the term measurement deals with the
scales of measurement. Scales of measurement refer to how variables/numbers are defined and
categorized. Each scale of measurement has properties that determine its suitability for a particular
statistical analysis. Data can be categorized as nominal, ordinal, interval, or ratio.
Nominal: data are categorical and the numbers are used as identifiers or representations. The
numbers on the back of a jersey (COED Blazer 1 = Juan dela Cruz) and social security numbers are
examples of nominal data. If you conduct a survey that includes gender as a variable, you may code
Female as 1 and Male as 2, or vice versa, when you enter your data into the computer. Thus, the numbers
1 and 2 are used to represent the categories of data.
Ordinal: it denotes an ordered series of relationships or a rank order. In a contest, individuals
compete to achieve first, second, or third place. First, second, and third place represent ordinal data.
If Rose takes first and Willy takes second, we do not know if the competition was close; we only know that
Rose outperformed Willy. Likert-type scales also represent ordinal data. Basically, these scales do not
represent a measurable quantity. On a pain scale, for instance, an individual who responds 8 may actually
feel less pain than someone who responds 5, and a person who responds 4 is not necessarily in half as much
pain as a person who responds 8. At most, the data indicate that a response of 6 reports less pain than a
response of 8 and more pain than a response of 4. Therefore, Likert-type scales represent a ranking.
Interval: it represents a quantity and has equal units, but zero is simply another point of
measurement. For example, 10 degrees Fahrenheit and -10 degrees Fahrenheit are interval data. The
Fahrenheit scale is a direct measure of a quantity with equality of units. Zero does not represent the
absolute lowest value; rather, it is a point on the scale with numbers both above and below it.
Ratio: it is a scale of measurement which, like the interval scale, represents quantity and has
equality of units. However, the ratio scale has an absolute zero (no numbers exist below zero). It is
commonly used in physical measures like height and weight. If one is measuring the height of a person in
centimeters, there is quantity, there are equal units, and the measure cannot go below zero centimeters.
A negative height is not possible.
The table below shows a summary of fundamental differences between the four scales of
measurement
DATA COLLECTION
Data collection is the gathering of information from persons or from other sources. Data are collected
to keep on record for further use, to make essential decisions about different problems, and to
disseminate information to others.
Primary Data – the collection of data from a first-hand source. This type of data is mostly pure and
original.
Secondary Data – the collection of data from a second-hand source. The information could come from
another researcher or agency.
DATA-GATHERING TECHNIQUES
Method: Direct or interview method
Characteristics: Researcher has direct contact with the respondents
Advantages: Clarification can be done easily
Disadvantages: Costly and time-consuming

Method: Indirect or questionnaire method
Characteristics: Researcher gives or distributes a questionnaire to the respondents either by personal
delivery or by mail
Advantages: Saves time and money; a large number of samples can be reached
Disadvantages: Problem of retrieval

Method: Registration method
Characteristics: Information is based on compliance with specific laws, policies, rules, regulations,
or standard practices
Advantages: Most reliable since law enforces it
Disadvantages: Data are limited to what is registered in the documents

Method: Experimental method
Characteristics: The researcher wants to control the factors affecting the variable being studied to find
out cause and effect relationships
Advantages: Can go beyond plain description
Disadvantages: Lots of threats to internal or external validity

Method: Observation method
Characteristics: Utilized to gather data regarding attitudes, behavior or values, and cultural patterns
of the samples under investigation
Advantages: Data can be quickly gathered within the available time of the researcher since it can be done
anytime
Disadvantages: Information may be subjected to subjective judgments
DATA PRESENTATION
The collected data can be presented in 3 different ways which include:

1. Textual
2. Tabular
3. Graphical
TEXTUAL PRESENTATION
Data presented in a paragraph or in sentences are said to be in textual form. This includes an
enumeration of essential characteristics, emphasizing the most significant features, and highlighting the
most striking attributes of the set of data.
Example:
According to a rapid survey conducted by the government, 77 percent of micro and small firms and
62 percent of medium-sized firms had to close due to the enhanced community quarantines. Those that
remained open suffered a 66.5 percent drop in sales.
The growth forecast for 2020 assumes that the containment measures will gradually ease in the
second half of the year, and economic activities return in some sectors of the economy. Given income losses
and heightened uncertainty, household consumption and private investment are expected to remain weak.
However, economic growth prospects and poverty figures are expected to improve in succeeding
years driven by a rebound in consumption, a stronger push in public investment, supportive fiscal and
monetary policies, and the recovery of global growth. Economic growth is projected to return to above 6
percent in 2021 and 7 percent in 2022. Increased economic activity surrounding national elections will also
boost growth in 2022.
(Philippines: Social Assistance to Poor Households, Support for Small Enterprises Key to Broad-Based
Recovery; http://worldbank.org; June 9, 2020)
TABULAR PRESENTATION
The tabular method makes use of rows and columns. The data are presented in a systematic and
orderly manner, which catches one's attention and may facilitate the comprehension and analysis of the
data presented.
Frequency Distribution Table

The frequency distribution table (FDT) is a statistical table that shows frequency of observations
for each of the defined classes or categories.
Parts of Statistical Table

1. Table Heading – contains the table number and the title of the table
2. Body – the main part of the table that contains the information or figures
3. Stubs or classes – the classifications or categories describing the data, usually found at the
leftmost side of the table
4. Boxhead – located at the top of the body; it includes the stubhead, the master caption, and the
column captions
Types of Frequency Distribution Table
1. Qualitative or Categorical FDT – A frequency distribution table where the data are grouped
according to some qualitative characteristics; data are grouped into non-numerical categories.
Table 2
Frequency Distribution of Gender of the Respondents
Gender Number of Respondents
Male 77
Female 45
Total 122
2. Quantitative FDT – a frequency distribution table where the data are grouped according to some
numerical or quantitative characteristics.
Table 3
Ungrouped Frequency Distribution for the
Weights of 50 Students in Prof Ed 6 Class
WEIGHT (in kg) FREQUENCY
49 2
50 3
51 5
52 7
53 7
54 0
55 0
56 0
57 0
58 12
59 0
60 7
61 0
62 4
63 2
64 1
Total 50
Table 4
Grouped Frequency Distribution for the
Weights of 50 Students in Prof Ed 6 Class
WEIGHT (in kg) FREQUENCY
48 – 49 2
50 – 51 8
52 – 53 14
54 – 55 0
56 – 57 0
58 – 59 12
60 – 61 7
62 – 63 6
64 – 65 1
Total 50
Steps in Constructing Grouped FDT
1. Determine the range
Consider the following raw data on the first quiz in Prof Ed 6

37 24 37 41 38 28 35 32 41 31
51 48 33 29 34 46 39 33 32 39
41 49 28 29 45 27 43 49 39 27
43 54 39 49 57 22 38 32 49 50
44 45 33 42 39 40 48 35 43 47
R = highest score – lowest score = 57 – 22 = 35
2. Determine the number of classes (class intervals)

Note: There's no definite rule in determining the number of class intervals for as long as the
number can provide the necessary information needed. However, the ideal number of class
intervals is between 5 and 20 depending on the nature of data. What is important is that the
class agrees on a standard method to use for uniformity and consistency.
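Remarks: There are other alternatives to determining the number of intervals. One is the square root
of the number of observations:
$$k = \sqrt{n}$$
where n is the number of observations and k is the number of intervals.
Example: n = 100, then $k = \sqrt{100} = 10$
n = 72, then $k = \sqrt{72} \approx 8.49$, rounded up to 9
n = 50, then $k = \sqrt{50} \approx 7.07$, rounded up to 8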
3. Determine the class size (i), also known as class width

$$i = \frac{R}{k}$$
where R is the range and k is the number of class intervals.
R = 35; k = 8, then $i = 35/8 = 4.375$, rounded up to 5
4. List the limits of each class interval. Preferably, lower limit of the lowest class interval
is a multiple of the class size of the class interval
Example: 20 – 24
20 is the lower limit, and 24 is the upper limit
Table 5
Frequency Distribution Table of the scores
in the First Quiz in Prof Ed 6
Class Intervals Frequency
20 – 24 2
25 – 29 6
30 – 34 8
35 – 39 11
40 – 44 9
45 – 49 10
50 – 54 3
55 - 59 1
N 50
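Steps 1 to 4 can be sketched in Python by tallying the raw scores listed in Step 1. This is only an
illustration under stated assumptions: the variable names are hypothetical, k and the class size are
rounded up as in the worked example, the scores are whole numbers, and the lowest class starts at the
largest multiple of the class size that does not exceed the lowest score.

```python
import math

# Raw scores on the first quiz in Prof Ed 6 (from Step 1)
scores = [
    37, 24, 37, 41, 38, 28, 35, 32, 41, 31,
    51, 48, 33, 29, 34, 46, 39, 33, 32, 39,
    41, 49, 28, 29, 45, 27, 43, 49, 39, 27,
    43, 54, 39, 49, 57, 22, 38, 32, 49, 50,
    44, 45, 33, 42, 39, 40, 48, 35, 43, 47,
]

# Step 1: range
data_range = max(scores) - min(scores)

# Step 2: number of classes, k = sqrt(n), rounded up (assumption)
k = math.ceil(math.sqrt(len(scores)))

# Step 3: class size, i = R / k, rounded up (assumption)
class_size = math.ceil(data_range / k)

# Step 4: lowest lower limit is a multiple of the class size
lowest_limit = (min(scores) // class_size) * class_size

# Tally the frequency of each class interval (whole-number scores assumed)
classes = []
lower = lowest_limit
while lower <= max(scores):
    upper = lower + class_size - 1
    frequency = sum(lower <= s <= upper for s in scores)
    classes.append((lower, upper, frequency))
    lower += class_size

for lower, upper, frequency in classes:
    print(f"{lower} – {upper}: {frequency}")
print("N =", sum(f for _, _, f in classes))
```

Tallying by program is a convenient way to double-check a frequency table built by hand.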
A simple grouped frequency distribution table consists only of class intervals and
frequencies. Tables 4 and 5 are simple grouped FDTs.
A complete grouped frequency distribution table has a class mark or midpoint (x), class
boundaries (c.b), relative frequency (rf), cumulative frequencies, (cf) and relative
cumulative frequency.
Class mark – the midpoint of the class interval, obtained by averaging the lower limit (LL)
and the upper limit (UL)
$$x = \frac{LL + UL}{2}$$
Example: class mark of class interval 20 – 24

$$x = \frac{20 + 24}{2} = 22$$
Class boundaries – these are the true limits of class intervals. Each class boundary is
the number midway between the upper limit of one class and the lower limit of the
succeeding class interval.
Example: the class boundaries of 20 – 24 are 19.5 – 24.5
Relative Frequency – also called percentage frequency. It is the proportion of observations
falling in a class and is expressed in percentage. It is obtained by dividing the frequency of
each class by N.
$$rf = \frac{f}{N} \times 100\%$$
Example: If the frequency of class interval of 20 – 24 is 1 and N = 50,

$$rf = \frac{1}{50} \times 100\% = 2\%$$
Cumulative Frequency (cf)– accumulated frequency of the classes
a. Less than cf (<cf) – total number of observations whose values do not exceed the
upper limit of the class.
b. Greater than cf (>cf) – total number of observations whose values are not less than
the lower limit of the class.
Relative Cumulative Frequency

a. Less than RCF (<RCF)
b. Greater than RCF (>RCF)
Table 6
Complete Grouped Frequency Distribution Table of the scores
in the First Quiz in Prof Ed 6
C.I. f x c.b. rf <cf >cf <rcf >rcf
20 – 24 2 22 19.5 – 24.5 4% 2 50 4% 100%
25 – 29 6 27 24.5 – 29.5 12% 8 48 16% 96%
30 – 34 8
35 – 39 11
40 – 44 9
45 – 49 10
50 – 54 3
55 – 59 1
N = 50
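The columns of a complete grouped FDT can be computed from the raw scores listed in Step 1, as in the
Python sketch below. It is an illustration only: the dictionary keys and variable names are
hypothetical, and the class boundaries are taken as 0.5 below and above the class limits on the
assumption that the scores are whole numbers.

```python
from itertools import accumulate

# Raw scores on the first quiz in Prof Ed 6 (from Step 1)
scores = [
    37, 24, 37, 41, 38, 28, 35, 32, 41, 31,
    51, 48, 33, 29, 34, 46, 39, 33, 32, 39,
    41, 49, 28, 29, 45, 27, 43, 49, 39, 27,
    43, 54, 39, 49, 57, 22, 38, 32, 49, 50,
    44, 45, 33, 42, 39, 40, 48, 35, 43, 47,
]
class_size = 5
lower_limits = list(range(20, 60, class_size))  # 20, 25, ..., 55
n = len(scores)

# Frequency of each class interval
freqs = [sum(lo <= s <= lo + class_size - 1 for s in scores) for lo in lower_limits]

# Less-than cf accumulates from the lowest class; greater-than cf from the highest
less_cf = list(accumulate(freqs))
greater_cf = list(accumulate(reversed(freqs)))[::-1]

rows = []
for lo, f, lcf, gcf in zip(lower_limits, freqs, less_cf, greater_cf):
    hi = lo + class_size - 1
    rows.append({
        "interval": (lo, hi),
        "f": f,
        "x": (lo + hi) / 2,          # class mark
        "cb": (lo - 0.5, hi + 0.5),  # class boundaries (whole-number scores assumed)
        "rf": 100 * f / n,           # relative frequency in percent
        "<cf": lcf,
        ">cf": gcf,
        "<rcf": 100 * lcf / n,       # relative cumulative frequencies in percent
        ">rcf": 100 * gcf / n,
    })

for row in rows:
    print(row)
```

The last less-than cf and the first greater-than cf should both equal N, which is a quick check on the
tally.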
The contingency table

This is the table which shows the responses of subjects to one variable as a function
of another variable. One type of this kind of table is the row by column where the columns
refer to the samples and the rows refer to the choices or alternatives.
Table 7
The Contingency Table for the opinion of viewers on the New TV Program
Choices (rows) by Samples (columns)
Choices Men Women Children Total
Like the Program 59 67 32 158
Indifferent 21 32 12 65
Do not like the Program 46 12 78 136
Total 126 111 122 359
Table 7 is a 3 x 3 table since it has 3 columns and 3 rows. The samples enumerated
in columns are men, women, and children while the choices or alternatives enumerated in
rows are: like the program, indifferent and do not like the program. Column and row totals
are not included in the count.
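The row and column totals of Table 7 can be verified with a short Python sketch. The dictionary layout
and variable names are hypothetical choices made for illustration.

```python
# Table 7: rows are the choices, columns are the samples
samples = ["Men", "Women", "Children"]
table = {
    "Like the Program": [59, 67, 32],
    "Indifferent": [21, 32, 12],
    "Do not like the Program": [46, 12, 78],
}

# Row totals: sum across the samples for each choice
row_totals = {choice: sum(counts) for choice, counts in table.items()}

# Column totals: sum down the choices for each sample
column_totals = [sum(column) for column in zip(*table.values())]

grand_total = sum(row_totals.values())

print(row_totals)
print(dict(zip(samples, column_totals)))
print("Grand total:", grand_total)
```

The grand total obtained from the row totals must equal the one obtained from the column totals.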
GRAPHICAL PRESENTATION OF DATA
The numerical data provided in a frequency distribution table or contingency table
can be made more engaging and easier to understand when depicted in GRAPHICAL FORM. A
graph is a pictorial representation of a given set of data.
Common Types of Graph

1. Scatter Graph – a graph used to present measurements or values that are thought to
be related.
2. Line Chart – a graphical presentation of data especially useful for showing trends over
a period of time.
3. Pie Chart – a circular graph that is useful in showing how a total quantity is
distributed among a group of categories. Each piece of the pie represents the share of
the total that belongs to a category.
4. Column and Bar Graph – like pie charts, column charts and bar charts are applicable
only to grouped data. They are used for DISCRETE grouped data of ordinal or nominal scale.
Other Types of Graphs
1. Frequency Histogram – a bar graph that presents the classes on the horizontal axis
and the frequencies of the classes on the vertical axis. The vertical lines of the
bars are on the class boundaries, and the height of each bar corresponds to the class
frequency.
2. Frequency Polygon – a line graph that is constructed by plotting the frequencies at
the class marks and connecting the plotted points by means of straight lines. The
polygon is closed by adding an additional class at each end, so that the ends of
the line are connected to the midpoints of the additional classes on the horizontal
axis.
3. Relative Frequency Histogram – a graph in which the horizontal axis
represents the classes and the vertical axis represents the relative frequencies.
4. Ogives – graphs of the cumulative frequency (cf) distribution
a. <ogive – the less than cf is plotted against the upper true class boundary
b. >ogive – the greater than cf is plotted against the lower true class boundary
MEASURES OF CENTRAL TENDENCY

(Descriptive Statistics)
A measure of central tendency is any single value that describes the "center" of the given
data. It is often known as the average.
These are numerical descriptive measures which indicate or locate the center of a distribution
of a set of data.
SUMMATION NOTATION
Suppose that X is a variable of interest and that n measurements are taken. The notation
$X_1, X_2, \dots, X_n$ will be used to represent the n observations.
The Greek letter "Σ" indicates the "summation of…" and you can write the sum of the
observations as
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n$$
The numbers 1 and n are called the lower and upper limits of summation, respectively.
Example: Write out the following in full, that is, without summation signs:
1.) $$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5$$
2.) $$\sum_{i=1}^{4} X_i Y_i = X_1 Y_1 + X_2 Y_2 + X_3 Y_3 + X_4 Y_4$$
Rules on Summation
1) The summation notation is distributive over addition.
$$\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$$
2) If c is a constant, then
$$\sum_{i=1}^{n} c X_i = c \sum_{i=1}^{n} X_i$$
3) If c is a constant, then
$$\sum_{i=1}^{n} c = nc$$
A. Expand the given expression using the rules on summation.

$$\sum_{i=1}^{4} (2X_i + 3) = \sum_{i=1}^{4} 2X_i + \sum_{i=1}^{4} 3$$ (Rule 1)
$$= \sum_{i=1}^{4} 2X_i + 4(3)$$ (Rule 3)
$$= 2 \sum_{i=1}^{4} X_i + 4(3)$$ (Rule 2)
$$= 2(X_1 + X_2 + X_3 + X_4) + 12$$
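The three rules on summation and the expansion in Exercise A can be checked numerically. The Python
sketch below uses hypothetical values for X, Y, and c chosen only for illustration; a numerical check
of this kind does not prove the rules, it only confirms them for one set of values.

```python
# Hypothetical observations and constant, for illustration only
X = [2, 4, 6, 8]
Y = [1, 3, 5, 7]
c = 3
n = len(X)

# Rule 1: the summation is distributive over addition
rule1_holds = sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)

# Rule 2: a constant factor can be moved outside the summation
rule2_holds = sum(c * x for x in X) == c * sum(X)

# Rule 3: summing a constant n times gives n * c
rule3_holds = sum(c for _ in range(n)) == n * c

# Exercise A: sum of (2 * X_i + 3) for i = 1..4 equals 2 * (X_1 + ... + X_4) + 12
direct = sum(2 * x + 3 for x in X)
expanded = 2 * sum(X) + 12

print(rule1_holds, rule2_holds, rule3_holds)
print(direct, expanded)
```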
B. Write the following in summation notation with appropriate limits.

1. $$X_5 + X_6 + X_7 + X_8 = \sum_{i=5}^{8} X_i$$
2. $$(X_3 - 2) + (X_4 - 2) + (X_5 - 2) = \sum_{i=3}^{5} (X_i - 2)$$
1) Mean = TOTAL of items ÷ NUMBER of items

2) Median = MIDDLE value
3) Mode = MOST common value
PROPERTIES OF THE MEAN, MEDIAN, AND MODE
MEAN
1. The sum of the deviations of all measurements in a set from the mean is 0.
2. It can be calculated for any set of numerical data, so it always exists.
3. A set of numerical data has one and only one mean.
4. It lends itself to higher statistical treatment.
5. It is the most reliable since it takes into account every item in a set of data.
6. It is greatly affected by outliers or extreme or deviant values.
7. It is used only if the data are interval or ratio, and when normally distributed.

MEDIAN
1. The score or class in a distribution below which 50% of the scores fall and above which 50% lie.
2. It is not affected by extreme or deviant values.
3. It is appropriate to use when there are outliers or extreme or deviant values.
4. It is used when the data are ordinal.
5. It exists in both quantitative and qualitative data.

MODE
1. It is used when we want to find the value which occurs most often.
2. It is a quick approximation of the average.
3. It is an inspection average.
4. It is the most unreliable among the three measures of central tendency because its value is
undefined in some observations.
5. It exists in both quantitative and qualitative data.
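The three measures and two of the listed properties of the mean can be illustrated with Python's
standard statistics module. The scores below are hypothetical values chosen for illustration, not data
from this module.

```python
import statistics

# Hypothetical scores, for illustration only
scores = [2, 4, 4, 5, 7, 8, 12]

mean = statistics.mean(scores)      # total of items divided by number of items
median = statistics.median(scores)  # middle value of the sorted scores
mode = statistics.mode(scores)      # most common value

# Property: the deviations from the mean sum to zero
deviations_sum = sum(s - mean for s in scores)

# Property: an extreme value pulls the mean but leaves the median unchanged
with_outlier = scores[:-1] + [120]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

print("mean:", mean, "median:", median, "mode:", mode)
print("sum of deviations:", deviations_sum)
print("with outlier -> mean:", round(mean_with_outlier, 2), "median:", median_with_outlier)
```

Replacing the highest score with an extreme value changes the mean considerably while the median stays
the same, which is why the median is preferred when outliers are present.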
MEASURES OF LOCATION
The percentile is a measure which divides the distribution into one hundred equal
parts. The quartile measure divides the distribution into four equal parts while the decile
divides the distribution into ten equal parts.
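As a rough illustration, the cut points that divide a distribution into four, ten, or one hundred parts
can be obtained with statistics.quantiles from Python's standard library (Python 3.8 or later), here
applied to the first-quiz scores used earlier. Note the assumption: the function's default
interpolation method is used, and textbook formulas for quartiles, deciles, and percentiles may give
slightly different values.

```python
import statistics

# Raw scores on the first quiz in Prof Ed 6 (same data as in the FDT example)
scores = [
    37, 24, 37, 41, 38, 28, 35, 32, 41, 31,
    51, 48, 33, 29, 34, 46, 39, 33, 32, 39,
    41, 49, 28, 29, 45, 27, 43, 49, 39, 27,
    43, 54, 39, 49, 57, 22, 38, 32, 49, 50,
    44, 45, 33, 42, 39, 40, 48, 35, 43, 47,
]

quartiles = statistics.quantiles(scores, n=4)      # 3 cut points: Q1, Q2, Q3
deciles = statistics.quantiles(scores, n=10)       # 9 cut points: D1 to D9
percentiles = statistics.quantiles(scores, n=100)  # 99 cut points: P1 to P99

print("Q1, Q2, Q3:", quartiles)
print("D5:", deciles[4])
print("P50:", percentiles[49])
print("Median:", statistics.median(scores))
```

The second quartile, the fifth decile, and the fiftieth percentile all coincide with the median.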
THE STANDARD DEVIATION AND THE VARIANCE
Standard deviation is simply a measure of how far from the mean the data is spread.
It can be visualized like a dartboard, where the center is the MEAN and the darts are the
data in the set.
The greater the standard deviation is, the greater the spread. The greater the spread, the
more inconsistent the data are.
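The link between spread and the standard deviation can be seen by comparing two sets with the same
mean. The Python sketch below uses hypothetical scores and the population formulas (pvariance and
pstdev from the standard statistics module); treating each set as a whole population is an assumption
made for illustration.

```python
import statistics

# Two hypothetical sets of scores with the same mean (50) but different spread
consistent = [48, 49, 50, 51, 52]
inconsistent = [30, 40, 50, 60, 70]

for label, data in (("consistent", consistent), ("inconsistent", inconsistent)):
    mean = statistics.mean(data)
    variance = statistics.pvariance(data)  # mean of the squared deviations
    std_dev = statistics.pstdev(data)      # square root of the variance
    print(f"{label}: mean = {mean}, variance = {variance}, sd = {std_dev:.2f}")
```

Both sets are centered on the same mean, but the second set has the larger standard deviation because
its values lie farther from the center.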
The remaining descriptive statistics in the Excel computation, kurtosis and skewness,
refer to the overall shape of the distribution relative to the normal distribution (the bell
curve). Kurtosis is the degree of peakedness relative to a normal distribution, while skewness
is the degree of asymmetry (departure from symmetry).
STANDARD SCORE
The standard score measures how many standard deviations a raw score lies above or below
the mean. It is computed as
$$Z = \frac{X - \mu}{\sigma}$$
where Z = standard score

X = raw score
μ = population mean
σ = population standard deviation
Dante performed better in English.
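A comparison of this kind can be sketched in Python. The raw scores, means, and standard deviations
below are hypothetical values chosen for illustration and are not the figures of the example above;
z_score is likewise a hypothetical helper name.

```python
def z_score(raw, mean, sd):
    """Standard score: number of standard deviations the raw score is from the mean."""
    return (raw - mean) / sd


# Hypothetical results of one student on two tests
z_test_a = z_score(raw=85, mean=75, sd=5)   # 2 standard deviations above the mean
z_test_b = z_score(raw=80, mean=70, sd=10)  # 1 standard deviation above the mean

better = "Test A" if z_test_a > z_test_b else "Test B"
print(z_test_a, z_test_b, better)
```

Although both raw scores are 10 points above their respective means, the performance on Test A is
relatively better because the scores on that test are less spread out.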
A normal probability distribution is a continuous distribution that is both symmetrical
and mesokurtic. The curve representing the normal probability distribution is often
described as being "bell-shaped". This is sometimes called the "Gaussian Distribution" or the
"Normal Curve".
Properties of Normal Curve
1. The mean = median = mode.

2. It is symmetrical about the mean.
3. The tails or ends are asymptotic relative to the horizontal axis.
4. The total area under a normal curve is 1.0 or 100%.
5. The normal curve area may be subdivided into standard deviations, at least 3 to the
left and 3 to the right.
EMPIRICAL RULE (OR 68–95–99.7 RULE)
For data with a normal distribution, the standard deviation has the following
characteristics.
1. About 68% of the data are within one standard deviation of the mean.
2. About 95% of the data are within two standard deviations of the mean.
3. About 99.7% of the data are within three standard deviations of the mean.
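The three percentages can be recovered from the standard normal distribution using NormalDist from
Python's standard statistics module (Python 3.8 or later). The helper name share_within is
hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

Because any normal distribution can be converted to standard scores, the same proportions hold
regardless of the mean and standard deviation.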
