
ASSESSMENT IN LEARNING

GROUP 3: Development of Tools for Classroom Assessment

CHAPTER 7: OVERALL TEST DEVELOPMENT PROCESS

a. Planning Phase – the purpose of the test is identified, the learning outcomes to
be assessed are specified, and a table of specifications is prepared to guide the
item construction phase.
b. Item Construction Phase – test items are constructed following the
appropriate format for the specified learning outcomes of instruction.
c. Review Phase – the items are examined by the teacher or peers prior to
administration, based on a judgment of their alignment to the content and
behavior components of the instructional competencies, and after administration,
based on an analysis of students' performance on each item.
A.) Identifying the Purpose of the Test

• A test seeks to uncover what students know and can do, to get feedback on what
they need to alter or work on further to improve their learning.
• Teachers can use the results to map out their pedagogical moves to improve
teaching-learning through appropriate mentoring.
• Alternatives or distracters in selected-response items (e.g., multiple-choice items)
can be stated in terms of popular falsehoods, misconceptions, misinterpretations,
or inadequately stated principles that students may likely adopt.
• The teacher carefully selects the relevant content or skills to be covered in a test.
• Given at the end of the instructional process, a test takes on a different purpose.

B.) Specifying the Learning Outcomes

• Assessment has changed considerably as advances occur in the fields of educational
and cognitive psychology, particularly in defining learning and its domains.
• Assessment is now focused on defining learning standards in terms of outcomes
that spell out what learners should know and be able to do along established
hierarchical levels of cognition.
• Processes for assessment recognize and address different learning targets defined
by the intended outcomes, from knowledge of facts and information covered by the
curriculum at every level to various facets of showing understanding of them.
• Classroom tests – carefully planned to ensure that they truthfully and reliably
quantify what they are intended to measure (can be either pre-instructional
or post-instructional assessments).
• Summative tests – focus on the accomplishment of the learning outcomes
demarcated in every unit of work designed in the curriculum.
C.) Preparing a Test Blueprint

• A Test Blueprint is a list of the key components defining your test, including the
purpose of the test and the content framework.

• Regardless of subject area, a classroom test covers the learning outcomes
intended and essential to be achieved within the unit or period of work.

• This planning phase helps teachers make a genuine connection in the trilogy of
Curriculum, Instruction, and Assessment, known as the ALIGNMENT
TRIANGLE.

• Curriculum is the written set of educational outcomes driving instruction and
assessment.
• Instruction is what actually happens in the classroom; it can provide feedback
for improving the curriculum.
• Assessment is the formal process of gathering, analyzing, and reporting
standardized information about learning; it can identify weaknesses in
instruction that can be corrected and detect holes in the curriculum that can
be filled.

TABLE OF SPECIFICATIONS (TOS)


-is a two-way chart which describes the topics to be covered by a test and the number of
items or points which will be associated with each topic.
-"WHAT will be tested and HOW it will be tested."
WHAT: the content area being covered and the target learning outcomes.
HOW: the test format, i.e., the type of assessment questions or tasks, and the
distribution of items for each topic.

Key Points:
• A Test Blueprint or Table of Specifications provides evidence of content validity.
• A common TOS shows the breakdown of learning outcomes by content area
and by the cognitive process or behavior involved.
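
To make the two-way structure concrete, here is a minimal Python sketch of a TOS as
a tally of items per topic and cognitive level. The topics, levels, and item counts are
hypothetical examples, not taken from the text.

# Minimal sketch: a Table of Specifications as a two-way tally.
# Topics, cognitive levels, and item counts are hypothetical.
tos = {
    "Topic A": {"Remembering": 4, "Comprehending": 3, "Applying": 2},
    "Topic B": {"Remembering": 2, "Comprehending": 4, "Applying": 3},
}

# WHAT: items allotted per content area.
for topic, levels in tos.items():
    print(topic, "->", sum(levels.values()), "items")

# HOW: distribution of items per cognitive level across the whole test.
level_totals = {}
for levels in tos.values():
    for level, count in levels.items():
        level_totals[level] = level_totals.get(level, 0) + count
print(level_totals)  # {'Remembering': 6, 'Comprehending': 7, 'Applying': 5}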

D.) Other forms of Specification

Miller, Linn, and Gronlund (2009) developed a table of specifications that breaks
down learning outcomes to cover a wider cognitive domain. This table of
specifications is useful for detecting the areas in which learners have gained
mastery and those needing additional instructional attention.
CHAPTER 8: SELECTING AND CONSTRUCTING TEST
ITEMS AND TASKS

Content Objectives
• Understand how to prepare and use an achievement test
• Understand when to use an achievement test

Selecting and Constructing Test Items and Tasks
• This chapter familiarizes you with the initial process of developing a
classroom test.
• It covers the test types appropriate for gauging the learning outcomes
proposed by the curriculum standards, how to select them, and how to
construct them.

A.) Categorizing Test Types

• Various types of formal and informal tests
• Grouping the various test types

Test types fall into three major categories: (a) supply type, (b) selection type, and
(c) performance type.

Supply Test Type

• Supply type test items are those which can be answered by a word, phrase,
number, or symbol.
• Supply type test items can be further divided into short-answer test items and
completion test items.

Selection Type

• Selection items (or selected-response items) are test items on which the examinee
selects one of a set of choices, rather than generating an original response.
• A conventional multiple-choice item typically features a stimulus, an item stem or
question, a correct answer choice, and a set of incorrect choices.
Performance Type

• Performance-type tests require students to demonstrate their skills or create
products to show what they have learned, rather than select or supply brief
answers.
• Performance tasks may take the form of written work, work samples,
simulations, and projects.

B. Relating Test Types with Levels of Learning Outcomes

1. Measuring Knowledge and Simple Understanding

- Knowledge, as it appears in cognitive taxonomies (Bloom, 1956; Anderson
and Krathwohl, 2004) as the simplest and lowest level, is categorized further
by what thinking process is involved in learning.

- Knowledge involves remembering specific facts, symbols, details, elements
of events, and principles in order to acquire new knowledge.

- Measuring knowledge and simple understanding makes up the early phase of
understanding, requiring lower-order thinking such as remembering,
comprehending, and applying. McMillan (2007) refers to the latter two as
simple understanding, requiring comprehension of concepts, ideas, and
generalizations, known as declarative knowledge, and application of skills
and procedures learned in new situations, referred to as procedural
knowledge.

Levels of Declarative and Procedural Knowledge

Level: Knowledge
- Declarative: Remembers, defines, identifies, recognizes, names, reproduces, or
selects specific facts, concepts, principles, rules, or theories.
- Procedural: Remembers, defines, identifies, recognizes, names, reproduces, or
selects the correct procedures, concepts, principles, rules, or theories.

Level: Simple Understanding (Comprehension)
- Declarative: Converts, translates, distinguishes, explains, provides examples,
summarizes, interprets, infers, or predicts, in own words, the essential meanings
of concepts and principles.
- Procedural: Converts, translates, distinguishes, explains, provides examples,
summarizes, interprets, infers, or predicts, in own words, the correct procedures,
steps, skills, and strategies.

Level: Simple Understanding (Application)
- Declarative: Uses existing knowledge of concepts, principles, and theories in new
situations to solve problems, interpret information, and construct responses.
- Procedural: Uses existing knowledge of correct procedures, steps, skills, or
strategies in new situations to solve problems, interpret information, and
construct responses.

Differentiating Declarative and Procedural Knowledge

Knowledge
- Declarative: Is able to state the law of supply and demand.
- Procedural: Is able to state the law of supply and demand.

Comprehension
- Declarative: Is able to explain the law of supply and demand.
- Procedural: Is able to compare the size of two given lots in terms of area.

Application
- Declarative: Is able to explain the rising prices of vegetables during summer time.
- Procedural: Is able to determine the number of 1 x 1 tiles needed to cover a
50 ft x 100 ft hall.

Categories of Lower-order Thinking Skills and Sample Generic Questions

- Knowledge of terminologies: What is a ____?
- Knowledge of specific facts: When did ____ happen?
- Knowledge of conventions: Where are ____ usually found?
- Knowledge of trends and sequences: Name the stages in ____.
- Knowledge of classifications and categories: Which ____ does not belong with
the others?
- Knowledge of criteria: What criteria will you use to judge ____?
- Knowledge of methods, principles, techniques: When ____ increases, what
happens to ____?
- Comprehension: What do you mean by the expression ____?
- Simple interpretations: What makes ____ interesting?
- Solving numerical problems: Use the data above to find the ____.
- Manipulating symbols and equations: Show that ____ equals ____.

2. Measuring Deep Understanding

- Beyond the knowledge and simple understanding levels comes deep
understanding, which requires more complex thinking processes.
- McMillan uses a knowledge/understanding continuum to illustrate the relative
degree of understanding from knowledge to simple understanding to deep
understanding.
Alignment of Learning Outcomes and Cognitive Levels

Knowledge
Level 1: Remembering – recall, recognize, name, describe

Simple Understanding
Level 2: Comprehending – interpret, exemplify, classify, compare, explain, infer
Level 3: Applying – solve, apply, modify, demonstrate, employ, calculate, generate

Deep Understanding
Level 4: Analyzing – organize, distinguish, outline, transform, diagnose, deconstruct
Level 5: Evaluating – critique, assess, defend, justify, appraise, measure
Level 6: Creating – plan, generate, produce, design, construct, compose
Alignment of Learning Outcomes to Test Types

The cognitive levels are grouped as in the preceding table: Knowledge (Level 1),
Simple Understanding (Levels 2-3), and Deep Understanding (Levels 4-6).

Knowledge
- Supply type: completion, short answer
- Selection type: binary choice, multiple choice, matching type

Simple Understanding
- Supply type: completion, short answer
- Selection type: binary choice, multiple choice

Deep Understanding
- Supply type: essay (restricted), essay (extended)
- Selection type: multiple choice, interpretive items
- Performance task: written, work sample, simulation, project
C. Constructing Objective Supply Type of Items

1. Completion Type – an item with a structure which defines the question or
problem, and a response which defines what is to be provided or constructed
by the learner.

EXAMPLE QUESTIONS:
A. A part of speech that is used to connect clauses or sentences or to
coordinate words in the same clause is called _________.
-The expected response is CONJUNCTION.
B. The novel El Filibusterismo was written by Dr. Jose Rizal during the
_________ colonial period.
-The expected response is SPANISH.

*Gap filling is another term for this variant, as the student fills several gaps in a
discourse depending on the target outcome.

- Guidelines in the construction of completion items (Kubiszyn and Borich,
2010; McMillan, 2007; Nitko, 2001; Popham, 2011):

• A. There should be only one correct response to complete a statement.
• B. The blank should be placed at the end of, or towards the end of, the
incomplete statement.
• C. Avoid providing unintended clues to the correct answer.

2. Short Answer Items – are constructed as direct answers to questions.

EXAMPLE QUESTIONS:
A. What part of speech is used to connect clauses or sentences or to
coordinate words in the same clause?
B. During what colonial period was El Filibusterismo written by Dr. Jose
Rizal?

*Short-answer items similarly follow the guidelines for writing completion items.


Guidelines by McMillan (2007, pp. 170-171)
1. State the item so that only one answer is correct.
2. State the item so that the required answer is brief. Requiring a long response is
not necessary and can limit the number of items students can answer within
the allotted period of time.
3. Do not use questions verbatim from textbooks and other instructional materials.
This gives an undue disadvantage to students not familiar with the materials,
since the test can become a memory test instead of a comprehension test.
4. Designate the units required for the answer. This matters when the constructed
response requires a definite unit to be considered correct. Without designating
the unit, a response may be rendered wrong because of a differing mindset.
5. State the item succinctly, with words students understand. This is true for all
types of tests. The validity of a classroom-based test is at risk when students
cannot answer correctly, not because they do not know the answer, but because
of messy wording of the questions.

D. Constructing Non-objective Supply Type Items: The Essay

ESSAY

What is an Essay?
Essay (n.) | \ ‘e-,sā
an analytic or interpretative literary composition, usually dealing with its subject
from a limited or personal point of view
CHARACTERISTICS OF AN ESSAY
- Belongs to the supply category because students are required to construct a full
response on their own
- Free-structured: students can organize their response freely and can even
manifest their own creative writing style
- Good for testing deep understanding and reasoning

THINKING PROCESSES INVOLVED [to satisfactorily answer essay questions]
(all higher-order thinking skills)

- Comparison
- Induction
- Deduction
- Abstracting
- Analyzing perspectives
- Decision-making
- Problem solving
- Constructing support and experimental inquiry

VARIATIONS OF ESSAY ITEMS


A. Restricted-response
B. Extended-response

Restricted-response
- Suggests specifications: limited coverage of content, specified length of response,
expected form of response, and a definite perspective/mindset to be used

Extended-response
- Does not suggest any form of restriction
- Students are free to organize and expound on their ideas
SUGGESTIONS FOR CONSTRUCTING ESSAY QUESTIONS
- by Miller, Linn & Gronlund (2009, p. 243)
1. Restrict the use of essay questions to those learning outcomes that cannot be
measured satisfactorily by objective items.
2. Construct questions that will call forth the skills specified in the learning
standards.
3. Phrase the question so that the student's task is clearly defined.
4. Indicate an approximate time limit for each question.
5. Avoid the use of optional questions.

RUBRICS/ANALYTIC SCORING
- Helps the teacher to score; it can significantly reduce subjectivity and more or
less objectify the scoring of a non-objective type of item
Example:
Each criterion is rated on a six-point scale: Exemplary (6 pts), Very Satisfactory
(5 pts), Satisfactory (4 pts), Moderately Satisfactory (3 pts), Fairly Satisfactory
(2 pts), Needs Improvement (1 pt).

Criterion            Rating
Organization         ____
Clarity of Message   ____
Creativity           ____
Total                ____
Overall Rating       ____
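
As a rough illustration of analytic scoring, the Python sketch below sums criterion
ratings on the six-point scale above into a total and an overall rating; the student's
ratings are hypothetical, not from the text.

# Analytic scoring sketch: sum hypothetical ratings per criterion.
# Scale follows the rubric above: 6 = Exemplary ... 1 = Needs Improvement.
scale = {6: "Exemplary", 5: "Very Satisfactory", 4: "Satisfactory",
         3: "Moderately Satisfactory", 2: "Fairly Satisfactory",
         1: "Needs Improvement"}

ratings = {"Organization": 5, "Clarity of Message": 4, "Creativity": 6}  # hypothetical

total = sum(ratings.values())          # 15 out of a possible 18
overall = round(total / len(ratings))  # mean rating, rounded: 5
print("Total:", total, "| Overall:", scale[overall])  # Total: 15 | Overall: Very Satisfactory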

Recommended type of Rubric:


TASK: Design a plan for an experiment showing the effect of amount of water on
plant growth.
Scoring Criterion: Completeness of Plan
Rubric:
LABEL DESCRIPTION
Outstanding All parts of the plan especially the procedure are
concisely and very satisfactorily described.
Very Good All parts are given and satisfactorily described.
Good All parts are given but with minimal description.
Fair All parts are given but without description.
Needs Improvement Parts are incomplete and without description.

SUGGESTIONS TO IMPROVE THE RELIABILITY OF SCORING RESPONSES TO
ESSAY QUESTIONS
- by Miller, Linn & Gronlund (2009, p. 243)
1. Prepare an outline of the expected answer in advance
2. Use the scoring rubric that is most appropriate
3. Decide how to handle factors that are irrelevant to the learning outcomes being
measured
4. Evaluate all responses to one question before going on to the next one
5. When possible, evaluate the answers without looking at the student’s name
6. If especially important decisions are to be based on the results, obtain two or
more independent ratings

E. Constructing Selected- Response Types

While supply formats require learners to construct their responses to questions or
directives, selected-response types entail choosing the best or most correct option
to answer a problem.
a. Alternate form or binary choice provides only two options,
b. Multiple-choice type offers 3 to 5 options or solutions to a problem, and
c. Matching type gives a set of problems or premises and a set of options which
are to be appropriately paired.

1. Binary Choice or Alternate Form


(Table)
Except for the Yes-No type, which uses direct questions, all other varieties of
binary-choice or alternate-choice items have propositions as the item stimulus.

2. Multiple-Choice Items
Another selected-response item format is the multiple-choice item. The wide use of
this format in classroom testing is mainly due to its versatility in assessing various
levels of understanding, from knowledge and simple understanding to deep
understanding.
(Table)

Writing good multiple-choice items requires clarity in stating the problem in the stem
and plausibility or attractiveness of the distracters.
Stem
1. All the words of the stem should be relevant to the task.
2. The stem should be meaningful by itself and fully contain the problem.
3. The stem should always pose a question with only one correct or clearly best
answer.

Distracters
1. All distracters should appear plausible to uninformed test takers.
2. Randomly assign the correct answer among the alternative positions.
3. Avoid using "All of the above" or "None of the above" as distracters.

3. Matching Types
Of the three general selected-response item formats, the matching format appears
different. It consists of two parallel lists of words or phrases the students are tasked
to pair.
(Table)
Guidelines in constructing matching items
1. Keep the list of premises and the list of options homogeneous, i.e., belonging to
one category.
2. Always keep the premises in the first column and the options in the second
column.
3. Keep the lists in the two columns unequal in number.
4. Always describe the basis for matching in the test directions.
5. Keep the number of premises to not more than eight (8) items.
6. Avoid ambiguous lists.

CHAPTER 9: IMPROVING A CLASSROOM-BASED ASSESSMENT TEST

- This chapter deals with providing practical and necessary ways of improving
teacher-developed assessment tools. Popham (2011) suggests two approaches
to item improvement: the judgmental approach and the empirical approach.

- As with any form of written communication, it is wise to create drafts and
then review and edit them as necessary before making them final. The same
adage holds true for writing assessment items. Reviewing and editing items is
a judgment-based procedure, whether the items are your own or those of
others.

A.) Judgmental Item Improvement

- This approach basically makes use of human judgment in reviewing the items.
- There are three sources of test-improvement judgments: teachers, peers, and
students.

Teachers' Own Review

- It is always advisable for teachers to take a second look at the assessment tool
s/he has devised for a specific purpose. Presuming perfection right away after
its construction may lead to failure to detect shortcomings of the test or
assessment task.

- Popham (2011) gives five suggestions for teachers to follow in exercising
judgment.

1. Adherence to item-specific guidelines and general item-writing
commandments

- When reviewing your own items, be sure to be familiar with the general
item-writing guidelines. Use them to find and fix any violations of
item-writing principles.

- Teachers should use these guidelines to check how well the items have been
planned and written, particularly their alignment to the intended
instructional outcomes.

The General Item-Writing Guidelines

1. Keep wording simple and focused.
2. Eliminate clues to the correct answer.
3. Highlight critical or key words.
4. Review and double-check the scoring key.

2. Contribution to score-based inference

- The teacher examines whether the scores generated by the test can contribute
to making valid inferences about the learners. Can the scores reveal the
amount of learning achieved or show what has been mastered? Can the scores
support inferences about a student's capability to move on to the next
instructional level? Or do the scores obtained make no difference at all in
describing or differentiating various abilities?
3. Accuracy of content
- This review should especially be considered when tests have been in use for a
certain period of time. Changes that occur due to new discoveries or
developments can redefine the content of a summative test. If this happens,
the items or the answer key may have to be revisited.

4. Absence of content gaps

- This review criterion is especially useful in strengthening the score-based
inference capability of the test. If the current tool misses important content
now prescribed by a new curriculum standard, the score will likely not give
an accurate description of what is expected to be assessed. The teacher must
always ensure that the assessment tool matches what is currently required to
be learned. This is a way to check on the content validity of the test.

5. Fairness
- Assessment items should be free of bias. They should not favor one group nor
discriminate against another. Be attentive to any potential bias so that you can
eliminate it as much as you possibly can.

Peer review
- Some schools encourage peer or collegial review of assessment instruments.
Time is provided for this activity, and it has almost always yielded good
results for improving tests and performance-based assessment tasks.

The suggestions given by test experts can be used collegially as the basis for a
review checklist:
a. Do the items follow the specific and general guidelines in writing items especially
on:

• being aligned to instructional objectives?
• making the problem clear and unambiguous?
• providing plausible options?
• avoiding unintentional clues?
• having only one correct answer?

b. Are the items free from inaccurate content?
c. Are the items free from obsolete content?
d. Are the test instructions clearly written for the students to follow?
e. Is the level of difficulty of the test appropriate to the level of the learners?
f. Is the test fair to all kinds of students?

Student Review
- Engagement of students in reviewing the items has become a laudable
practice for improving classroom tests. The judgment is based on the
students’ experience in taking the test, their impressions, and reactions
during the test event.

Popham’s Item Improvement Questionnaire for Students


1. If any of the items seemed confusing, which ones were they?
2. Did any items have more than one correct answer? If so, which ones?
3. Did any items have no correct answers? If so, which ones?
4. Were there words in any items that confused you? If so, which ones?
5. Were the directions for the test, or for particular subsections, unclear? If so,
which ones?

B. Empirically Based Procedures

Item improvement using empirically based methods is aimed at improving the quality
of an item using students' responses to the test.

- Test developers refer to this technical process as item analysis, as it utilizes
data obtained separately for each item.

- An item is considered good when its quality indices, i.e., the difficulty index
and the discrimination index, meet certain criteria.

Difficulty Index
- Used in the context of criterion-referenced interpretation or testing for
mastery.

- Particularly for objective tests, the responses are binary in form,

- yielding nominal data like frequency, percentage, and proportion. Useful data
are then in the form of:
a. Total number of students answering the item (T)
b. Total number of students answering the item right (R)

An item's difficulty index is obtained by calculating the p-value (p), the proportion
of students answering the item correctly:

p = R / T

where p is the difficulty index,
R = total number of students answering the item right, and
T = total number of students answering the item.

For example:

Item 1: There were 45 students in the class who responded to item 1, and 30
answered it correctly.
p = 30/45 = 0.67
Item 1 has a p-value of 0.67: 67% got the item right while 33% missed it.

Item 2: In the same class, only 10 answered item 2 correctly.
p = 10/45 = 0.22
Item 2 has a p-value of 0.22: out of 45, only 10 (22%) got the item right while 35
(78%) missed it.

FOR A NORM-REFERENCED TEST: Between the two items, item 2 appears to be
the much more difficult item, since less than a fourth of the class was able to
respond correctly.

FOR A CRITERION-REFERENCED TEST: The class shows much better performance
on item 1 than on item 2. It is still a long way from mastering item 2.

- The p-value ranges from 0.0 to 1.0, from extremely difficult (no one got the item
correct) to extremely easy (everyone got it correct).
- For binary-choice items, there is a 50% probability of getting the item correct
simply by chance.
- For multiple-choice items with four alternatives, the chance of obtaining a correct
answer by guessing is only 25%.
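
The p-value computation lends itself to a one-line function; here is a minimal Python
sketch using the two items from the example above.

def difficulty_index(right, total):
    """p = R / T: proportion of students answering the item correctly."""
    return right / total

# Item 1: 45 students responded, 30 answered correctly.
print(round(difficulty_index(30, 45), 2))  # 0.67
# Item 2: only 10 of the 45 answered correctly.
print(round(difficulty_index(10, 45), 2))  # 0.22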
Discrimination Index
- The power of an item to discriminate between the informed and uninformed
groups, or between the more knowledgeable and less knowledgeable.

- This is an item statistic that can reveal useful information for improving an
item.

- It shows the relationship between a student's performance on an item (i.e.,
right or wrong) and his/her performance on the test as represented by the
total score.

The discrimination can take different directions:

a. Positively discriminating item – the proportion of the high-scoring group
getting the item correct is greater than that of the low-scoring group.
b. Negatively discriminating item – the proportion of the high-scoring group
is less than that of the low-scoring group.
c. Non-discriminating item – the proportion of the high-scoring group is
equal to that of the low-scoring group.

Formula: D = RU/TU - RL/TL

where D = item discrimination index,
RU = number in the upper group getting the item correct,
TU = number in the upper group,
RL = number in the lower group getting the item correct, and
TL = number in the lower group.

Another calculation brings the same result (Kubiszyn and Borich, 2010):

D = (RU - RL) / T

where RU = number in the upper group getting the item correct,
RL = number in the lower group getting the item correct, and
T = number of students in either group (the two groups being of equal size).

Steps to Obtain the Proportions

a. Score the test papers using the answer key to obtain the total scores of the
students. The maximum score is the total number of objective items.
b. Order the test papers from highest to lowest score.
c. Split the test papers into halves: a high group and a low group.
d. Obtain the p-value for the upper group and the p-value for the lower group:

p(upper) = RU/TU        p(lower) = RL/TL

e. Get the discrimination index by getting the difference between the p-values.
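
A minimal Python sketch of these steps, assuming the papers have already been scored
and split into groups; the group sizes and correct counts are hypothetical.

def discrimination_index(ru, tu, rl, tl):
    """D = RU/TU - RL/TL: difference between the groups' p-values."""
    return ru / tu - rl / tl

# Hypothetical item: 18 of 20 upper-group students got it right,
# against 8 of 20 in the lower group.
d = discrimination_index(18, 20, 8, 20)
print(round(d, 2))  # 0.5 -> a very good item by the guidelines below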

Guidelines for Evaluating the Discriminating Efficiency of Items

DISCRIMINATION INDEX    ITEM EVALUATION
.40 and above           Very good items
.30 – .39               Reasonably good items, but possibly subject to improvement
.20 – .29               Marginal items, usually needing improvement
.19 and below           Poor items, to be rejected or improved by revision

Example given: Obtain the index of discrimination of an item if the upper 25% of the
class had a difficulty index of 0.60 while the lower 25% of the class had a difficulty
index of 0.20.

p(upper) = 0.60 and p(lower) = 0.20, thus the index of discrimination
D = 0.60 - 0.20 = 0.40.

Distracter Analysis
- Another approach used to discover opportunities for item improvement utilizes
an analysis of the distribution of responses across the distracters, especially
when the difficulty index and discrimination index suggest that the item is a
candidate for revision.

Example given: (Table)
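
As a hypothetical illustration (the responses below are invented, with B as the keyed
answer), a distracter analysis can be a simple tally of how the upper and lower groups
spread their responses across the options.

from collections import Counter

# Hypothetical responses of the upper and lower groups to one item (key: "B").
upper_group = ["B", "B", "A", "B", "B", "C", "B", "B", "B", "D"]
lower_group = ["A", "C", "B", "D", "A", "C", "B", "A", "D", "C"]

upper_tally = Counter(upper_group)
lower_tally = Counter(lower_group)
for option in "ABCD":
    print(option, "| upper:", upper_tally[option], "| lower:", lower_tally[option])
# A distracter drawing more of the upper group than the lower group,
# or drawing no one at all, marks the item as a candidate for revision.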
Sensitivity to Instruction Index

- Indicates how sensitive an item has been to instruction.

- Signifies a change in students' performance as a result of instruction.

- This information is useful for criterion-referenced tests, which aim at
determining whether mastery learning has been attained after a designated or
prescribed instructional period.

Si indicates whether the p-value obtained for the item in the post-test is greater
than in the pre-test.

Example given:
Consider an item where, in a class of 40, 80% answered correctly in the post-test
while only 10% did in the pre-test.

Its p-value for the post-test is .80 while for the pre-test it is .10; thus Si = .70,
following this calculation:
Sensitivity to Instruction (Si) = p(post) - p(pre)
= .80 - .10
= .70
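
A minimal Python sketch of the Si computation, plugging in the pre- and post-test
p-values from the example above.

def sensitivity_to_instruction(p_post, p_pre):
    """Si = p(post) - p(pre): gain in the item's p-value after instruction."""
    return p_post - p_pre

# From the example: 80% correct in the post-test, 10% in the pre-test.
print(round(sensitivity_to_instruction(0.80, 0.10), 2))  # 0.7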
