
ASSESSMENT

IN
LEARNING 1
(Ed 312)

LEARNER’S MODULE
MARIAN COLLEGE
1st SEMESTER
SY 2021-2022

Letter to the Student

To our dear students:


Peace and all good!

Our world is experiencing an unprecedented health and economic crisis brought about by the COVID-19 pandemic. This
disruption has distressed the workforce across socioeconomic strata, transforming the nature of work
and the way we communicate with one another. Schools have had to make adjustments in the teaching and
learning process. Flexible Learning Modality is a proposed mechanism to continue the delivery of educational
services during this period.

The Commission on Higher Education suggested three Flexible Learning Modalities, namely: online, offline, and blended.
Taking into account the availability of devices, internet connectivity, and level of digital literacy of our students,
we decided to use blended learning as our flexible mode of delivering instruction and other services. This
module is designed to cater to the needs of our students who do not have access to digital technology. Since the mode is
blended, other students have the option to avail of the online component of blended learning.

You are expected to read the contents of this module, study the examples, practice answering the “Check Your
Progress” portion, and answer the exercises at the end of every module. I expect you to complete one
module per week. Submit your output every FRIDAY in the designated pigeonhole boxes located at the
entrance of the High School gate.

For any queries regarding the use of this module, or if you encounter difficulty understanding a topic, please
do not hesitate to contact the undersigned at mobile phone number 09305171981. You can also reach me through my
Messenger account Guada Edulan or send an email to guadalupeedulan@mariancollege.edu.ph.

I will ask for your contact details during our course orientation so that I can personally monitor your progress in
this course. In case the CHED, LGU, and IATF allow us to conduct in-campus/face-to-face teaching and
learning, we will inform you immediately through a text message or another medium of communication. May
Almighty God and Mother Mary, our patroness, bless us always.

Guadalupe G. Edulan
Instructor

Chapter 1
Introduction to Assessment
In Learning

Lesson 1: Basic Concepts and Principles in Assessing Learning

To successfully describe the nature of assessment in learning, develop a concept map of its basic concepts, and document
the experiences of teachers who apply its principles, you need to read the following information about the basic concepts,
measurement frameworks, and principles in assessing learning. You are expected to read this information before the discussion,
analysis, and evaluation when you meet the teacher face-to-face or in your virtual classroom. If the information provided in this
worktext is not enough, you can search for more information on the internet.

In this lesson, you are expected to:


 Describe assessment in learning and related concepts; and
 Demonstrate understanding of the different principles in assessing learning through the preparation of an assessment plan.

WHAT IS ASSESSMENT IN LEARNING?


The word assessment is rooted in the Latin word assidere, which means “to sit beside another.” Assessment is generally
defined as the process of gathering quantitative and/or qualitative data for the purpose of making decisions. Assessment in learning
is as vital to the educational process as curriculum and instruction. Schools and teachers will not be able to determine the
impact of curriculum and instruction on students or learners without assessing learning. Therefore, it is important that educators
have knowledge and competence in assessing learners.
Assessment in Learning can be defined as the systematic and purpose-oriented collection, analysis, and interpretation of
evidence of student learning in order to make informed decisions relevant to the learners. In essence, the aim of assessment is to
use evidence on student learning to further promote and manage learning. Assessment in learning can be characterized as (a) a
process, (b) based on specific objectives, and (c) from multiple sources.

How is assessment in learning similar to or different from the concepts of measurement and evaluation of learning?
Measurement can be defined as the process of quantifying the attributes of an object, whereas evaluation may refer to the process
of making value judgments on the information collected from measurement based on specified criteria. In the context of assessment
in learning, measurement refers to the actual collection of information on student learning through the use of various strategies and
tools, while evaluation refers to the actual process of making a decision or judgment on student learning based on the information
collected from measurement. Therefore, assessment can be considered as an umbrella term consisting of measurement and
evaluation. However, there are some authors who consider assessment as distinct and separate from evaluation and measurement
(e.g., Huba and Freed 2000, Popham 1998).

Assessment and Testing


The most common form of assessment is testing. In the educational context, testing refers to the use of a test or battery of
tests to collect information on student learning over a specific period of time. A test is a form of assessment, but not all assessments
use tests or testing. A test can be categorized as either selected response (e.g., matching type of test) or constructed response (e.g.,
essay test, short-answer test). A test can make use of an objective format (e.g., multiple choice, enumeration) or a subjective format
(e.g., essay). The objective format provides for more bias-free scoring, as the test items have exact correct answers. On the other hand,
the subjective format allows for a less objective means of scoring, especially if no rubric is used. A table of specifications (TOS), a
table that maps out the essential aspects of a test (e.g., test objectives, contents, topics covered by the test, item distribution), is
used in the design and development of a test. Descriptive statistics are typically used to describe and interpret the results of tests. A
test is said to be good and effective if it has acceptable psychometric properties. This means that a test should be valid and reliable,
have an acceptable level of difficulty, and be able to discriminate between learners with higher and lower ability. Teachers are
expected to be competent in the design and development of classroom tests.
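
To illustrate, here is a hypothetical miniature TOS for a 10-item science quiz (an invented example, not taken from an actual test):

Content/Topic       Objective (Cognitive Level)   No. of Items   Item Placement
Parts of a plant    Knowledge                     4              Items 1-4
Photosynthesis      Comprehension                 3              Items 5-7
Plant care          Application                   3              Items 8-10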

Assessment and Grading


A related concept to assessment in learning is grading, which can be defined as the process of assigning value to the
performance or achievement of a learner based on specified criteria or standards. Aside from tests, other classroom tasks can serve
as bases for grading learners. These may include a learner’s performance in recitation, seatwork, homework, and projects. The final
grade of a learner in a subject or course is the summation of information from multiple sources (i.e., several assessment tasks or
requirements). Grading is a form of evaluation that provides information on whether a learner passed or failed a subject or a
particular assessment task. Teachers are expected to be competent in providing performance feedback and communicating the
results of assessment tasks or activities to relevant stakeholders.
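
To make the idea of a final grade as a summation of multiple sources concrete, here is a minimal sketch in Python. The components and weights are hypothetical assumptions, since grading systems vary across schools:

```python
# Hypothetical weighted summation of multiple assessment sources.
def final_grade(scores, weights):
    """Weighted average of component scores (each on a 0-100 scale)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(scores[c] * w for c, w in weights.items())

# Invented component scores and weights for one learner:
scores = {"recitation": 85, "seatwork": 90, "homework": 88,
          "project": 92, "tests": 80}
weights = {"recitation": 0.10, "seatwork": 0.15, "homework": 0.15,
           "project": 0.20, "tests": 0.40}
print(round(final_grade(scores, weights), 1))  # 85.6
```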

What are the different measurement frameworks used in assessment?


The two most common psychometric theories that serve as frameworks for assessment and measurement, especially in the
determination of the psychometric characteristics of a measure (e.g., tests, scales), are the classical test theory (CTT) and the item
response theory (IRT).

The CTT, also known as the true score theory, explains that variations in the performance of examinees on a given measure
are due to variations in their abilities. The CTT assumes that an examinee’s observed score on a given measure is the sum of the
examinee’s true score and some degree of error in the measurement caused by internal and external conditions. Hence, the
CTT also assumes that all measures are imperfect, and the scores obtained from a measure could differ from the true score (i.e., true
ability) of an examinee.
The CTT provides an estimation of item difficulty based on the frequency or number of examinees who correctly answer
a particular item; items answered correctly by fewer examinees are considered more difficult. The CTT also provides an
estimation of item discrimination based on whether examinees with higher or lower ability can answer a particular item correctly. If an
item is able to distinguish between examinees with higher ability (i.e., higher total test score) and lower ability (i.e., lower total test
score), then the item is considered to have good discrimination. Test reliability can also be estimated using approaches from the CTT
(e.g., Kuder-Richardson 20, Cronbach’s alpha). Item analysis based on the CTT has been the dominant approach because of the
simplicity of calculating the statistics (e.g., item difficulty index, item discrimination index, item-total correlation).
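
To make these CTT ideas concrete, here is a minimal sketch (not from this module) of item analysis in Python. It reflects the assumptions above: the observed score is the true score plus error, the difficulty index p is the proportion of examinees answering an item correctly, and the discrimination index D compares groups of examinees with higher and lower total scores. The response data and the 27% grouping convention are illustrative assumptions:

```python
# CTT-style item analysis on a 0/1 score matrix:
# rows = examinees, columns = items.

def item_difficulty(scores):
    """Difficulty index p: proportion of correct answers per item
    (a higher p means an easier item)."""
    n = len(scores)
    return [sum(row[i] for row in scores) / n for i in range(len(scores[0]))]

def item_discrimination(scores, fraction=0.27):
    """Discrimination index D: difficulty in the upper group minus
    difficulty in the lower group, groups formed by total score
    (27% groups are a common convention)."""
    ranked = sorted(scores, key=sum, reverse=True)
    k = max(1, int(len(scores) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    return [pu - pl for pu, pl in
            zip(item_difficulty(upper), item_difficulty(lower))]

# Hypothetical responses of six examinees to four items:
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(item_difficulty(scores))      # item 1: p = 4/6, answered correctly by most
print(item_discrimination(scores))  # positive D = the item discriminates well
```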
The IRT, on the other hand, analyzes test items by estimating the probability that an examinee answers an item correctly or
incorrectly. One of the central differences of the IRT from the CTT is that in the IRT, it is assumed that the characteristics of an item can be
estimated independently of the characteristics or ability of the examinee, and vice versa. Aside from item difficulty and item
discrimination indices, IRT analysis can provide significantly more information on items and tests, such as fit statistics, the item
characteristic curve (ICC), and the test characteristic curve (TCC). There are also different IRT models (e.g., one-parameter model, three-
parameter model) that can provide item and test information that cannot be estimated using the CTT. In recent years,
there has been an increase in the use of IRT analysis as a measurement framework, despite the complexity of the analysis involved,
owing to the availability of IRT software.
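
As an illustration of how an IRT model estimates this probability, here is a minimal sketch (an assumed example, not from this module) of the one-parameter logistic (Rasch) model, in which the probability of a correct answer depends only on the examinee’s ability (theta) and the item’s difficulty (b):

```python
# One-parameter logistic (Rasch) IRT model.
import math

def rasch_probability(theta, b):
    """P(correct) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Tracing the probability over a range of abilities gives the item
# characteristic curve (ICC) for a hypothetical item of difficulty b = 0.5:
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(rasch_probability(theta, 0.5), 2))
```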
What are the different types of assessment in learning?
Assessment in learning can be of different types. The common types are formative, summative, diagnostic, and
placement. Other experts describe the types of assessment as traditional and authentic.

Formative Assessment refers to assessment activities that provide information to both teachers and learners on how they
can improve the teaching-learning process. This type of assessment is formative because it is used at the beginning of and during
instruction for teachers to assess learners’ understanding. The information collected on student learning allows teachers to
make adjustments to their instructional process and strategies to facilitate learning. Through performance reports and teacher
feedback, formative assessment can also inform learners about their strengths and weaknesses, enabling them to take steps to
learn better and improve their performance as the class progresses.

Summative Assessment refers to assessment activities that aim to determine learners’ mastery of content or attainment of
learning outcomes. They are summative, as they are supposed to provide information on the quantity or quality of what students
have learned or achieved at the end of instruction. While data from summative assessments are typically used for evaluating learners’
performance in class, these data also provide teachers with information about the effectiveness of their teaching strategies and how
they can improve their instruction in the future. Through performance reports and teacher feedback, summative assessment can
also inform learners about what they have done well and what they need to improve on in their future classes or subjects.

Diagnostic Assessment aims to detect the learning problems or difficulties of learners so that corrective measures or
interventions can be done to ensure learning. Diagnostic assessment is usually done right after seeing signs of learning problems in the
course of teaching. It can also be done at the beginning of the school year for a spirally-designed curriculum so that corrective actions
can be applied if the prerequisite knowledge and skills for the targets of instruction have not yet been mastered.

Placement Assessment is usually done at the beginning of the school year to determine what learners already know or
what their needs are, which can inform the design of instruction. Grouping of learners based on the results of placement assessment is
usually done before instruction to make it relevant and to address the needs or accommodate the entry performance of the learners.
The entrance examination given in schools is an example of a placement assessment.

Traditional Assessment refers to the use of conventional strategies or tools to provide information about the learning of
students. Typically, objective (e.g., multiple choice) and subjective (e.g., essay) paper-and-pencil tests are used. Traditional
assessments are often used as the basis for evaluating and grading learners. They are commonly used in classrooms because they are
easier to design and quicker to score. In general, traditional assessments are viewed as an inauthentic type of assessment.

Authentic Assessment refers to the use of assessment strategies or tools that allow learners to perform or create a
product that is meaningful to them, as it is based on real-world contexts. The authenticity of assessment tasks is
best described in terms of degree rather than the presence or absence of authenticity. Hence, an assessment can be more authentic
or less authentic compared with other assessments. The most authentic assessments are those that allow performances that most
closely resemble real-world tasks or applications in real-world settings or environments.

What are the different principles in assessing learning?


There are many principles in assessing learning. Based on the different readings and references on these principles,
the following may be considered core principles:

1. Assessment should have a clear purpose. Assessment starts with a clear purpose. The methods used in collecting
information should be based on this purpose. The interpretation of the data collected should be aligned with the purpose
that has been set. This assessment principle is congruent with the outcome-based education (OBE) principles of clarity of
focus and design down.

2. Assessment is not an end in itself. Assessment serves as a means to enhance student learning. It is not a simple recording
or documentation of what learners know and do not know. Collecting information about student learning, whether
formative or summative, should lead to decisions that will allow improvement of the learners.

3. Assessment is an ongoing, continuous, and formative process. Assessment consists of a series of tasks and activities
conducted over time. It is not a one-shot activity and should be cumulative. Continuous feedback is an important element
of assessment. This assessment principle is congruent with the OBE principle of expanded opportunity.
4. Assessment is learner-centered. Assessment is not about what the teacher does but what the learner can do. Assessment
of learners provides teachers with an understanding of how they can improve their teaching, which corresponds to the goal
of improving student learning.

5. Assessment is both process- and product-oriented. Assessment gives equal importance to learner performance or product
and the process they engage in to perform or produce a product.

6. Assessment must be comprehensive and holistic. Assessment should be performed using a variety of strategies and tools
designed to assess student learning in a holistic way. Assessment should be conducted in multiple periods to assess learning
over time. This assessment principle is also congruent with the OBE principle of expanded opportunity.

7. Assessment requires the use of appropriate measures. For assessment to be valid, the assessment tools or measures used
must have sound psychometric properties, including, but not limited to, validity and reliability. Appropriate measures also
mean that learners must be provided with challenging but age- and context-appropriate assessment tasks. This assessment
principle is consistent with the OBE principle of high expectations.

8. Assessment should be as authentic as possible. Assessment tasks or activities should closely, if not fully, approximate real-
life situations or experiences. Authenticity of assessment can be thought of as a continuum from least authentic to most
authentic, with more authentic tasks expected to be more meaningful for learners.


Week 1-A
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers/papers is strictly prohibited.

 DEVELOP

To determine whether you have acquired the needed information about the basic concepts and principles in assessment,
use the space provided to draw a metaphor (i.e., any object, thing, or action you could liken assessment to) that will visually
illustrate what assessment in learning is. Everyone will share and discuss the metaphors they have drawn in class.

EXAMPLE: A thermometer can be drawn as a metaphor for assessment if you consider measurement or collection of information
from a person (i.e., student) as central in the assessment process. A thermometer is a device that collects information about a
person’s body temperature, which provides information on whether a person’s body temperature is normal or not (i.e., high
temperature could be a symptom of fever). The information is then used by medical personnel to make decisions relative to the
collected information. This is similar to the process of assessment.

Application Week 1-B
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________

Instructions: a.) Using other bond papers/papers is strictly prohibited.

 APPLY
Based on the lessons on the basic concepts and principles in assessment in learning, select five core principles in assessing
learning and explain them in relation to your experience with a previous or current teacher in one of your courses/subjects.
EXAMPLE:
Principle: Assessment should be as authentic as possible.
Illustration of Practice: In our practicum course, we were asked to prepare a lesson plan and then execute the plan in front of the students, with my critic teacher around to evaluate my performance. The actual planning of the lesson and its execution in front of the class and the critic teacher is a very authentic way of assessing my ability to design and deliver instruction, rather than being assessed through a demonstration in front of my classmates in the classroom.
Given the example, continue identifying illustrations of assessment practices guided by the principles discussed in class.
Share your insights on how your teacher’s assessment practices allowed you to improve your learning.
Principle Illustration of Practice
1.

2.

3.

4.

5.

Quiz #1
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________

Instructions: a.) Using other bond papers/papers is strictly prohibited.


b.) For each item, circle the option that corresponds to what you think is the best answer.

1. Which of the following is TRUE about measurement and evaluation?


A. Measurement and evaluation involve collection of information.
B. Measurement and evaluation are part of the assessment process.
C. Measurement and evaluation require the use of tests.
D. Measurement and evaluation are similar processes.

2. Which of the following assessment tasks is the LEAST AUTHENTIC?


A. Essay test
B. Field demonstration
C. Multiple-choice test
D. Research project

3. Assessment is not about what the teacher does but what the learner can do.
This statement is most reflective of which principle of assessment?
A. Assessment should be as authentic as possible.
B. Assessment should have a clear purpose.
C. Assessment is not an end in itself.
D. Assessment is learner-centered.

4. Which of the following statements about assessment is NOT TRUE?


A. Assessment is systematic and purpose-oriented.
B. The word assessment is rooted in the Latin word assidere.
C. A test is a form of assessment, but not all assessments use tests or testing.
D. Assessment is the process of assigning a numerical score to the performance
of a student.

5. Assessment should have a clear purpose. If you are already a classroom teacher,
how would you best demonstrate or practice this assessment principle?
A. Discuss with the class the grading system and your expectations of your students’ performance.
B. When giving tests, the purpose of each test is provided on the first page of the test paper.
C. Explain during the first day of classes your assessment techniques and your reasons for their use.
D. When deciding on an assessment task, its match and consistency with instructional objectives and
learning targets are ascertained.

Lesson 2: Assessment Purposes, Learning Targets, and Appropriate Methods

To be able to successfully prepare an assessment plan based on learning targets, you need to read the following
information about the purposes of assessing learning in the classroom, the basic qualities of effective classroom assessment,
learning targets, and the use of appropriate assessment methods. You are expected to read this before the discussion, analysis, and
evaluation when you meet the teacher face-to-face in your classroom.

In this lesson, you are expected to:
 explain the purpose of classroom assessment and
 formulate learning targets that match appropriate assessment methods.

What is the purpose of classroom assessment?


Assessment works best when its purpose is clear. Without a purpose, it is difficult to design or plan assessment effectively
and efficiently. In classrooms, teachers are expected to know the instructional goals and learning outcomes, which will inform how
they will design and implement their assessment. In general, the purpose of classroom assessment may be classified in terms of the
following:
1. Assessment of Learning. This refers to the use of assessment to determine learners’ acquired knowledge and skills from
instruction and whether they were able to achieve the curriculum outcomes. It is generally summative in nature.
2. Assessment for Learning. This refers to the use of assessment to identify the needs of learners in order to modify
instruction or learning activities in the classroom. It is formative in nature and it is meant to identify gaps in the learning
experiences of learners so that they can be assisted in achieving the curriculum outcomes.
3. Assessment as Learning. This refers to the use of assessment to help learners become self-regulated. It is formative in
nature and meant to use assessment tasks, results, and feedback to help learners practice self-regulation and make
adjustments to achieve the curriculum outcomes.

As discussed in the previous lesson, assessment serves as the mechanism by which teachers are able to determine whether
instruction worked in facilitating the learning of students. Hence, it is very important that assessment is aligned with instruction and
the identified learning outcomes for learners. Knowing what will be taught (curriculum content, competency, and performance
standards) and how it will be taught (instruction) is as important as knowing what we want from the very start (curriculum
outcome) in determining the specific purpose and strategy for assessment. The alignment is easier if teachers have a clear purpose for
the assessment they are performing. Typically, teachers use classroom assessment for assessment of learning more than for
assessment for learning and assessment as learning. Ideally, however, all three purposes of classroom assessment should be used.
While it is difficult to perform an assessment with all three purposes in mind, teachers must be able to understand the three
purposes of assessment, including knowing when and how to use them.

The Roles of Classroom Assessment in the Teaching-Learning Process


Assessment is an integral part of the instructional process in which teachers design and conduct instruction (teaching) so that
learners achieve the specific target learning outcomes defined by the curriculum. While the purpose of assessment may be classified
as assessment of learning, assessment for learning, and assessment as learning, the specific purpose of an assessment depends on
the teacher’s objective in collecting and evaluating assessment data from learners. More specific objectives for assessing student
learning are congruent with the following roles of classroom assessment in the teaching-learning process: formative, diagnostic,
evaluative, facilitative, and motivational, each of which is discussed below.

Formative. Teachers conduct assessment because they want to acquire information on the current status and level of
learners’ knowledge and skills or competencies. Teachers may need information (e.g., prior knowledge, strengths) about
the learners prior to instruction so they can design their instructional plan to better suit the needs of the learners. Teachers
may also need information on learners during instruction to allow them to modify instruction or learning activities to help
learners achieve the learning outcomes. How teachers should facilitate students’ learning may be informed by the
assessment results.
Diagnostic. Teachers can use assessment to identify specific learners’ weaknesses or difficulties that may affect their
achievement of the intended learning outcomes. Identifying these weaknesses allows teachers to focus on specific learning
needs and provide opportunities for instructional intervention or remediation inside or outside the classroom. The
diagnostic role of assessment may also lead to differentiated instruction or even individualized learning plans when deemed
necessary.
Evaluative. Teachers conduct assessment to measure learners’ performance or achievement for the purpose of making
judgments or, in particular, grading. Teachers need information on whether the learners have met the intended learning
outcomes after the instruction is fully implemented. The learners’ placement or promotion to the next educational level is
informed by the assessment results.
Facilitative. Classroom assessment may affect student learning. On the part of teachers, assessment for learning provides
information on students’ learning and achievement that teachers can use to improve instruction and the learning
experiences of learners. On the part of learners, assessment as learning allows them to monitor, evaluate, and improve
their own learning strategies. In both cases, student learning is facilitated.
Motivational. Classroom assessment can serve as a mechanism for learners to be motivated and engaged in learning and
achievement in the classroom. Grades, for instance, can motivate and demotivate learners. Focusing on progress, providing
effective feedback, innovating assessment tasks, and using scaffolding during activities provide opportunities for
assessment to be motivating rather than demotivating.
What are learning targets?

Educational Goals, Standards, and Objectives


Before discussing what learning targets are, it is important to first define educational goals, standards, and objectives.
Goals. Goals are general statements about desired learner outcomes in a given year or over the duration of a program
(e.g., senior high school).
Standards. Standards are specific statements about what learners should know and be capable of doing at a particular
grade level, subject, or course. McMillan (2014, p. 31) described four different types of educational standards: (1) content
(desired outcomes in a content area), (2) performance (what students do to demonstrate competence), (3) developmental
(sequence of growth and change over time), and (4) grade-level (outcomes for a specific grade).
Educational Objectives. Educational Objectives are specific statements of learner performance at the end of an instructional
unit. These are sometimes referred to as behavioral objectives and are typically stated with the use of verbs. The most
popular taxonomy of educational objectives is Bloom’s Taxonomy of Educational Objectives.

Bloom’s Taxonomy of Educational Objectives


Bloom’s Taxonomy consists of three domains: cognitive, affective, and psychomotor. These three domains correspond to
the three types of goals that teachers want to assess: knowledge-based goals (cognitive), skill-based goals (psychomotor), and
affective goals (affective). Hence, there are three taxonomies that teachers can use depending on their goals. Each taxonomy
consists of different levels of expertise with varying degrees of complexity. The most popular among the three taxonomies is
Bloom’s Taxonomy of Educational Objectives in the Cognitive Domain, also known as Bloom’s Taxonomy of Educational Objectives
for Knowledge-Based Goals. The taxonomy describes six levels of expertise: knowledge, comprehension, application, analysis,
synthesis, and evaluation. Table 2.1 presents the description, illustrative verbs, and a sample objective for each of the six levels.

Table 2.1 Bloom’s Taxonomy of Educational Objectives in the Cognitive Domain

Knowledge. Description: recall or recognition of learned materials like concepts, events, facts, ideas, names, and procedures. Illustrative verbs: defines, recalls, enumerates, labels. Sample objective: Enumerate the six levels of expertise in Bloom’s taxonomy of objectives in the cognitive domain.

Comprehension. Description: understanding the meaning of a learned material, including interpretation, explanation, and literal translation. Illustrative verbs: explains, describes, summarizes, discusses, translates. Sample objective: Explain each of the six levels of expertise in Bloom’s taxonomy of objectives in the cognitive domain.

Application. Description: use of abstract ideas, principles, or methods in specific concrete situations. Illustrative verbs: applies, demonstrates, produces, illustrates, uses. Sample objective: Demonstrate how to use Bloom’s taxonomy in formulating learning objectives.

Analysis. Description: separation of a concept or idea into constituent parts or elements and an understanding of the nature of and association among the elements. Illustrative verbs: compares, contrasts, categorizes, classifies, calculates. Sample objective: Compare and contrast the six levels of expertise in Bloom’s taxonomy of objectives in the cognitive domain.

Synthesis. Description: construction of elements or parts from different sources to form a more complex or novel structure. Illustrative verbs: composes, constructs, creates, designs, integrates. Sample objective: Compose learning targets using Bloom’s taxonomy.

Evaluation. Description: making judgments of ideas or methods based on sound and established criteria. Illustrative verbs: appraises, evaluates, judges, concludes, criticizes. Sample objective: Evaluate the congruence between learning targets and assessment methods.

Bloom’s taxonomies of educational objectives provide teachers with a structured guide in formulating more specific learning targets,
as they provide an exhaustive list of learning objectives. The taxonomies serve not only as a guide for teachers’ instruction but also
as a guide for their assessment of student learning in the classroom. Thus, it is imperative that teachers identify the levels of
expertise that they expect the learners to achieve and demonstrate. This will then inform the assessment method required to
properly assess student learning. It is assumed that a higher level of expertise in a given domain requires more sophisticated
assessment methods or strategies.

The Revised Bloom’s Taxonomy of Educational Objectives


Anderson and Krathwohl proposed a revision of Bloom’s Taxonomy in the cognitive domain by introducing a two-dimensional
model for writing learning objectives (Anderson and Krathwohl 2001). The first dimension, the knowledge dimension,
includes four types: factual, conceptual, procedural, and metacognitive.
The second dimension, the cognitive process dimension, consists of six types: remember, understand, apply, analyze, evaluate,
and create. An educational or learning objective formulated from this two-dimensional model contains a noun (type of knowledge)
and a verb (type of cognitive process). The Revised Bloom’s Taxonomy provides teachers with a more structured and more precise
approach to designing and assessing learning objectives.

Below is an example of a learning objective:


Students will be able to differentiate qualitative research and quantitative research.
In the example, differentiate is the verb that represents the type of cognitive process (in this case, analyze), while
qualitative research and quantitative research is the noun phrase that represents the type of knowledge (in this case, conceptual).
Tables 2.2 and 2.3 present the definitions, illustrative verbs, and sample objectives for the cognitive process dimensions and
knowledge dimensions of the Revised Bloom’s Taxonomy.

Table 2.2 Cognitive Process Dimensions in the Revised Bloom’s Taxonomy of Educational Objectives

Create. Description: combining parts to make a whole. Illustrative verbs: compose, produce, develop, formulate, devise, prepare, design, construct, propose, re-organize. Sample objective: Propose a program of action to help solve Metro Manila’s traffic congestion.

Evaluate. Description: judging the value of information or data. Illustrative verbs: assess, measure, estimate, evaluate, critique, judge. Sample objective: Critique the latest film that you have watched, using the critique guidelines and format discussed in class.

Analyze. Description: breaking down information into parts. Illustrative verbs: analyze, calculate, examine, test, compare, differentiate, organize, classify. Sample objective: Classify the following chemical elements based on some categories/areas.

Apply. Description: applying the facts, rules, concepts, and ideas in another context. Illustrative verbs: apply, employ, practice, relate, use, implement, carry out, solve. Sample objective: Solve the following problems using the different measures of central tendency.

Understand. Description: understanding what the information means. Illustrative verbs: describe, determine, interpret, translate, paraphrase, explain. Sample objective: Explain the causes of malnutrition in the country.

Remember. Description: recognizing and recalling facts. Illustrative verbs: identify, list, name, underline, recall, retrieve, locate. Sample objective: Name the 7th president of the Philippines.

Table 2.3 Knowledge Dimensions in the Revised Bloom’s Taxonomy of Educational Objectives

Factual. This type of knowledge is basic in every discipline. It tells the facts or bits of information one needs to know in a discipline. This type of knowledge usually answers questions that begin with “who,” “where,” and “when.” Sample question: What is the capital city of the Philippines?

Conceptual. This type of knowledge is also fundamental in every discipline. It tells the concepts, generalizations, principles, theories, and models that one needs to know in a discipline. This type of knowledge usually answers questions that begin with “what.” Sample question: What makes the Philippines the “Pearl of the Orient Sea”?

Procedural. This type of knowledge is also fundamental in every discipline. It tells the processes, steps, techniques, methodologies, or specific skills needed in performing a specific task that one needs to know and be able to do in a discipline. This type of knowledge usually answers questions that begin with “how.” Sample question: How do we develop items for an achievement test?

Metacognitive. This type of knowledge makes the discipline relevant to one’s life. It makes one understand the value of learning in one’s life. It requires reflective knowledge and strategies on how to solve problems or perform a cognitive task through an understanding of oneself and context. This type of knowledge usually answers questions that begin with “why.” Questions that begin with “how” and “what” could also be used if they are embedded in a situation that one experiences in real life. Sample question: Why is Engineering the most suitable course for you?
Learning Targets
A learning target is “a statement of student performance for a relatively restricted type of learning outcome that will be
achieved in a single lesson or a few days” and contains “both a description of what students should know, understand, and be able
to do at the end of instruction and something about the criteria for judging the level of performance demonstrated” (McMillan
2014). In other words, learning targets are statements of what learners are supposed to learn and what they can do because of
instruction. Compared with educational goals, standards, and objectives, learning targets are the most specific and lead to more
specific instructional and assessment activities.

Learning targets should be congruent with the standards prescribed by the program or level and aligned with the instructional
or learning objectives of a subject or course. Teachers must inform learners about the learning targets of lessons prior to classroom
instruction. The learning targets should be meaningful for the learners; hence, they must be as clear and as specific as possible. It is
suggested that learning targets be stated from the learners’ point of view, typically using the phrase “I can ….” For example, “I can
differentiate between instructional objectives and learning targets.”

With clear articulation of learning targets, learners will know what they are expected to learn during the lesson or set of
lessons. Learning targets will also inform learners what they should be able to do or demonstrate as evidence of their learning. Both
classroom instruction and assessment should be aligned with the specified learning targets of a lesson.

McMillan (2014) proposed five criteria for selecting learning targets: (1) establish the right number of learning targets (Are
there too many or too few targets?); (2) establish comprehensive learning targets (Are all important types of learning included?);
(3) establish learning targets that reflect school goals and 21st century skills (Do the targets reflect school goals and 21st century
knowledge, skills, and dispositions?); (4) establish learning targets that are challenging yet feasible (Will the targets challenge
students to do their best work?); and (5) establish learning targets that are consistent with current principles of learning and
motivation (Are the targets consistent with research on learning and motivation?).

Types of Learning Targets


Many experts consider four primary types of learning targets: knowledge, reasoning, skill, and product.
Table 2.4 summarizes these types of learning targets.

Table 2.4 Description and Sample Learning Targets

Knowledge targets. Description: factual, conceptual, and procedural information that learners must learn in a subject or content area. Sample: I can explain the role of a conceptual framework in a research.

Reasoning targets. Description: knowledge-based thought processes that learners must learn, involving the application of knowledge in problem-solving, decision-making, and other tasks that require mental skills. Sample: I can justify my research problems with a theory.

Skills targets. Description: use of knowledge and/or reasoning to perform or demonstrate physical skills. Sample: I can facilitate a focus group discussion (FGD) with research participants.

Product targets. Description: use of knowledge, reasoning, and skills in creating a concrete or tangible product. Sample: I can write a thesis proposal.

Other experts consider a fifth type of learning target—affect, which refers to affective characteristics that students can develop and
demonstrate because of instruction. These include attitudes, beliefs, interests, and values. Some experts use disposition as an
alternative term for affect. The following is an example of an affect or disposition learning target:

I can appreciate the importance of addressing potential ethical issues in the conduct of thesis research.

Appropriate Method of Assessment


Once the learning targets are identified, appropriate assessment methods can be selected to measure student
learning. The match between a learning target and the assessment method used to measure whether students have met the target is very
critical. Tables 2.5.1 and 2.5.2 present a matrix of the different types of learning targets and sample assessment methods.
Table 2.5.1 Matching Learning Targets with Paper-and-Pencil Types of Assessment
(Multiple Choice, True or False, and Matching Type are selected-response formats; Short Answer, Problem-Solving, and Essay are constructed-response formats.)

Learning Targets   Multiple Choice   True or False   Matching Type   Short Answer   Problem-Solving   Essay
Knowledge          √√√               √√√             √√√             √√√            √√√               √√√
Reasoning          √√                √               √               √              √√√               √√√
Skills             √                 √               √               √              √√                √√
Products           √                 √               √               √              √                 √

Note: More checks mean better matches.

Table 2.5.2 Matching Learning Targets with Other Types of Assessment

Learning Targets   Project-based   Portfolio   Recitation   Observation
Knowledge          √               √√√         √√√          √√
Reasoning          √√              √√          √√√          √√
Skills             √√              √√√         √            √√
Product            √√√             √√√         √            √

Note: More checks mean better matches.

There are other types of assessment, and it is up to the teachers to select the method of assessment and design
appropriate assessment tasks and activities to measure the identified learning targets.

Week 2

Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers/papers is strictly prohibited.

 DEVELOP
To know if you have acquired the information you need to learn in this lesson, kindly complete Tables 2.6 and 2.7.

Table 2.6 General Purpose of Classroom Assessment

        Assessment of Learning   Assessment for Learning   Assessment as Learning

What?

Why?

When?

Table 2.7 Relation between Educational Goals, Standards, Objectives, and Learning Targets

                    Goals   Standards   Objectives   Learning Targets

Description

Sample Statements

Quiz #2
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________

Instructions: a.) Using other bond papers/papers is strictly prohibited.


b.) For each item, circle the option that corresponds to what you think is the best answer.

1. Which purpose of assessment aims to identify students’ needs in order to inform instruction?
A. Assessment as Learning
B. Assessment for Learning
C. Assessment of Learning
D. Assessment with Learning

2. Use the internet to search for related research literature.


The aforementioned learning objective is an example of which type of cognitive learning outcome in the
Revised Bloom’s Taxonomy?
A. Applying
B. Understanding
C. Knowledge
D. Creating

3. Explain the difference between learning targets and instructional objectives.


The aforementioned learning target is an example of which type of learning target?
A. Knowledge
B. Product
C. Reasoning
D. Skills

4. Which of the following types of paper-and-pencil test is best matched with the reasoning type of learning targets?
A. Essay
B. Matching-Type
C. Multiple-Choice
D. Short-Answer

5. If you are a values education teacher who intends to design an assessment task to determine your learners’
motivation in practicing pro-environmental behaviors, which of the following assessment strategies would
best address your purpose?
A. Learners developing and producing a video of their pro-environmental advocacy
B. Learners answering an essay question on “Why Pro-environmental Behavior Matters?”
C. Learners writing individual blogs on their pro-environmental activities and why they do it.
D. Learners conducting an action research on students’ motivation in pro-environmental behaviors.

Lesson 3: Different Classifications of Assessment

In order to plan, create, and select the appropriate kind of assessment, you need to know the characteristics of the
different types of assessment according to purpose, function, and the kind of information needed about learners. You are
expected to read this before you can create your own illustrative scenario.

In this lesson, you are expected to:


o Illustrate scenarios in the use of the different classifications of assessment;
o Rationalize the purpose of different forms of assessment; and
o Decide on the kind of assessment to be used.
What are the different classifications of assessment?
The different forms of assessment are classified according to purpose, form, function, kind of learning, ability, and interpretation of learning.

Classification               Types
Purpose                      Educational; Psychological
Form                         Paper-and-Pencil; Performance-based
Function                     Teacher-made; Standardized
Kind of learning             Achievement; Aptitude
Ability                      Speed; Power
Interpretation of learning   Norm-referenced; Criterion-referenced

When do we use educational and psychological assessments?


Educational assessments are used in the school setting for the purpose of tracking the growth of learners and grading their
performance. Assessment in the educational setting comes in the form of formative and summative assessment, which work
hand-in-hand to provide information about student learning. Formative assessment is a continuous process of gathering information
about student learning at the beginning of, during, and after instruction so that teachers can decide how to improve their instruction
until learners are able to meet the learning targets. When the learners have been provided with enough scaffolding, as indicated by the
formative assessment, the summative assessment is conducted. The purpose of summative assessment is to determine and
record what the learners have learned. On the other hand, the purpose of formative assessment is to track and monitor student
learning and their progress toward the learning target. Formative assessment can be any form of assessment (paper-and-pencil or
performance-based) that is conducted before, during, and after instruction. Before instruction begins, formative assessment serves
as a diagnostic tool to determine what learners already know about the learning target. More specifically, formative assessment
given at the start of the lesson determines the following:
1. What learners know and do not know, so that instruction can supplement what learners do not know;
2. Misconceptions of learners, so that they can be corrected;
3. Confusions of learners, so that they can be clarified; and
4. What learners can and cannot do, so that enough practice can be given to perform the task.

The information from educational assessment at the beginning of the lesson is used by the teacher to prepare relevant
instruction for learners. For example, if the learning target is for learners to determine the by-product of photosynthesis, then the
teacher can ask learners if they know what the food of plants is. If incorrect answers are provided, then the teacher can recommend
references for them to study. If the learning target is for the learners to divide a three-digit number by a two-digit number, then the
teacher can start with a three-item exercise on the task to identify who can and cannot perform it. For those who can do the
task, the teacher can provide more exercises; for those who cannot, the necessary direct instruction can be provided. At this point of
instruction, the results of the assessment are not graded because the information is used by the teacher to prepare relevant ways
to teach.
Educational assessment during instruction is done when the teacher stops at certain parts of the teaching episode to ask
learners questions or to assign exercises, short essays, board work, and other tasks. If the majority of the learners are still unable to
accomplish the task, then the teacher realizes that further instruction is needed. The teacher continuously provides a
series of practice drills and exercises until the learners are able to meet the learning target. These drills and exercises are meant to
help learners consolidate the skill until they can execute it with ease. At this point of the instruction, the teacher should be able to
see the progress of the learners in accomplishing the task. The teacher can require the learners to collect the results of their drills
and exercises so that learners can track their own progress as well. This procedure allows learners to become active participants in
their own learning. At this point of the instruction, the results of assessment are not yet graded because the learners are still in the
process of reaching the learning target, and some learners do not progress at the same rate as others.
When the teacher observes that the majority or all of the learners are able to demonstrate the learning target, then the teacher
can conduct the summative assessment. It is best to have a summative assessment for each learning target so that there is
evidence that learning has taken place. Both the summative and formative assessments should be aligned with the same learning
target; that is, there should be parallelism between the tasks provided in the formative and summative assessments.
When the learners are given word problem-solving tasks in the summative assessment, word problem-solving tasks
should also have been given during the formative assessment. When the learners are asked to identify the parts of a book during the
summative assessment, the same exercises should have been provided during the formative assessment. For physical education, if
the final performance is a folk dance, then learners are given time to practice, and a pre-final performance is scheduled to give
feedback. The final dance performance is the summative assessment, while the time for practice and the pre-final performance
constitute the formative assessment.
Psychological assessments, such as tests and scales, are measures that determine the learner’s cognitive and non-cognitive
characteristics. Examples of cognitive tests are those that measure ability, aptitude, intelligence, and critical thinking. Affective
measures are for personality, motivation, attitude, interest, and disposition. The results of these assessments are used by the
school’s guidance counselor to perform interventions on the learners’ academic, career, and social and emotional development.

When do we use paper-and-pencil and performance-based types of assessment?


Paper-and-pencil types of assessment are cognitive tasks that require a single correct answer. They usually come in the
form of test types such as binary (true or false), short answer (identification), matching type, and multiple choice. The items usually
pertain to a specific cognitive skill, such as recalling, understanding, applying, analyzing, evaluating, and creating. On the other hand,
performance-based types of assessment require learners to perform tasks, such as giving demonstrations, arriving at a product, showing
strategies, and presenting information. The skills applied are usually complex and require integrated skills to arrive at the target
response. Examples include writing an essay, reporting in front of the class, reciting a poem, demonstrating how a problem was
solved, creating a word problem, reporting the results of an experiment, dance and song performances, painting and drawing, playing
a musical instrument, etc. Performance-based tasks are usually open-ended, and each learner may arrive at various possible
responses.
The use of paper-and-pencil and performance-based tasks depends on the nature and content of the learning target. Below
are examples of learning targets that require a paper-and-pencil type of assessment:
 Identify the parts of the plants
 Label the parts of the microscope
 Compute the compound interest
 Classify the phase of a given matter
 Provide the appropriate verb in the sentence
 Identify the type of sentence
Below are learning targets that require performance-based assessment:
 Varnish a wooden cabinet
 Draw a landscape using a paintbrush tool on the computer
 Write a word problem involving multiplication of polynomials
 Deliver a speech convincing your classmates that you are a good candidate for the student council
 Write an essay explaining how humans and plants benefit from each other
 Mount a plant specimen on a glass slide

How do we distinguish teacher-made from standardized tests?


Standardized tests have fixed directions for administering and scoring. They can be purchased with test manuals, booklets,
and answer sheets. When these tests were developed, the items were tried out on a large sample of the target group, called the norm
group. The norm group’s performance is used as a basis for comparing the results of those who take the test.

Examples of standardized tests by category (visit each site for a description):

Intelligence test: Wechsler Adult Intelligence Scale (https://wechslertest.com/)
Achievement test: Metropolitan Achievement Test (https://www.tests.com/MAT-8-Testing)
Aptitude test: Raven’s Progressive Matrices (http://www.pearsonclinical.co.uk/Psychological/AdultCognitionNeuropsychologyandLanguage/AdultGeneralAbilities/Ravens-Progressive-Matrices/Ravens-Progressie-Matrices.aspx)
Critical thinking test: Watson Glaser Critical Thinking Appraisal (http://www.assessmentday.co.uk/watson-glaser-critical-thinking.htm)
Interest test: RIASEC Markers Scale (http://openpsychometric.org/tests/RIASEC/)
Personality test: NEO Personality Inventory (http://www.hogrefe.com.uk/neopir.html)
Non-standardized or teacher-made tests are usually intended for classroom assessment. They are used for classroom
purposes, such as determining whether learners have reached the learning target. They intend to measure behavior (such as
learning) in line with the objectives of the course. Examples are quizzes, long tests, and exams. Formative and summative
assessments are usually teacher-made tests.

Can a teacher-made test become a standardized test? Yes, as long as it is valid and reliable and has standard procedures for
administering, scoring, and interpreting results.

What information is sought from achievement and aptitude tests?


Achievement tests measure what learners have learned after instruction or after going through a specific curricular
program. Achievement tests provide information on what learners can do and have acquired after training and instruction.
Achievement is a measure of what a person has learned within or up to a given time (Yaremko et al. 1982). It is a measure of
accomplished skills and indicates what a person can do at present (Atkinson 1995). Kimball (1989) explained the traditional and
alternative views on the achievement of learners, noting that the greater number of courses taken by learners and their more
extensive classroom experience with a subject may give them an advantage. Achievement can be measured by a variety of means.
It can be reflected in the final grades of learners within a quarter. A quarterly test composed of several learning targets is
also a good way of determining the achievement of learners. Achievement can likewise be measured using achievement tests, such as
the Wide Range Achievement Test, the California Achievement Test, and the Iowa Tests of Basic Skills.
According to Lohman (2005), aptitudes are characteristics that influence a person’s behavior and aid goal attainment
in a particular situation. Specifically, aptitude refers to the degree of readiness to learn and perform well in a particular situation or
domain (Corno et al. 2002). Examples include the ability to comprehend instructions, manage one’s time, use previously acquired
knowledge appropriately, make good inferences and generalizations, and manage one’s emotions. Other developments have also led
to the conclusion that assessment of aptitude can go beyond cognitive abilities. An example is the Cognitive Abilities Measurement,
which measures working memory capacity, the ability to store old information and process new information, and the speed of an
individual in retrieving and processing new information (Kyllonen and Christal 1989). Magno (2009) also created a taxonomy of
aptitude test items. The taxonomy provides item writers with a guide on the types of items to be included when building an aptitude
test, depending on the skills specified. The taxonomy includes 12 classifications categorized as verbal and nonverbal. The schemes in
the verbal category include verbal analogy, syllogism, and number or letter series; the nonverbal category is composed of topology,
visual discrimination, progressive series, visualization, orientation, figure-ground perception, surface development, object assembly,
and picture completion.

How do we differentiate speed from power tests?


Speed tests consist of easy items that need to be completed within a time limit. Power tests consist of items with an increasing
level of difficulty, but time is sufficient to complete the whole test. An example of a power test is the one developed by the
National Council of Teachers of Mathematics, which determines the ability of examinees to use data to reason and be
creative, and to formulate, solve, and reflect critically on the problems provided. An example of a speed test is a typing test in which
examinees are required to correctly type as many words as possible within a limited amount of time.

How do we differentiate a norm-referenced from a criterion-referenced test?


There are two types of tests based on how the scores are interpreted: norm-referenced and criterion-referenced tests.
A criterion-referenced test has a given set of standards, and the scores are compared to the given criterion. For example, in a 50-item
test: 40-50 is very high, 30-39 is high, 20-29 is average, 10-19 is low, and 0-9 is very low. One approach in criterion-referenced
interpretation is that the score is compared to a specific cutoff. An example is the grading in schools where the range of grades 96-
100 is highly proficient, 90-95 is proficient, 80-89 is nearly proficient, and below 80 is beginning.
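For teachers who keep grades in a spreadsheet or a short script, the cutoff logic of a criterion-referenced interpretation can be spelled out explicitly. The following short Python sketch is an illustration only; it uses the hypothetical grading bands from the example above, not an official scale:

    def proficiency_level(grade):
        # Map a final grade to the descriptive band in the example above.
        if grade >= 96:
            return "highly proficient"
        elif grade >= 90:
            return "proficient"
        elif grade >= 80:
            return "nearly proficient"
        else:
            return "beginning"

    print(proficiency_level(92))  # prints "proficient"

Note that the score is compared only against the fixed cutoffs; the performance of other examinees plays no role.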

The norm-referenced test interprets results using the distribution of scores of a sample group. The mean and standard
deviation are computed for the group. The standing of every individual in a norm-referenced test is based on how far they are from
the mean and standard deviation of the sample. Standardized tests usually interpret scores using a norm set from a large sample.

Having an established norm for a test means obtaining the normal or average performance in the distribution of scores. A
normal distribution is obtained by increasing the sample size. A norm is a standard and is based on a very large group of samples.
Norms are reported in the manual of standardized tests.

A normal distribution found in the manual takes the shape of a bell curve. It shows the number of people within a range of
scores. It also reports the percentage of people with particular scores. The norm is used to convert a raw score into a standard score
for interpretability.
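To make the contrast with criterion-referenced interpretation concrete, here is a minimal Python sketch of a norm-referenced conversion. The norm values are hypothetical and for illustration only; in practice, the mean and standard deviation come from the norm tables in the test manual:

    def z_score(raw, mean, sd):
        # Distance of a raw score from the norm group's mean, in SD units.
        return (raw - mean) / sd

    # Hypothetical norms: mean = 50, standard deviation = 10
    z = z_score(65, mean=50, sd=10)   # 1.5 SDs above the mean
    t = 50 + 10 * z                   # the same score expressed as a T-score
    print(z, t)                       # prints 1.5 65.0

Here the raw score of 65 is interpreted relative to the group: it is high because it lies 1.5 standard deviations above the norm group's average, not because it passed a fixed cutoff.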

Week 3

Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers/papers is strictly prohibited.

1. Create a graphic organizer for the different kinds of tests. You may represent your ideas inside circles and make connections
among ideas. Explain your graphic organizer to your classmates.

2. To know more about the different kinds of assessment, complete the table by providing other specific examples of each kind of
assessment. You may use other references.

Type Example
Educational
Psychological
Paper-and-Pencil
Performance-based
Teacher-made
Standardized
Achievement
Aptitude
Speed
Power
Norm-referenced
Criterion-referenced

Quiz #3 - A
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers/papers is strictly prohibited.

A. Multiple-Choice

Read each item carefully. Choose the letter of the correct and best answer for every item.

1. What are two kinds of assessment based on form?


A. Teacher-made and Standardized
B. Educational and Psychological
C. Formative and Summative
D. Achievement and Aptitude

2. Which best describes a paper-and-pencil type of assessment?


A. It determines whether students have attained the learning target.
B. It provides a cognitive task that requires a single correct answer.
C. It is used to measure what students have learned after instruction.
D. It determines students’ cognitive and non-cognitive characteristics.

3. When are educational and psychological assessments used?


A. When tracking the growth of students and marking their performance
B. When designing objectives that match with the content of instruction
C. When giving feedback on how well students understand the lesson
D. When gathering information at any point of instruction

4. What is the difference between speed test and power test?


A. Speed test is the ability to type fast in a limited time, while power test contains items that vary in difficulty.
B. Speed test consists of a few pre-calculated difficult items, and the time is also limited;
power test consists of easy items, but time is limited.
C. Speed test consists of easy items, but time is limited; power test consists of a few pre-calculated
difficult items, and time is also limited.
D. Power test contains items that vary in difficulty to the point that no subject is expected to get all items right
even with unlimited time, while speed test is the ability to type fast in a limited time.

5. Can a teacher-made test become a standardized test?


A. Yes, as long as it is valid and reliable, with standard procedures for administering, scoring, and interpreting results.
B. Yes, because the test is not developed by the teacher to ascertain the student achievement and proficiency
in a given subject.
C. No, because it cannot determine the purpose and objectives of the test as to what to measure and why to measure.
D. No, because this test cannot be used as a tool for formative, diagnostic, and summative evaluation.

Quiz #3 - B
B. Read each case and identify what kind of assessment is referred to.

Type of Assessment          Situation

1. The science teacher, at the end of the lesson on the phases of matter, provided a 20-item test to record how much
learners have learned. What kind of assessment is used?

2. The teacher in music required learners to demonstrate how the ¾ beat is conducted. What kind of assessment is used?

3. A student got a score of 25 in a mathematics achievement test, which is considered low compared with the scores found
in the test manual. What kind of assessment is used?

4. A teacher in English raised three questions to determine if learners understood the story read. The learners who could
not answer the questions will be further helped on the skill needed. What kind of assessment is used?

5. A teacher made a 10-item spelling test where the word is pronounced and the learners write the correct spelling. What
kind of assessment is used?

6. A teacher in science tested learners' familiarity with the parts of the heart. An illustration of a heart is provided, and
they need to label the parts. What is the function of assessment?

7. A teacher used the Jackson Vocational Interest Survey to determine which track in senior high school the learners can
select. What kind of assessment is used?

8. A guidance counselor, as part of the career orientation of Grade 12 learners, administered a test evaluating their
abstract reasoning, logic, and identification of missing parts. What kind of test is used?

9. The learners who got perfect scores in the science achievement test were invited to join the science club. In this way,
how was the score used?

10. The teacher in mathematics wanted to determine how well the learners have learned the mathematics curriculum at
the end of the school year. The Iowa Tests of Basic Skills in math was administered. What kind of assessment was
administered?

Chapter
Development and Administration of Tests
Lesson 4: Planning a Written Test

To be able to learn or enhance your skills in planning for a good classroom test, you need to review your
knowledge on lesson plan development, constructive alignment, and different test formats. It is suggested that you read books and
other references in print or online that could help you design a good written test.

In this lesson, you are expected to:


 Set appropriate instructional objectives for a written test and
 Prepare a table of specifications for a written test
Why do you need to define the test objectives or learning outcomes targeted for assessment?
In designing a well-planned written test, first and foremost, you should be able to identify the intended learning
outcomes in a course for which a written test is an appropriate method to use. These learning outcomes are the knowledge, skills,
attitudes, and values that every student should acquire and demonstrate throughout the course. Clear articulation of learning outcomes is a primary
consideration in lesson planning because it serves as the basis for evaluating the effectiveness of the teaching and learning process
as determined through testing and assessment. Learning objectives or outcomes are measurable statements that articulate, at the
beginning of the course, what students should know, be able to do, or value as a result of taking the course. These learning goals
provide the rationale for the curriculum and instruction. They provide teachers the focus and direction on how the course is to be
handled, particularly in terms of course content, instruction, and assessment. On the other hand, they provide the students with the
reasons and motivation to study and persevere. They give students the opportunity to be aware of what they need to do to be
successful in the course, take control and ownership of their progress, and focus on what they should be learning. Setting objectives
for assessment is the process of establishing direction to guide both the teacher in teaching and the student in learning.

What are the objectives for testing?


In developing a written test, the cognitive behaviors of learning outcomes are usually targeted. For the cognitive domain, it
is important to identify the levels of behaviors expected from the students. Traditionally, Bloom’s Taxonomy was used to classify
learning objectives based on levels of complexity and specificity of the cognitive behaviors. With knowledge at the base (i.e., lower-
order thinking skills), the categories progress to comprehension, application, analysis, synthesis, and evaluation. However, Anderson
and Krathwohl, Bloom’s student and research partner, respectively, came up with a revised taxonomy, in which the nouns used to
represent the levels of cognitive behavior were replaced by verbs, and the synthesis and evaluation were switched.

Figure 4.1 Taxonomies of Instructional Objectives

In developing the cognitive domain of instructional objectives, key verbs can be used. See Lesson 2 for the sample
objectives in the RBT Framework.

Bloom's Taxonomy          Revised Bloom's Taxonomy
(levels aligned from lowest to highest)
Knowledge                 Remember
Comprehension             Understand
Application               Apply
Analysis                  Analyze
Synthesis                 Evaluate
Evaluation                Create

What is a table of specifications?


A table of specifications (TOS), sometimes called a test blueprint, is a tool used by teachers to design a test. It is a table that
maps out the test objectives; the contents or topics covered by the test; the levels of cognitive behavior to be measured; the
distribution, number, placement, and weights of test items; and the test format. It helps ensure that the course's intended
learning outcomes, assessments, and instruction are aligned.
Generally, the TOS is prepared before a test is created. However, it is ideal to prepare one even before the start of
instruction. Teachers need to create a TOS for every test that they intend to develop.
The test TOS is important because it does the following:
- Ensures that the instructional objectives and what the test captures match
- Ensures that the test developer will not overlook details that are considered essential to a good test
- Makes developing a test easier and more efficient
- Ensures that the test will sample all important content areas and processes
- Is useful in planning and organizing
- Offers an opportunity for teachers and students to clarify achievement expectations

What are the general steps in developing a table of specifications?


Learner assessment within the framework of classroom instruction requires planning.
The following are the steps in developing a table of specifications:
1. Determine the objectives of the test. The first step is to identify the test objectives. These should be based on the
instructional objectives. In general, the instructional objectives or the intended learning outcomes are identified at the
start, when the teacher creates the course syllabus. There are three types of objectives: (1) cognitive, (2) affective, and (3)
psychomotor. Cognitive objectives are designed to increase an individual's knowledge, understanding, and awareness. On
the other hand, affective objectives aim to change an individual's attitude into something desirable, while psychomotor
objectives are designed to build physical or motor skills. When planning for assessment, choose only the objectives that can
be best captured by a written test. There are objectives that are not meant for a written test. For example, if you want to test the
psychomotor domain, it is better to do a performance-based assessment. There are also cognitive objectives that are
sometimes better assessed through performance-based assessment. Those that require the demonstration or creation of
something tangible, like projects, would also be more appropriately measured by performance-based assessment. For a
written test, you can consider cognitive objectives, ranging from remembering to creating ideas, that could be measured
using common formats for testing, such as multiple-choice, alternative response, matching type, and even essay or
open-ended tests.
2. Determine the coverage of the test. The next step in creating the TOS is to determine the contents of the test. Only topics
or contents that have been discussed in class and are relevant should be included in the test.

3. Calculate the weight for each topic. Once the test coverage is determined, the weight of each topic covered in the test is
determined. The weight assigned per topic in the test is based on the relevance of and the time spent to cover each topic
during instruction. The percentage of time for a topic in a test is determined by dividing the time spent for that topic during
instruction by the total amount of time spent for all topics covered in the test. For example, for a test on the Theories of
Personality for a General Psychology 101 class, the teacher spent five one-hour class sessions, with the time per topic ranging
from half a session (30 minutes) to one and a half sessions (90 minutes). As such, the weight for each topic is as follows:

Topic                      No. of Sessions       Time Spent           Percent Time (Weight)

Theories and Concepts      0.5 class session     30 min               10.0
Psychoanalytic Theories    1.5 class sessions    90 min               30.0
Trait Theories             1 class session       60 min               20.0
Humanistic Theories        0.5 class session     30 min               10.0
Cognitive Theories         0.5 class session     30 min               10.0
Behavioral Theories        0.5 class session     30 min               10.0
Social Learning Theories   0.5 class session     30 min               10.0
TOTAL                      5 class sessions      300 min or 5 hours   100

4. Determine the number of items for the whole test. To determine the number of items to be included in the test, the
amount of time needed to answer the items is considered. As a general rule, students are given 30-60 seconds for each item in
test formats with choices. For a one-hour class, this means that the test should not exceed 60 items. However, because
you also need to allot time for test paper/booklet distribution and the giving of instructions, the number of items should be fewer,
maybe just 50 items.
5. Determine the number of items per topic. To determine the number of items to be included in the test, the weights per
topic are considered. Thus, using the example above, for a 50-item final test, Theories and Concepts, Humanistic Theories,
Cognitive Theories, Behavioral Theories, and Social Learning Theories will have 5 items each; Trait Theories, 10 items; and
Psychoanalytic Theories, 15 items.

Topic Percent of Time (Weight) No. of Items


Theories and Concepts 10.0 5
Psychoanalytic Theories 30.0 15
Trait Theories 20.0 10
Humanistic Theories 10.0 5
Cognitive Theories 10.0 5
Behavioral Theories 10.0 5
Social Learning Theories 10.0 5
TOTAL 100 50 Items
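The arithmetic in Steps 3 to 5 is simple enough to automate. Below is a minimal Python sketch, for illustration only, that reproduces the Theories of Personality example above (times are in minutes, and the planned test length of 50 items comes from Step 4):

    time_spent = {
        "Theories and Concepts": 30,
        "Psychoanalytic Theories": 90,
        "Trait Theories": 60,
        "Humanistic Theories": 30,
        "Cognitive Theories": 30,
        "Behavioral Theories": 30,
        "Social Learning Theories": 30,
    }
    total_time = sum(time_spent.values())    # 300 minutes in all
    total_items = 50                          # test length decided in Step 4

    for topic, minutes in time_spent.items():
        weight = minutes / total_time         # e.g., 90 / 300 = 0.30
        items = round(weight * total_items)   # e.g., 0.30 * 50 = 15
        print(f"{topic}: {weight:.0%} -> {items} items")

Running this prints 10% -> 5 items for each half-session topic, 20% -> 10 items for Trait Theories, and 30% -> 15 items for Psychoanalytic Theories, matching the table above. With other weights, rounding may leave the total slightly off the planned test length, so the item counts should always be checked against the intended total.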

What are the different formats of a test table of specifications?


There are three (3) types of TOS: (1) one-way, (2) two-way, and (3) three-way.

1. One-Way TOS. A one-way TOS maps out the content or topic, test objectives, number of hours spent, and format, number,
and placement of items. This type of TOS is easy to develop and use because it just works around the objectives without
considering the different levels of cognitive behaviors. However, a one-way TOS cannot ensure that all levels of cognitive
behaviors that should have been developed by the course are covered in the test.
2. Two-Way TOS. A two-way TOS reflects not only the content, time spent, and number of items but also the levels of
cognitive behavior targeted per test content based on the theory behind cognitive testing. For example, the common
framework for testing at present in the DepEd Classroom Assessment Policy is the Revised Bloom's Taxonomy (DepEd,
2015). One advantage of this format is that it allows one to see the levels of cognitive skills and dimensions of knowledge
that are emphasized by the test. It also shows the framework of assessment used in the development of the test. However,
this format is more complex than the one-way format.
3. Three-Way TOS. This type of TOS reflects the features of one-way and two-way TOS. One advantage of this format is that it
challenges the test writers to classify objectives based on the theory behind the assessment. It also shows the variability of
thinking skills targeted by the test. However, it takes much longer to develop this type of TOS.
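Conceptually, a two-way TOS is a grid of topics crossed with cognitive levels. As a rough illustration (not a prescribed format, and with hypothetical item numbers), one row of such a grid could be represented in Python like this:

    # One row of a hypothetical two-way TOS: a topic mapped to the
    # item numbers planned per cognitive level (Revised Bloom's Taxonomy).
    two_way_row = {
        "topic": "Trait Theories",
        "hours": 1,
        "items_by_level": {
            "Remembering": [16, 17],
            "Understanding": [18, 19],
            "Analyzing": [20, 21],
        },
    }
    total_items = sum(len(v) for v in two_way_row["items_by_level"].values())
    print(total_items)  # prints 6

A one-way TOS would drop the items_by_level breakdown and record only the total number and placement of items per topic, which is why it cannot show whether all levels of cognitive behavior are covered.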

COMPUTATION
1. % of items = number of hours allotted ÷ total number of hours
              = 6 ÷ 24
              = 0.25 or 25%

2. Number of items = % of items × total number of items
                   = 0.25 × 40
                   = 10
________________________________________________________________________
6 COGNITIVE PROGRESSIONS                        Number of items
1. Remembering                                  9
2. Understanding (LITERAL)                      8
3. Applying                                     6
4. Analyzing (CRITICAL)                         7
5. Evaluating (CREATIVE)                        6
6. Creating (APPLIED)                           4
Total                                           40 items
SAMPLE TABLE OF SPECIFICATIONS (TOS)

MARIAN COLLEGE
IPIL, ZAMBOANGA SIBUGAY
MIDTERM TABLE OF SPECIFICATIONS
Subject: ASSESSMENT OF STUDENT LEARNING 1
Number of Items: 40    School Year: 2021-2022

Learning Competencies (with no. of hours, weight, and no. of items) and placement of items by cognitive level:

1. Explain and understand the purpose of assessment, measurement, and evaluation (6 hours, 25%, 10 items)
   Remembering: 1, 6, 9, 10; Understanding: 2, 7; Applying: 3; Analyzing: 4, 8; Evaluating: 5
2. Interpret each function of assessment, measurement, and evaluation in the instructional process (6 hours, 25%, 10 items)
   Remembering: 11, 20; Understanding: 12, 15; Analyzing: 13, 14; Evaluating: 16, 17, 18, 19
3. Demonstrate understanding in identifying the guiding principles of the three types of learning (3 hours, 12.5%, 5 items)
   Applying: 21, 22, 23; Creating: 24, 25
4. Exhibit understanding of the process of applying a variety of assessment tools (6 hours, 25%, 10 items)
   Remembering: 26, 31; Understanding: 32; Analyzing: 29, 30; Evaluating: 33, 34, 35; Creating: 27, 28
5. Demonstrate the appropriateness of using varied formats of assessment in instruction (3 hours, 12.5%, 5 items)
   Remembering: 39; Understanding: 36; Applying: 40; Evaluating: 37, 38

TOTAL: 24 hours, 100%, 40 items (Remembering: 9; Understanding: 6; Applying: 5; Analyzing: 6; Evaluating: 10; Creating: 4)

Prepared By:

Week 4

Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________

Instructions: a.) Using other bond papers/papers is strictly prohibited.


b.) Answer the following questions briefly.

 DEVELOP

1. When planning for a test, what should you do first?

Answer:___________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________

2. Are all objectives of instruction measured by a paper-and-pencil test?

Answer:___________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________

3. When constructing a TOS where objectives are set without classifying them according to their cognitive behavior, what
format do you use?

Answer:___________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________

4. If you designed a two-way TOS for your test, what does this format have?

Answer:___________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________

Quiz #4
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________

Instructions: a.) Using other bond papers/papers is strictly prohibited.


b.) Choose the letter of the correct answer to every item given.

1. The instructional objective focuses on the development of learners’ knowledge. Can this objective be assessed
using the multiple-choice format?
A. No, this objective requires an essay format.
B. No, this objective is better assessed using matching type of test.
C. Yes, as multiple-choice is appropriate in assessing knowledge.
D. Yes, as multiple-choice is the most valid format when assessing knowledge.

2. You prepared an objective test format for your quarterly test in mathematics. Which of the following could NOT
have been your test objective?
A. Interpret a line graph
B. Construct a line graph
C. Compare the information presented in a line graph
D. Draw conclusions from the data presented in a line graph.

3. Teacher Myrna prepared a TOS as her guide in developing a test. Why is this necessary?
A. To guide the planning of instruction
B. To satisfy the requirements in developing a test
C. To have a test blueprint as accreditation usually requires this plan
D. To ensure that the test is designed to cover what it intends to measure

4. Ms. Zamora prepared a TOS that shows both the objectives and the different levels of cognitive behavior. What
format could she have used?
A. One-way format
B. Two-way format
C. Three-way format
D. Four-way format

5. The school principal wants the teachers to develop a TOS that uses the two-way format rather than a one-way format.
Why do you think this is the principal's preferred format?
A. So that the different levels of cognitive behaviors to be tested are known
B. So that the formats of the test are known by just looking at the TOS
C. So that the test writer would know the distribution of the test items
D. So that objectives for instruction are also reflected in the TOS

Lesson 5: Construction of Written Tests

Classroom assessments are an integral part of learners' learning. They do more than just measure learning. They also inform the
learners what needs to be learned, to what extent, and how to learn it. They also provide the parents some feedback about
their child's achievement of the desired learning outcomes. The schools also benefit from classroom assessments because
learners' test results can provide them with evidence-based data that are useful for instructional planning and decision-making. As such,
it is important that assessment tasks or tests are meaningful and further promote deep learning, as well as fulfill the criteria and
principles of test construction.
There are many ways by which learners can demonstrate their knowledge and skills and show evidence of their proficiencies at
the end of a lesson, unit, or subject. While authentic/performance-based assessments have been advocated as the better and more
appropriate methods in assessing learning outcomes, particularly as they assess higher-level thinking skills, traditional written
assessment methods, such as multiple-choice tests, are also considered as appropriate and efficient classroom assessment tools for
some types of learning targets. This is especially true for large classes and when test results are needed immediately for some
educational decisions. Traditional tests are also deemed reliable and exhibit excellent content and construct validity.
To learn or enhance your skills in developing good and effective test items for a particular test format, you need to go back and
review your prior knowledge on different test formats; how and when to choose a particular test format that is the most appropriate
measure of the identified learning objectives and desired learning outcomes of your subject; and how to construct good and
effective items for each format.

In this lesson, you are expected to:


 Identify the appropriate test format to measure learning outcomes, and
 Apply the general guidelines in constructing test items for different test formats.
What are the general guidelines in choosing the appropriate test format?
Not every test is universally valid for every type of learning outcome. For example, if an intended outcome for a Research
Method 1 course is "to design and produce a research study relevant to one's field of study," you cannot measure this outcome
through a multiple-choice test or a matching-type test.
To guide you on choosing the appropriate test format and designing fair and appropriate yet challenging tests, you should
ask the following important questions:
1. What are the objectives or desired learning outcomes of the subject/unit/lesson being assessed?
Deciding on what test format to use generally depends on your learning objectives or the desired learning outcomes of the
subject/unit/lesson. Desired learning outcomes (DLOs) are statements of what learners are expected to do or demonstrate
as a result of engaging in the learning process. It is suggested that you return to Lesson 4 to review how to set or write
instructional objectives or intended learning outcomes for a subject.
2. What level of thinking is to be assessed (i.e., remembering, understanding, applying, analyzing, evaluating, and creating)?
Does the cognitive level of the test question match your instructional objectives or DLOs?
The level of thinking to be assessed is also an important factor to consider when designing your test, as this will guide you in
choosing the appropriate test format. For example, if you intend to assess how much your learners are able to identify
important concepts discussed in class (i.e., remembering or understanding level), a selected-response format such as
multiple-choice test would be appropriate. However, if you intend to assess how your students will be able to explain and
apply in another setting a concept or framework learned in class (i.e., applying and/or analyzing level), you may consider
giving constructed-response test formats such as essay.
It is important that when constructing classroom assessment tools, all levels of cognitive behaviors are represented—
Remembering (R), Understanding (U), Applying (Ap), Analyzing (An), Evaluating (E), and Creating (C)—taking into
consideration the Knowledge Dimensions, i.e., Factual (F), Conceptual (C), Procedural (P), and Metacognitive (M). You may
return to Lesson 2 and Lesson 4 to review the different levels of Cognitive Behavior and Knowledge Dimensions.

3. Is the test matched or aligned with the course’s DLOs and the course contents or learning activities?
The assessment tasks should be aligned with the instructional activities and the DLOs. Thus, it is important that you are
clear about what DLOs are to be addressed by your test and what course activities or tasks are to be implemented to
achieve the DLOs.
For example, if you want learners to articulate and justify their stand on ethical decision-making and social responsibility
practices in business (i.e., DLO), then an essay test or a class debate is an appropriate measure or task for this learning
outcome. A multiple-choice test may be used but only if you intend to assess learners' ability to recognize what is ethical
versus unethical decision-making practice. In the same manner, matching-type items may be appropriate if you want to
know whether your students can differentiate and match the different approaches or terms to their definitions.
4. Are the test items realistic to the students?
Test items should be meaningful and realistic to the learners. They should be relevant or related to their everyday
experiences. The use of concepts, terms, or situations that have not been discussed in the class or that they have never
encountered, read, or heard about should be minimized or avoided. This is to prevent learners from making wild guesses,
which will undermine your measurement of what they have really learned from the class.

What are the major categories and formats of traditional tests?


For the purposes of classroom assessment, traditional tests fall into two general categories: (1) the selected-response type, in
which learners select the correct answer from the given options, and (2) the constructed-response type, in which the learners
are asked to formulate their own answers. The cognitive capabilities required to answer selected-response items are different from
those required by constructed-response items, regardless of content.
Selected-Response Tests require learners to choose the correct answer or best alternative from several choices. While they can
cover a wide range of learning materials very efficiently and measure a variety of learning outcomes, they are limited when assessing
learning outcomes that involve more complex and higher-level thinking skills. Selected-response tests include:
 Multiple-Choice Test. It is the most commonly used format in formal testing and typically consists of a stem (problem), one
correct or best alternative (the correct answer), and three or more incorrect or inferior alternatives (distractors).
 True-False or Alternative Response Test. It generally consists of a statement, and the learner decides whether the statement is true
(accurate/correct) or false (inaccurate/incorrect).
 Matching-Type Test. It consists of two sets of items to be matched with each other based on a specified attribute.

Constructed-Response Tests require learners to supply answers to a given question or problem. These include:
 Short Answer Test. It consists of open-ended questions or incomplete sentences that require learners to create an answer
for each item, which is typically a single word or short phrase. This includes the following types:
o Completion. It consists of incomplete statements that require the learners to fill in the blanks with the correct
word or phrase.
o Identification. It consists of statements that require the learners to identify or recall the terms/concepts, people,
places, or events that are being described.
o Enumeration. It requires the learners to list down all possible answers to the question.
 Essay Test. It consists of problems/questions that require learners to compose or construct written responses, usually long
ones with several paragraphs.
 Problem-Solving Test. It consists of problems/questions that require learners to solve problems in quantitative or non-
quantitative settings using knowledge and skills in mathematical concepts and procedures and/or other higher-order
cognitive skills (e.g., reasoning, analysis, and critical thinking).

What are the general guidelines in writing multiple-choice test items?


Writing multiple-choice items requires content mastery, writing skills, and time. Only good and effective items should be
included in the test. Poorly-written test items could be confusing and frustrating to learners and yield test scores that are not
appropriate to evaluate their learning and achievement. The following are the general guidelines in writing good multiple-choice
items. They are classified in terms of content, stem, and options.
Content:
1. Write items that reflect only one specific content area and cognitive processing skill.

Faulty: Which of the following is a type of statistical procedure used to test a hypothesis regarding significant relationship between
variables, particularly in terms of the extent and direction of association?
A. ANCOVA C. Correlation
B. ANOVA D. t-test

Good: Which of the following is an inferential statistical procedure used to test a hypothesis regarding significant differences
between two qualitative variables?
A. ANCOVA C. Chi-Square
B. ANOVA D. Mann-Whitney Test
2. Do not lift and use statements from the textbook or other learning materials as test questions.
3. Keep the vocabulary simple and understandable based on the level of learners/examinees.
4. Edit and proofread the items for grammatical and spelling errors before administering them to the learners.

Stem:
1. Write the directions in the stem in a clear and understandable manner.

Faulty: Read each question and indicate your answer by shading the circle corresponding to your answer.

Good: This test consists of two parts. Part A is a reading comprehension test, and Part B is a grammar/language test. Each
question is a multiple-choice item with five (5) options. You are to answer each question but will not be penalized for a
wrong answer or for guessing. You can go back and review your answers during the time allotted.

2. Write stems that are consistent in form and structure, that is, present all items either in question form or in descriptive or
declarative form.

Faulty: (1) Who was the Philippine president during Martial Law?
(2) The first president of the Commonwealth of the Philippines was ________.

Good: (1) Who was the Philippine president during Martial Law?
(2) Who was the first president of the Commonwealth of the Philippines?

3. Word the stem positively and avoid double negatives, such as NOT and EXCEPT, in a stem. If a negative word is necessary,
underline or capitalize the word for emphasis.

Faulty: Which of the following is not a measure of variability?


Good: Which of the following is NOT a measure of variability?
4. Refrain from making the stem too wordy or containing too much information unless the problem/question requires the facts
presented to solve the problem.

Faulty: What does DNA stand for, and what is the organic chemical of complex molecular structure found in all cells and
viruses and codes genetic information for the transmission of inherited traits?

Good: As a chemical compound, what does DNA stand for?

Options:
1. Provide three (3) to five (5) options per item, with only one being the correct or best answer/alternative.
2. Write options that are parallel or similar in form and length to avoid giving clues about the correct answer.

Faulty: What is an ecosystem?


A. It is a community of living organisms in conjunction with the nonliving components of their environment that
interact as a system. These biotic and abiotic components are linked together through nutrient cycles and energy
flows.
B. It is a place on Earth’s surface where life dwells.
C. It is the biotic and abiotic surroundings of an organism or population.
D. It is the largest division of the Earth’s surface filled with living organisms.
Good: What is an ecosystem?
A. It is a place on Earth’s surface where life dwells.
B. It is the biotic and abiotic surroundings of an organism or population.
C. It is the largest division of the Earth’s surface filled with living organisms.
D. It is a large community of living and non-living organisms in a particular area.
E. It is an area that one or more individual organisms defend against competition from other organisms.
3. Place options in a logical order (e.g., alphabetical, from shortest to longest).

Faulty: Which experimental gas law describes how the pressure of a gas tends to increase as the volume of the container
decreases? (i.e., "The absolute pressure exerted by a given mass of an ideal gas is inversely proportional to the volume it
occupies.")
A. Boyle's Law D. Avogadro's Law
B. Charles Law E. Faraday's Law
C. Beer Lambert Law
Good: Which experimental gas law describes how the pressure of a gas tends to increase as the volume of the container
decreases? (i.e., "The absolute pressure exerted by a given mass of an ideal gas is inversely proportional to the volume it
occupies.")
A. Avogadro's Law D. Charles Law
B. Beer Lambert Law E. Faraday's Law
C. Boyle's Law
4. Place the correct response randomly to avoid a discernible pattern of correct answers.
5. Use None of the Above carefully and only when there is one absolutely correct answer, such as in spelling or math items.

Faulty: Which of the following is a nonparametric statistical procedure?


A. ANCOVA D. t-test
B. ANOVA E. None of the Above
C. Correlation
Good: Which of the following is a nonparametric statistical procedure?
A. ANCOVA D. Mann-Whitney Test
B. ANOVA E. t-test
C. Correlation
6. Avoid All of the Above as an option, especially if it is intended to be the correct answer.

Faulty: Who among the following has become the President of the Philippine Senate?
A. Ferdinand Marcos D. Quintin Paredes
B. Manuel Quezon E. All of the Above
C. Manuel Roxas
Good: Who was the first ever President of the Philippine Senate?
A. Eulogio Rodriguez D. Manuel Roxas
B. Ferdinand Marcos E. Quintin Paredes
C. Manuel Quezon
7. Make all options realistic and reasonable.

What are the general guidelines in writing matching-type items?


The matching test item format requires learners to match a word, sentence, or phrase in one column (i.e., premise) to a
corresponding word, sentence, or phrase in a second column (i.e., response). It is most appropriate when you need to measure the
learners' ability to identify the relationship or association between similar items. Matching items work best when the course content has many
parallel concepts. While the matching-type format is generally used for simple recall of information, you can find ways to make it
applicable or useful in assessing higher levels of thinking such as applying and analyzing.
The following are the general guidelines in writing good and effective matching-type test:
1. Clearly state in the directions the basis for matching the stimuli with the responses.

Faulty: Directions: Match the following.

Good: Directions: Column I is a list of countries while Column II presents the continent where these countries are located.
Write the letter of the continent corresponding to the country on the line provided in Column I.
Item #1’s instruction is less preferred as it does not detail the basis for matching the stem and the response options.
2. Ensure that the stimuli are longer and the responses are shorter.

Faulty: Match the description of the flag to its country.


A B
Bangladesh A. Green background with red circle in the center
Indonesia B. One red strip on top and white strip at the bottom
Japan C. Red background with white five-petal flower in the center
Singapore D. Red background with yellow circle in the center
Thailand E. Red background with white five-petal flower in the center
F. White background with large circle in the center

Good: Match the description of the flag to its country.


A B
Green background with red circle in the center A. Bangladesh
One red strip on top and white strip at the bottom B. Hong Kong
Red background with white five-petal flower in the center C. Indonesia
Red background with yellow circle in the center D. Japan
Red background with white five-petal flower in the center E. Singapore
White background with large circle in the center F. Vietnam
Item #2 is a better version because the descriptions are presented in the first column while the response options are in the
second column. The stimuli are also longer than the options.

3. For each item, include only topics that are related to one another and share the same foundation of information.
Faulty: Match the following:
A B
1. Indonesia A. Asia
2. Malaysia B. Bangkok
3. Philippines C. Jakarta
4. Thailand D. Kuala Lumpur
5. Year ASEAN was established E. Manila
F. 1967
Good: On the line to the left of each country in Column I, write the letter of the country’s capital presented in Column II.
Column I Column II
1. Indonesia A. Bandar Seri Begawan
2. Malaysia B. Bangkok
3. Philippines C. Jakarta
4. Thailand D. Kuala Lumpur
E. Manila
Item #1 is considered an unacceptable item because its response options are not parallel and include different kinds of
information that can provide clues to the correct/wrong answer. On the other hand, Item #2 details the basis for matching
and the response options only include related concepts.
4. Make the response options short, homogeneous, and arranged in logical order.

Faulty: Match the chemical elements with their characteristics.


A B
Gold A. Au
Hydrogen B. Magnetic metal used in steel
Iron C. Hg
Potassium D. K
Sodium E. With lowest density
F. Na
Good: Match the chemical elements with their symbols.
A B
Gold A. Au
Hydrogen B. Fe
Iron C. H
Potassium D. Hg
Sodium E. K
F. Na
In Item #1, the response options are not parallel in content and length. They are also not arranged alphabetically.
5. Include response options that are reasonable and realistic and similar in length and grammatical form.
Faulty: Match the subjects with their course description.
A B
History A. Studies the production and distribution of goods/services
Political Science B. Study of politics and power
Psychology C. Study of society
Sociology D. Understands role of mental function in social behaviors
E. Uses narratives to examine and analyze past events
Good: Match the subjects with their course description.
A B
1. Study of living things A. Biology
2. Study of mind and behavior B. History
3. Study of politics and power C. Political Science
4. Study of recorded events in the past D. Psychology
5. Study of society E. Sociology
F. Zoology
Item #1 is less preferred because the response options are not consistent in terms of their length and grammatical form.

6. Provide more response options than the number of stimuli.

Faulty: Match the following fractions with their corresponding decimal equivalents:
A B
¼ A. 0.25
5/4 B. 0.28
7/25 C. 0.90
9/10 D. 1.25
Good: Match the following fractions with their corresponding decimal equivalents:
A B
¼ A. 0.09
5/4 B. 0.25
7/25 C. 0.28
9/10 D. 0.90
E. 1.25
Item #1 is considered inferior to Item #2 because it includes the same number of response options as that of the stimuli, thus
making it more prone to guessing.
What are the general guidelines in writing true or false items?
True or false items are used to measure learners’ ability to identify whether a statement or proposition is correct/true or
incorrect/false. They are best used when learners’ ability to judge or evaluate is one of the desired learning outcomes of the course.
There are different variations of the true or false items. These include the following:

1. T-F Correction or Modified True-or-False Question. In this format, the statement is presented with a key word or phrase
that is underlined, and the learner has to supply the correct word or phrase.
e.g., Multiple-Choice Test is authentic.
2. Yes-No Variation. In this format, the learner has to choose yes or no, rather than true or false.
e.g., The following are kinds of tests. Circle Yes if it is an authentic test and No if not.
Multiple Choice Test Yes No
Debates Yes No
End-of-the Term Project Yes No
True or False Test Yes No
3. A-B Variation. In this format, the learner has to choose A or B, rather than true or false.
e.g., Indicate which of the following are traditional or authentic tests by circling A if it is a traditional test and B if it is
authentic.
Traditional Authentic
Multiple Choice Test A B
Debates A B
End-of-the Term Project A B
True or False Test A B
Because true or false test items are prone to guessing, as learners are asked to choose between two options, utmost care
should be exercised in writing true or false items. The following are the general guidelines in writing true or false items:
1. Include statements that are completely true or completely false.

Faulty: The presidential system of government, where the president is only the head of state or government, is adopted by
the United States, Chile, Panama, and South Korea.

Good: The presidential system, where the president is the only head of state or government, is adopted by Chile.
Item #1 is of poor quality because, while the description is right, the countries given are not all correct. While South
Korea has a presidential system of government, it also has a prime minister who governs alongside the
president.
2. Use simple and easy-to-understand statements.

Faulty: Education is a continuous process of higher adjustment for human beings who have evolved physically and mentally,
which is free and conscious of God, as manifested in nature around the intellectual, emotional, and humanity of man.
Good: Education is the process of facilitating learning or the acquisition of knowledge, skills, values, beliefs, and habits.
Item #1 is somewhat confusing, especially for younger learners, because there are many ideas in one statement.
3. Refrain from using negatives—especially double negatives.

Faulty: There is nothing illegal about buying goods through the internet.

Good: It is legal to buy things or goods through the internet.


Double negatives are sometimes confusing and could result in a wrong answer, not because the learner does not
know the answer but because of how the test item is presented.
4. Avoid using absolutes such as “always” and “never”.

Faulty: The news and information posted on the CNN website is always accurate.

Good: The news and information posted on the CNN website is usually accurate.
Absolute words such as "always" and "never" restrict possibilities and make a statement true 100 percent of the
time. They are also a hint for a "false" answer.
5. Express a single idea in each test item.

Faulty: If an object is accelerating, a net force must be acting on it, and the acceleration of an object is directly proportional
to the net force applied to the object.
Good: If an object is accelerating, a net force must be acting on it.
Item #1 contains two ideas in a single statement, which makes the item difficult to judge as simply true or false.
6. Avoid the use of unfamiliar words or vocabulary.

Faulty: Esprit de corps among soldiers is important in the face of hardships and opposition in fighting the terrorists.

Good: Military morale is important in the face of hardships and opposition in fighting the terrorists.
Students may have a difficult time understanding the statement, especially if the word “esprit de corps” has not
been discussed in the class. Using unfamiliar words would likely lead to guessing.
7. Avoid lifting statements from the textbook and other learning materials.

What are the general guidelines in writing short-answer test items?


A short-answer test item requires the learner to answer a question or to finish an incomplete statement by filling in the
blank with the correct word or phrase. While it is most appropriate when you only intend to assess learners' lower-level
thinking, such as their ability to recall facts learned in class, you can create items that minimize guessing and avoid giving clues to the
correct answer.
The following are the general guidelines in writing good fill-in-the-blank or completion test items:
1. Omit only significant words from the statement.

Faulty: Every atom has a central ________ called a nucleus.

Good: Every atom has a central core called a(n) ________.


In Item #1, the word "core" is not the significant word. The item is also prone to many and varied interpretations,
resulting in many possible answers.
2. Do not omit too many words from the statement such that the intended meaning is lost.

Faulty: ________ is to Spain as ________ is to the United States and as ________ is to Germany.

Good: Madrid is to Spain as ________ is to France.


Item #1 is prone to many and varied answers. For example, a student may answer the question based on the capitals
of these countries or based on the continent where they are located. Item #2 is preferred because it is more specific and
requires only one correct answer.
3. Avoid obvious clues to the correct response.

Faulty: Ferdinand Marcos declared martial law in 1972. Who was the president during that period?

Good: The president during the martial law years was ________.


Item #1 already gives a clue that Ferdinand Marcos was the president during this time because only the president
of a country can declare martial law.
4. Be sure that there is only one correct response.
Faulty: The government should start using renewable energy sources for generating electricity, such as ________.

Good: The government should start using renewable sources of energy with turbines called ________.
Item #1 has many possible answers because the statement is very general (e.g., wind, solar, biomass, geothermal,
and hydroelectric). Item #2 is more specific and only requires one correct answer (i.e., wind).
5. Avoid grammatical clues to the correct response.

Faulty: A subatomic particle with a negative electric charge is called an ________.

Good: A subatomic particle with a negative electric charge is called a(n) ________.


The word “an” in Item #1 provides a clue that the correct answer starts with a vowel.
6. If possible, put the blank at the end of a statement rather than at the beginning.

Faulty: ________ is the basic building block of matter.

Good: The basic building block of matter is ________.


In Item #1, learners may need to read the sentence until the end before they can recognize the problem, and then
re-read it to answer the question. On the other hand, in Item #2, learners can already identify the
context of the problem by reading through the sentence only once and without having to go back and re-read the
sentence.

What are the general guidelines in writing essay tests?


Teachers generally choose and employ essay tests over other forms of assessment because essay tests require learners to
create a response rather than to simply select a response from among alternatives. They are the preferred form of assessment when
teachers want to measure learners’ higher-order thinking skills, particularly their ability to reason, analyze, synthesize, and evaluate.
They also assess learners’ writing abilities. They are most appropriate for assessing learners’ (1) understanding of subject-matter
content, (2) ability to reason with their knowledge of the subject, and (3) problem-solving and decision-making skills because items or
situations presented in the test are authentic or close to real life experiences.
There are two types of essay test: (1) extended-response essay and (2) restricted-response essay.

Extended-Response: requires a much longer and more complex response.
Example: How do the leopard and the tiger differ? Support your answer with details and information from the article.

Restricted-Response: is much more focused and restrained.
Example: Tina is preparing a demonstration to display at her school's science fair. She needs to show the effects of salt on
the buoyancy of an egg.
Part A: Identify at least two other actions that would make Tina's demonstration better.
Part B: Explain why each action would improve the demonstration.

The following are the general guidelines in constructing good essay questions:
1. Clearly define the intended learning outcomes to be assessed by the essay test.
To design effective essay questions or prompts, the specific intended learning outcomes must first be identified. If the intended
learning outcomes to be assessed lack clarity and specificity, the questions or prompts may assess something other than
what they intend to assess. Appropriate directive verbs that most closely match the ability that learners should demonstrate
must be used in the prompts. These include verbs such as compose, analyze, interpret, explain, and justify, among others.
2. Refrain from using essay test for intended learning outcomes that are better assessed by the other kinds of assessment.
Some intended learning outcomes can be efficiently and reliably assessed by selected-response tests rather than by essay tests. In
the same manner, there are intended learning outcomes that are better assessed using other authentic assessments, such
as performance tests, rather than by essay tests. Thus, it is important to take into consideration the limitations of essay tests
when planning and deciding what assessment method to employ for an intended learning outcome.
3. Clearly define and situate the task within a problem situation as well as the type of thinking required to answer the test.
Essay questions or prompts should provide clear and well-defined tasks to the learners. It is important to carefully choose
the directive verb, to write clearly the object or focus of the directive verb, and to delimit the scope of the task. Having clear
and well-defined tasks will guide learners on what to focus on when answering the prompts, thus avoiding responses that
contain ideas that are unrelated or irrelevant, too long, or focusing only on some part of the task. Emphasizing the type of
thinking required to answer the question will also guide students on the extent to which they should be creative, deep,
complex, and analytical in addressing and responding to the questions.
4. Present tasks that are fair, reasonable, and realistic to the students.
Essay questions should contain tasks or questions that students will be able to do or address. These include those that are
within the level of instruction/training, expertise, and experience of the students.
5. Be specific in the prompts about the time allotment and criteria for grading the response.
Essay prompts and directions should indicate the approximate time given to the students to answer the essay questions to
guide them on how much time they should allocate for each item, especially if several essay questions are presented. How
the responses are to be graded or rated should be clarified to guide the students on what to include in their responses.

What are the general guidelines in writing problem-solving test items?


Problem-solving test items are used to measure learners’ ability to solve problems that require quantitative knowledge and
competencies and/or critical thinking skills. These items present a problem situation or task that will require learners to demonstrate
work procedures or come up with a correct solution. Full or partial credit can be assigned to the answers, depending on the answers
or solutions required.
There are different variations of the quantitative problem-solving items. These include the following:
1. One answer choice – This type of question contains four or five options, and students are required to choose the best
answer.
Example: What is the mean of the following score distribution: 32, 44, 56, 69, 75, 77, 95, 96?
A. 68 D. 74
B. 69 E. 76
C. 72
The correct answer is A (68).
2. All possible answer choice – This type of question has four or five options, and students are required to choose all of the
options that are correct.

Example: Consider the following score distribution: 12, 14, 14, 14, 17, 24, 27, 28, 30. Which of the following is/are
the correct measure/s of central tendency? Indicate all possible answers.
A. Mean = 20 D. Median = 17
B. Mean = 22 E. Mode = 14
C. Median = 16
Options A, D, and E are all correct answers. (A quick way to verify such keys is sketched at the end of this discussion.)
3. Type-In answer – This type of question does not provide options to choose from. Instead, the learners are asked to supply
the correct answer. The teacher should inform the learners at the start how their answers will be rated. For example, the
teacher may require just the correct answer or may require learners to present the step-by-step procedures in coming up
with their answers. On the other hand, for non-mathematical problem solving, such as a case study, the teacher may present a
rubric on how their answers will be rated.
Example: Compute the mean of the following score distribution: 32, 44, 56, 69, 75, 77, 95, 96. Indicate your
answer in the blank provided.
In this case, the learners will only need to give the correct answer without having to show the procedures for computation.
Example: Lillian, a 55-year-old accountant, has been suffering from frequent dizziness, nausea, and
lightheadedness. During the interview, Lillian was obviously restless and sweating. She reported feeling stressed
and fearful of anything without any apparent reason. She could not sleep or eat well. She also started to
withdraw from family and friends, as she experienced frequent panic attacks. She also said that she was constantly
worrying about everything at work and at home. What might be Lillian's problem? What should she do to alleviate
all her symptoms?
Problem-solving test items are a good test format as they minimize guessing, measure instructional objectives that focus on
higher cognitive levels, and cover an extensive amount of content or topics. However, they require more time for teachers
to construct, read, and correct, and they are prone to rater bias, especially when scoring rubrics/criteria are not available. It is
therefore important that good-quality problem-solving test items are constructed.
P40.
The following are some general guidelines in constructing good problem-solving test items:
1. Identify and explain the problem clearly.

Faulty: Tricia was 135.6 lbs. when she started with her zumba/aerobics exercises. After three months of attending the
sessions three times a week, her weight was down to 122.8 lbs. About how many lbs. did she lose after three months?
Write your final answer in the space provided and show your computation. [This question asks “about how many” and does
not indicate whether the learners need to give the exact weight or whether they need to round off their answer and to what
extent.]
Good: Tricia was 135.6 lbs. when she started with her zumba/aerobics exercises. After three months of attending the sessions
three times a week, her weight was down to 122.8 lbs. How many lbs. did she lose after three months? Write your final
answer in the space provided and show your computation. Write the exact weight; do not round off.

2. Be specific and clear of the type of response required from the students.

Faulty: ASEANA Bottlers, Inc. has been producing and selling Tutti Fruity juice in the Philippines, aside from their Singapore
market. The sales for the juice in the Singapore market were S$5million more than those of their Philippine market in 2016,
S$3million more in 2017, and S$4.5million more in 2018. If the sales in the Philippine market in 2018 were PHP35million, what were
the sales in the Singapore market during that year? [This is a faulty question because it does not specify in what currency
the answer should be presented.]

Good: ASEANA Bottlers, Inc. has been producing and selling Tutti Fruity juice in the Philippines, aside from their Singapore
market. The sales for the juice in the Singapore market were S$5million more than those of their Philippine market in 2016,
S$3million more in 2017, and S$4.5million more in 2018. If the sales in the Philippine market in 2018 were PHP35million, what were the
sales in the Singapore market during that year? Provide the answer in Singapore dollars (1S$ = PHP36.50). [This is a better item because it
specifies in what currency the answer should be presented, and the exchange rate is given. A quick computational check of the expected answer appears after this list.]

3. Specify in the directions the bases for grading students’ answers/procedures.

Faulty: VCV Consultancy Firm was commissioned to conduct a survey on the voters’ preferences in Visayas and Mindanao
for the upcoming presidential election. In Visayas, 65% are for Liberal Party (LP) candidate, while 35% are for the Nationalist
Party (NP) candidate. In Mindanao, 70% of the voters are Nationalists, while 30% are LP supporters. A survey was
conducted among 200 voters for each region. What is the probability that the survey will show a greater percentage of
Liberal Party supporters in Mindanao than in the Visayas region? [This question is faulty because it does not specify the
basis for grading the answer.]

Good: VCV Consultancy Firm was commissioned to conduct a survey on the voters’ preferences in Visayas and Mindanao
for the upcoming presidential election. In Visayas, 65% are for Liberal Party (LP) candidate, while 35% are for the Nationalist
Party (NP) candidate. In Mindanao, 70% of the voters are Nationalists, while 30% are LP supporters. A survey was
conducted among 200 voters for each region.
What is the probability that the survey will show a greater percentage of Liberal Party supporters in Mindanao than in the
Visayas region? Please show your solutions to support your answer. Your answer will be graded as follows:
0 point = for wrong answer and wrong solution
1 point = for correct answer only (i.e., without or wrong solution)
3 points = for correct answer with partial solutions
5 points = for correct answer with complete solutions
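
To see what guideline 2’s good item expects, here is a minimal Python sketch of the computation (the variable names are ours, and the reading that the 2018 Singapore sales equal the converted Philippine sales plus the stated S$4.5million lead is our interpretation of the item):

# Sketch of the expected answer to the ASEANA item (guideline 2).
PHP_PER_SGD = 36.50            # exchange rate given in the item

php_sales_2018 = 35_000_000    # Philippine 2018 sales, in PHP
lead_2018_sgd = 4_500_000      # Singapore sales exceed PH sales by S$4.5M in 2018

ph_sales_in_sgd = php_sales_2018 / PHP_PER_SGD    # convert PHP to S$
sg_sales_2018 = ph_sales_in_sgd + lead_2018_sgd   # add the S$4.5M lead

print(f"Singapore sales in 2018 = S${sg_sales_2018:,.2f}")
# Singapore sales in 2018 = S$5,458,904.11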
P41.

Week 5

Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers/papers is strictly prohibited.

 DEVELOP

Let us review what you have learned about constructing traditional tests.
To check whether you have learned the important information about constructing the traditional types of tests, please complete
the following graphical representation.

Traditional Test Types

What are the types? When to use? Why choose it? How to construct?
P42.

Quiz #5
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers/papers is strictly prohibited.
b.) Answer the following items.

1. What do we call the statements of what learners are expected to do or demonstrate as a result of engaging in the learning process?
A. Desired learning outcomes C. Learning intents
B. Learning goals D. Learning objectives
2. Which of the following is NOT a factor to consider when choosing a particular test format?
A. Desired learning outcomes of the lesson
B. Grade level of students
C. Learning activities
D. Level of thinking to be assessed
3. Mr. Tobias is planning to use a traditional/conventional type of classroom assessment for his trigonometry quarterly quiz.
Which of the following test formats will he likely NOT use?
A. Fill-in-the-blank test C. Multiple-choice
B. Matching type D. Oral presentation
4. What is the type of test in which the learners are asked to formulate their own answers?
A. Alternative response test C. Multiple-choice type
B. Constructed-response type D. Selected-response type
5. What is the type of true or false test item in which the statement is presented with a key word or brief phrase that is
underlined, and the student has to supply the correct word or phrase?
A. A-B variation C. T-F substitution variation
B. T-F correction question D. Yes-No variation
6. What is the type of test items in which learners are required to answer a question by filling in a blank with the correct word
or phrase?
A. Essay test
B. Fill-in-the-blank or completion test item
C. Modified true or false test
D. Short answer test
7. What is the most appropriate test format to use if teachers want to measure learners’ higher-order thinking skills,
particularly their abilities to reason, analyze, synthesize, and evaluate?
A. Essay C. Multiple-choice
B. Matching type D. True or False
8. What is the first step when planning to construct a final exam in Algebra?
A. Come up with a table of specifications
B. Decide on the length of the test
C. Define the desired learning outcomes
D. Select the type of test to construct
9. What is the type of learning outcome that Ms. Araneta is assessing if she wants to construct a multiple-choice test for her
Philippine History class?
A. Knowledge C. Problem solving skills
B. Performance D. Product
10. In constructing a fill-in-the-blanks or completion test, what guidelines should be followed?
Answer:______________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
P43.

Lesson 6: Establishing Test Validity and Reliability

In order to establish the validity and reliability of an assessment tool, you need to know the different ways of establishing
test validity and reliability. You are expected to read this before you can analyze your items.

In this lesson, you are expected to:


 Use procedures and statistical analysis to establish test validity and reliability;
 Decide whether a test is valid or reliable; and
 Decide which test items are easy and difficult.

What is test reliability?


Reliability refers to the consistency of responses to a measure under three conditions: (1) when the same person is retested on
the same measure; (2) when the person takes an equivalent measure; and (3) when responses are similar across items that measure the same characteristic. In the first
condition, consistent responses are expected when the test is given to the same participants. In the second condition, reliability is
attained if the responses to a test are consistent with those on an equivalent test, or on another test that
measures the same characteristic, when administered at a different time. In the third condition, there is reliability when the person
responds in the same way or consistently across items that measure the same characteristic.
P44.

There are different factors that affect the reliability of a measure. The reliability of a measure can be high or low, depending on the
following factors:
1. The number of items in a test – The more items a test has, the higher the likelihood of reliability. The probability of obtaining
consistent scores is high because of the large pool of items.

2. Individual differences of participants – Every participant possesses characteristics that affect their performance in a test,
such as fatigue, concentration, innate ability, perseverance, and motivation. These individual factors change over time and
affect the consistency of the answers in a test.

3. External environment – The external environment may include room temperature, noise level, depth of instruction,
exposure to materials, and quality of instruction, which could affect changes in the responses of examinees in a test.

What are the different ways to establish test reliability?

There are different ways of determining the reliability of a test. The specific kind of reliability will depend on the (1) variable
you are measuring, (2) type of test, and (3) number of versions of the test.

The different types of reliability and how each is established are described below. Notice that statistical analysis is needed to determine test reliability.
1. Test-retest
How is this reliability done? Administer a test at one time to a group of examinees, then administer it again at another time to the same group of examinees. There is a time interval of not more than 6 months between the first and second administration for tests that measure stable characteristics, such as standardized aptitude tests; the retest can be given with a minimum time interval of 30 minutes. The responses in the test should be more or less the same across the two points in time. Test-retest is applicable for tests that measure stable variables, such as aptitude and psychomotor measures (e.g., typing test, tasks in physical education).
What statistics is used? Correlate the test scores from the first and the second administration. A significant and positive correlation indicates that the test has temporal stability over time. Correlation refers to a statistical procedure in which a linear relationship is expected between two variables. You may use the Pearson Product Moment Correlation or Pearson r because test data are usually in an interval scale (refer to a statistics book for Pearson r).
P45.
2. Parallel Forms
How is this reliability done? There are two versions of a test, and the items need to measure exactly the same skill. Each test version is called a “form.” Administer one form at one time and the other form at another time to the same group of participants. The responses on the two forms should be more or less the same. Parallel forms are applicable if there are two versions of the test. This is usually done when the test is repeatedly used for different groups, such as entrance examinations and licensure examinations, where different versions of the test are given to different groups of examinees.
What statistics is used? Correlate the test results of the first form and the second form. Significant and positive correlation coefficients are expected, indicating that the responses in the two forms are the same or consistent. Pearson r is usually used for this analysis.
3. Split-Half
How is this reliability done? Administer a test to a group of examinees. The items are split into halves, usually using the odd-even technique: get the sum of the points in the odd-numbered items and correlate it with the sum of the points in the even-numbered items. Each examinee will have two scores, and the two sets of scores should be close or consistent. Split-half is applicable when the test has a large number of items.
What statistics is used? Correlate the two sets of scores using Pearson r. After the correlation, apply another formula called the Spearman-Brown Coefficient. The coefficients obtained using Pearson r and Spearman-Brown should be significant and positive to mean that the test has internal consistency reliability.
4. Test of Internal Consistency Using Kuder-Richardson and Cronbach’s Alpha
How is this reliability done? This procedure involves determining whether the scores for each item are consistently answered by the examinees. After administering the test to a group of examinees, determine and record the scores for each item. The idea here is to see if the responses per item are consistent with each other.
What statistics is used? A statistical analysis called Cronbach’s alpha or the Kuder-Richardson is used to determine the internal consistency of the items. A Cronbach’s alpha value of 0.60 and above indicates that the test items have internal consistency.
5. Inter-rater Reliability
How is this reliability done? This procedure is used to determine the consistency of multiple raters when using rating scales and rubrics to judge performance. The reliability here refers to the similar or consistent ratings provided by more than one rater or judge when they use an assessment tool. Inter-rater reliability is applicable when the assessment requires the use of multiple raters.
What statistics is used? A statistical analysis called Kendall’s W coefficient of concordance is used to determine whether the ratings provided by multiple raters agree with each other. A significant Kendall’s W value indicates that the raters concur or agree with each other in their ratings.
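
To make the split-half procedure and the Spearman-Brown step concrete, here is a minimal Python sketch (the data and function names are hypothetical; the correction applied is the usual full-test Spearman-Brown formula, 2r/(1 + r)):

import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))

def split_half_reliability(item_scores):
    """item_scores: one row of 0/1 item scores per examinee.
    Splits the items by the odd-even technique, correlates the two
    half scores, then applies the Spearman-Brown correction."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return (2 * r_half) / (1 + r_half)              # Spearman-Brown

# Hypothetical 6-item test answered by five examinees (1 = correct).
scores = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
print(round(split_half_reliability(scores), 2))  # 0.93 for this sample data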

You will notice from the descriptions above that statistical analysis is required to determine the reliability of a measure. The very
basis of the statistical analysis used to determine reliability is linear regression.

1. Linear regression
Linear regression is demonstrated when you have two variables that are measured, such as two sets of scores in a test
taken at two different times by the same participants. When the two scores are plotted in a graph (with an X- and Y-axis),
they tend to form a straight line. The straight line formed for the two sets of scores can produce a linear regression. When
a straight line is formed, we can say that there is a correlation between the two sets of scores. This correlation is shown in
the scatterplot below. Each point in the scatterplot is a respondent with two scores (one for each test).
P46.

Figure: Scatterplot of two test scores (Score 1 on the x-axis, Score 2 on the y-axis), with fitted line Score 2 = 4.8493 + 1.0403x.

2. Computation of Pearson r correlation


The index of the linear regression is called a correlation coefficient. When the points in a scatterplot tend to fall along
the linear line, the correlation is said to be strong. When the direction of the scatterplot is directly proportional, the
correlation coefficient will have a positive value. If the line is inverse, the correlation coefficient will have a negative
value. The statistical analysis used to determine the correlation coefficient is called Pearson r. How the Pearson r is
obtained is illustrated below.
Suppose that a teacher gave the spelling of two-syllable words with 20 items for Monday and Tuesday. The teacher
wanted to determine the reliability of two sets of scores by computing for the Pearson r.
Formula:

r = [N(ƩXY) – (ƩX)(ƩY)] / √{[N(ƩX²) – (ƩX)²][N(ƩY²) – (ƩY)²]}

Monday Test Tuesday Test


X Y X² Y² XY
10 20 100 400 200
9 15 81 225 135
6 12 36 144 72
10 18 100 324 180
12 19 144 361 228
4 8 16 64 32
5 7 25 49 35
7 10 49 100 70
16 17 256 289 272
8 13 64 169 104
ƩX = 87 ƩY = 139 ƩX² = 871 ƩY² = 2125 ƩXY = 1328
P47.

ƩX – Add all the X scores (Monday scores)
ƩY – Add all the Y scores (Tuesday scores)
X² – Square the value of the X scores (Monday scores)
Y² – Square the value of the Y scores (Tuesday scores)
XY – Multiply the X and Y scores
ƩX² – Add the squared values of X
ƩY² – Add the squared values of Y
ƩXY – Add all the products of X and Y
Substitute the values in the formula:

r = [10(1328) – (87)(139)] / √{[10(871) – (87)²][10(2125) – (139)²]}
r = (13280 – 12093) / √[(1141)(1929)]
r = 1187 / 1483.57
r = 0.80

The value of a correlation coefficient does not exceed 1.00 or -1.00. A value of 1.00 or -1.00 indicates a perfect
correlation. In tests of reliability, though, we aim for a high positive correlation, which means that there is consistency in the way
the students answered the tests.
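
As a check on the long-hand computation, here is a minimal Python sketch that reproduces the Pearson r for the Monday and Tuesday spelling scores:

import math

# Monday (X) and Tuesday (Y) spelling scores from the worked example.
x = [10, 9, 6, 10, 12, 4, 5, 7, 16, 8]
y = [20, 15, 12, 18, 19, 8, 7, 10, 17, 13]

n = len(x)
sum_x, sum_y = sum(x), sum(y)              # ΣX = 87, ΣY = 139
sum_x2 = sum(v * v for v in x)             # ΣX² = 871
sum_y2 = sum(v * v for v in y)             # ΣY² = 2125
sum_xy = sum(a * b for a, b in zip(x, y))  # ΣXY = 1328

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2)
)
print(round(r, 2))  # 0.8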

3. Difference between a positive and a negative correlation


When the value of the correlation coefficient is positive, it means that the higher the scores in X, the higher the
scores in Y. This is called a positive correlation. In the case of the two spelling scores, a positive correlation is obtained.
When the value of the correlation coefficient is negative, it means that the higher the scores in X, the lower the scores in Y,
and vice versa. This is called a negative correlation. When the same test is administered to the same group of
participants, a positive correlation usually indicates reliability or consistency of the scores.

4. Determining the strength of a correlation


The strength of the correlation also indicates the strength of the reliability of the test. This is indicated by the
value of the correlation coefficient. The closer the value to 1.00 or -1.00, the stronger is the correlation. Below is the
guide:

0.80–1.00 Very strong relationship
0.60–0.79 Strong relationship
0.40–0.59 Substantial/marked relationship
0.20–0.39 Weak relationship
0.00–0.19 Negligible relationship

5. Determining the significance of the correlation


The correlation obtained between two variables could be due to chance. In order to determine whether the correlation
is free of certain errors, it is tested for significance. When a correlation is significant, it means that the probability of the
two variables being related is free of certain errors.
In order to determine whether a correlation coefficient value is significant, it is compared with an expected probability
of correlation coefficient values called a critical value. When the computed value is greater than the critical value, it
means that the correlation obtained has more than a 95% chance of being real and is considered significant.
Another statistical analysis mentioned to determine the internal consistency of a test is Cronbach’s alpha.
Follow the procedure below to determine internal consistency.
Suppose that five students answered a checklist about their hygiene using a scale of 1 to 5, where the
following are the corresponding scores:

5 - always, 4 – often, 3 – sometimes, 2 – rarely, 1 – never


P48.

The checklist has five items. The teacher wanted to determine if the items have internal consistency.

Student   Item 1   Item 2   Item 3   Item 4   Item 5   Total (X)   Score – Mean   (Score – Mean)²
A         5        4        4        4        1        19          2.8            7.84
B         3        3        3        3        2        15          -1.2           1.44
C         2        3        3        3        3        16          -0.2           0.04
D         1        2        3        3        3        13          -3.2           10.24
E         3        3        4        4        4        18          1.8            3.24
Mean of the totals (X̅) = 16.2                                      Ʃ(Score – Mean)² = 22.8

Total for each item (ƩX):   14     21     16     17     13
ƩX²:                        48     91     54     59     39
Item variances (SDᵢ²):      2.2    0.7    0.7    0.3    1.3        ƩSDᵢ² = 5.2

σt² = Ʃ(Score – Mean)² / (n – 1)
σt² = 22.8 / (5 – 1)
σt² = 5.7

Cronbach’s ɑ = [n / (n – 1)] × [(σt² – ƩSDᵢ²) / σt²]
Cronbach’s ɑ = [5 / (5 – 1)] × [(5.7 – 5.2) / 5.7]
Cronbach’s ɑ = 0.11

The internal consistency of the responses in the hygiene checklist is 0.11, indicating low internal
consistency.
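
Here is a minimal Python sketch of the alpha formula above, fed with the summary values from the hygiene checklist (the function and its signature are ours, not a library routine):

def cronbach_alpha(n_items, total_variance, sum_item_variances):
    """Cronbach's alpha from summary quantities."""
    return (n_items / (n_items - 1)) * (
        (total_variance - sum_item_variances) / total_variance
    )

# Summary values from the hygiene-checklist example above:
# σt² = 22.8 / (5 - 1) = 5.7 and ƩSDᵢ² = 5.2 for n = 5 items.
print(round(cronbach_alpha(5, 5.7, 5.2), 2))  # 0.11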
The consistency of ratings can also be obtained using a coefficient of concordance. The Kendall’s W coefficient of
concordance is used to test the agreement among raters.
Below is a performance task demonstrated by five students and rated by three raters. The rubric used a scale of 1 to 4,
where 4 is the highest and 1 is the lowest.
Five Rater Rater Rater Sum of D D²
demonstrations 1 2 3 Ratings
A 4 4 3 11 2.6 6.76
B 3 2 3 8 -0.4 0.16
C 3 4 4 11 2.6 6.76
D 3 3 2 8 -0.4 0.16
E 1 1 2 4 -4.4 19.36
X̅ Ratings= 8.4 ƩD²= 33.2

The scores given by the three raters are first summed for each demonstration. The mean of the sums of ratings is
obtained (X̅ Ratings = 8.4). The mean is subtracted from each sum of ratings to get D, and each D is squared. The number of
raters (m), the number of demonstrations (N), and the summation of squared differences (ƩD²) are substituted in the Kendall’s W formula:

W = 12ƩD² / [m²N(N² – 1)]
W = 12(33.2) / [3²(5)(5² – 1)]
W = 0.37
P49.

The Kendall’s W coefficient value of 0.37 indicates the degree of agreement of the three raters on the five demonstrations.
There is moderate concordance among the three raters because the value is far from 1.00.
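
A minimal Python sketch of the same computation follows; as in the illustration above, the formula is applied directly to the summed raw ratings (the function name is ours):

def kendalls_w(ratings):
    """Kendall's coefficient of concordance.
    ratings: one row per demonstration, one column per rater."""
    m = len(ratings[0])                      # number of raters
    n = len(ratings)                         # number of demonstrations
    sums = [sum(row) for row in ratings]     # sum of ratings per demonstration
    mean = sum(sums) / n
    ss = sum((s - mean) ** 2 for s in sums)  # ΣD²
    return 12 * ss / (m**2 * n * (n**2 - 1))

# Ratings of five demonstrations by three raters (from the example above).
ratings = [
    [4, 4, 3],
    [3, 2, 3],
    [3, 4, 4],
    [3, 3, 2],
    [1, 1, 2],
]
print(round(kendalls_w(ratings), 2))  # 0.37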

What is test validity?


A measure is valid when it measures what it is supposed to measure. If a quarterly exam is valid, then the
contents should directly measure the objectives of the curriculum. If a scale that measures personality is composed of
five factors, then the scores on the five factors should have items that are highly correlated. If an entrance exam is valid,
it should predict students’ grades after the first semester.

What are the different ways to establish test validity?

There are different ways to establish test validity.

Content Validity – When the items represent the domain being measured. Procedure: The items are compared with the
objectives of the program. The items need to directly measure the objectives (for achievement tests) or the definition (for
scales). A reviewer conducts the checking.

Face Validity – When the test is presented well, free of errors, and administered well. Procedure: The test items and layout
are reviewed and tried out on a small group of respondents. A manual for administration can be made as a guide for the
test administrator.

Predictive Validity – When a measure predicts a future criterion. An example is an entrance exam predicting the grades of
the students after the first semester. Procedure: A correlation coefficient is obtained where the X-variable is used as the
predictor and the Y-variable as the criterion.

Construct Validity – When the components or factors of the test contain items that are strongly correlated. Procedure:
Pearson r can be used to correlate the items for each factor. However, there is a technique called factor analysis to
determine which items are highly correlated to form a factor.

Concurrent Validity – When two or more measures that assess the same characteristic are present for each examinee.
Procedure: The scores on the measures should be correlated.

Convergent Validity – When the components or factors of a test are hypothesized to have a positive correlation.
Procedure: Correlation is done for the factors of the test.

Divergent Validity – When the components or factors of a test are hypothesized to have a negative correlation. An
example is correlating the scores in a test on intrinsic and extrinsic motivation. Procedure: Correlation is done for the
factors of the test.

Cases are provided below for each type of validity to illustrate how it is conducted. After reading the cases and
references about the different kinds of validity, partner with a seatmate and answer the following questions. Discuss your
answers. You may use other references and browse the internet.
P50.

1. Content Validity
A coordinator in science is checking the science test paper for grade 4. She asked the grade 4 science teacher to
submit the table of specifications containing the objectives of the lesson and the corresponding items. The coordinator
checked whether each item is aligned with the objectives.
 How are the objectives used when creating test items?
 How is content validity determined when given the objectives and the items in a test?
 What should be present in a test table of specifications when determining content validity?
 Who checks the content validity of items?
2. Face validity
The assistant principal browsed the test paper made by the math teacher. She checked if the contents of the
items are about mathematics. She examined if instructions are clear. She browsed through the items if the grammar is
correct and if the vocabulary is within the students’ level of understanding.
 What can be done in order to ensure that the assessment appears to be effective?
 What practices are done in conducting face validity?
 Why is face validity the weakest form of validity?
3. Predictive validity
The school admissions office developed an entrance examination. The officials wanted to determine if the results of
the entrance examination are accurate in identifying good students. They took the grades of the students accepted for
the first quarter. They correlated the entrance exam results and the first quarter grades. They found significant and
positive correlations between the entrance examination scores and grades. The entrance examination results predicted
the grades of students after the first quarter. Thus, there was predictive validity.
 Why are two measures needed in predictive validity?
 What is the assumed connection between these two measures?
 How can we determine if a measure has predictive validity?
 How are the test results of predictive validity interpreted?
4. Concurrent Validity
A school guidance counselor administered a math achievement test to grade 6 students. She also has a copy of
the students’ grades in math. She wanted to verify if the math grades of the students measure the same
competencies as the math achievement test. The school counselor correlated the math achievement scores and math
grades to determine if they are measuring the same competencies.
 What needs to be available when conducting concurrent validity?
 At least how many tests are needed for conducting concurrent validity?
 What statistical analysis can be used to establish concurrent validity?
 How are the results of a correlation coefficient interpreted for concurrent validity?
5. Construct Validity
A science test was made by a grade 10 teacher composed of four domains: matter, living things, force and
motion, and earth and space. There are 10 items under each domain. The teacher wanted to determine if the 10 items
made under each domain really belonged to that domain. The teacher consulted an expert in test measurement. They
conducted a procedure called factor analysis. Factor analysis is a statistical procedure done to determine if the items
written will load under the domain they belong to.
 What type of test requires construct validity?
 What should the test have in order to verify its construct?
 What are constructs and factors in a test?
 How are these factors verified if they are appropriate for the test?
 What results come out in construct validity?
 How are the results in construct validity interpreted?
P51.

The construct validity of a measure is reported in journal articles. The following are guide questions used when
searching for the construct validity of a measure from reports:
What was the purpose of construct validity?
What type of test was used?
What are the dimensions or factors that were studied using construct validity?
What procedure was used to establish the construct validity?
What statistic was used for the construct validity?
What were the results of the test’s construct validity?
6. Convergent Validity
A math teacher developed a test to be administered at the end of the school year, which measures number
sense, patterns and algebra, measurements, geometry, and statistics. The math teacher assumes that students’
competencies in number sense improve their capacity to learn patterns and algebra and other concepts. After
administering the test, the scores were separated for each area, and the five domains were intercorrelated using
Pearson r. A positive correlation means that as the number sense scores increase, the patterns and algebra scores also
increase. This shows that student learning in number sense scaffolds patterns and algebra competencies.
 What should a test have in order to conduct convergent validity?
 What are done with the domains in a test on convergent validity?
 What analysis is used to determine convergent validity?
 How are the results in convergent validity interpreted?
7. Divergent Validity
An English teacher taught metacognitive awareness strategy to comprehend a paragraph for grade 11 students.
She wanted to determine if the performance of her students in reading comprehension would reflect well in the reading
comprehension test. She administered the same reading comprehension test to another class which was not taught the
metacognitive awareness strategy. She compared the results using a t-test for independent samples and found that the
class that was taught the metacognitive awareness strategy performed significantly better than the other group. The test has
divergent validity.
 What conditions are needed to conduct divergent validity?
 What assumption is being proved in divergent validity?
 What statistical analysis can be used to establish divergent validity?
 How are the results of divergent validity interpreted?
How to determine if an item is easy or difficult?
An item is difficult if the majority of students are unable to provide the correct answer. The item is easy if the majority
of the students are able to answer it correctly. An item can discriminate if the examinees who score high in the test
answer more of the items correctly than the examinees who got low scores.
Below is a dataset of five items on the addition and subtraction of integers. Follow the procedure to determine
the difficulty and discrimination of each item.

1. Get the total score of each student and arrange scores from the highest to lowest.

Item 1 Item 2 Item 3 Item 4 Item 5


Student 1 0 0 1 1 1
Student 2 1 1 1 0 1
Student 3 0 0 0 1 1
Student 4 0 0 0 0 1
Student 5 0 1 1 1 1
Student 6 1 0 1 1 0
Student 7 0 0 1 1 0
Student 8 0 1 1 0 0
Student 9 1 0 1 1 1
Student 10 1 0 1 1 0
P52.

2. Obtain the upper and lower 27% of the group. Multiply 0.27 by the total number of students, and you will get a
value of 2.7, which rounds to 3. Get the top three students and the bottom three students based
on their total scores. The top three students are students 2, 5, and 9. The bottom three students are students 7, 8,
and 4. The rest of the students are not included in the item analysis.

Item 1 Item 2 Item 3 Item 4 Item 5 Total score


Student 2 1 1 1 0 1 4
Student 5 0 1 1 1 1 4
Student 9 1 0 1 1 1 4
Student 1 0 0 1 1 1 3
Student 6 1 0 1 1 0 3
Student 10 1 0 1 1 0 3
Student 3 0 0 0 1 1 2
Student 7 0 0 1 1 0 2
Student 8 0 1 1 0 0 2
Student 4 0 0 0 0 1 1

3. Obtain the proportion correct for each item. This is computed separately for the upper 27% group and the lower 27% group
by summing the correct answers per item and dividing by the number of students in the group (in this case, 3).

Item 1 Item 2 Item 3 Item 4 Item 5 Total score


Student 2 1 1 1 0 1 4
Student 5 0 1 1 1 1 4
Student 9 1 0 1 1 1 4
Total 2 2 3 2 3
Proportion of 0.67 0.67 1.00 0.67 1.00
the high group
(PH)
Student 7 0 0 1 1 0 2
Student 8 0 1 1 0 0 2
Student 4 0 0 0 0 1 1
Total 0 1 2 1 1
Proportion of 0.00 0.33 0.67 0.33 0.33
the low group
(PL)

4. The item difficulty is obtained using the following formula:

Item difficulty = (pH + pL) / 2
The difficulty is interpreted using the table:

Difficulty Index Remark


0.76 or higher Easy Item
0.25 to 0.75 Average Item
0.24 or lower Difficult Item
P53.

Computation

                      Item 1          Item 2          Item 3          Item 4          Item 5
                      (0.67 + 0.00)/2 (0.67 + 0.33)/2 (1.00 + 0.67)/2 (0.67 + 0.33)/2 (1.00 + 0.33)/2
Index of difficulty   0.33            0.50            0.83            0.50            0.67
Item difficulty       Average         Average         Easy            Average         Average
5. The index of discrimination is obtained using the formula:
Item discrimination = pH – pL
The value is interpreted using the table:

Index discrimination Remark


0.40 and above Very good item
0.30 – 0.39 Good item
0.20 – 0.29 Reasonably good item
0.10 – 0.19 Marginal item
Below 0.10 Poor item

                        Item 1          Item 2          Item 3          Item 4          Item 5
                        = 0.67 – 0.00   = 0.67 – 0.33   = 1.00 – 0.67   = 0.67 – 0.33   = 1.00 – 0.33
Discrimination Index    0.67            0.33            0.33            0.33            0.67
Discrimination          Very good item  Good item       Good item       Good item       Very good item
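
Here is a minimal Python sketch of steps 3 to 5, using the upper and lower 27% groups identified above (the helper name is ours):

# Item scores (1 = correct) of the upper group (students 2, 5, 9)
# and the lower group (students 7, 8, 4) from the example.
upper = [
    [1, 1, 1, 0, 1],
    [0, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
]
lower = [
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
]

def proportion_correct(group):
    """Proportion of examinees in the group answering each item correctly."""
    n = len(group)
    return [sum(col) / n for col in zip(*group)]  # zip(*group) walks the items

pH = proportion_correct(upper)
pL = proportion_correct(lower)

for i, (ph, pl) in enumerate(zip(pH, pL), start=1):
    difficulty = (ph + pl) / 2        # step 4: item difficulty
    discrimination = ph - pl          # step 5: index of discrimination
    print(f"Item {i}: difficulty={difficulty:.2f}, discrimination={discrimination:.2f}")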

Get the results of your previous exam in the class and conduct an item analysis. Determine the difficulty and
discrimination of each item, and tabulate the results below. Indicate the index of difficulty, then write whether the item is
difficult, average, or easy. In the last column, indicate the index of discrimination and write whether it is a very good,
good, reasonably good, marginal, or poor item.

Item difficulty Item discrimination


Item 1
Item 2
Item 3
Item 4
Item 5
When developing a teacher-made test, it is good to have items that are easy, average, and difficult, all with positive
discrimination indices. If you are developing a standardized test, the rule is more stringent: aim for average items
(neither too easy nor too difficult) whose discrimination index is at least 0.30.
P54.

Week 6

Name: __________________________________
Subject/Time: ___________________________
Instructor’s name: ________________________
Instructions: a.) Using other bond papers/papers is strictly prohibited.

 DEVELOP

A. Indicate the type of reliability applicable for each case. Write the type of reliability on the space before the number.
1. Mr. Perez conducted a survey of his students to determine their study habits. Each item is answered using a
five-point scale (always, often, sometimes, rarely, never). He wanted to determine if the responses for each
item are consistent. What reliability technique is recommended?
2. A teacher administered a spelling test to her students. After a day, another spelling test was given with the
same length and stress of words. What reliability can be used for the two spelling tests?
3. A PE teacher requested two judges to rate the dance performance of her students in physical education.
What reliability can be used to determine the reliability of the judgments?
4. An English teacher administered a 20-item test on the correct use of verbs given a subject. The scores were
divided into two sets: one for items 1 to 10 and another for items 11 to 20. The teacher correlated the two
sets of scores from the same test. What reliability is done here?
5. A computer teacher gave a set of typing tests on Wednesday and gave the same set the following week.
The teacher wanted to know if the students’ typing skills are consistent. What reliability can be used?

B. Indicate the type of validity applicable for each case. Write the type of validity on the blank before the number.
1. The science coordinator developed a science test to determine who among the students will be placed in
an advanced science section. The students who scored high in the science test were selected. After two
quarters, the grades of the students in the advanced science were determined. The scores in the science
test were correlated with science grades to check if the science test was accurate in the selection of
students. What type of validity was used?
2. A test composed of listening comprehension, reading comprehension, and visual comprehension items was
administered to students. The researcher determined if the scores on each area refer to the same skill of
comprehension. The researcher hypothesized a significant and positive relationship among these factors.
What type of validity was established?
3. The guidance counselor conducted an interest inventory that measured the following factors: realistic,
investigate, artistic, scientific, enterprising, and conventional. The guidance counselor wanted to provide
evidence that the items constructed really belong to the factor proposed. After her analysis, the proposed
items had high factor loadings on the domain they belong to. What validity was conducted?
4. The technology and livelihood education teacher developed a performance task to determine student
competency in preparing a dessert. The students were tasked with selecting a dessert, preparing the
ingredients, and making the dessert in the kitchen. The teacher developed a set of criteria to assess the
dessert. What type of validity is shown here?
5. The teacher in a robotics class taught students how to create a program to make the arms of a robot move.
The assessment was a performances task of making a program to make three kinds of robot arm
movements. The same assessment task was given to students with no robotics class. The programming
performance of the two classes was compared. What validity was established?
P55.
Quiz #6
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers/papers is strictly prohibited.
b.) Choose the letter of the correct and best answer in every item.

1. Which is a way in establishing test reliability?


a. The test is examined if free from errors and properly administered.
b. Scores in a test with different versions are correlated to test if they are parallel.
c. The components or factors of the test contain items that are strongly uncorrelated.
d. Two or more measures are correlated to show the same characteristics of the examinee.
2. What is being established if items in the test are consistently answered by the students?
a. Internal consistency
b. Inter-rater reliability
c. Test-retest
d. Split-half
3. Which type of validity was established if the components or factors of a test are hypothesized to have a negative
correlation?
a. Construct validity
b. Predictive validity
c. Content validity
d. Divergent validity
4. How do we determine if an item is easy or difficult?
a. An item is easy if majority of the students are not able to provide the correct answer. The item is difficult if
majority of the students are able to answer correctly.
b. An item is difficult if majority of the students are not able to provide the correct answer. The item is easy if
majority of the students are able to answer correctly.
c. An item can be determined difficult if the examinees who scored high in the test answer more of the items
correctly than the examinees who got low scores. If not, the item is easy.
d. An item can be determined easy if the examinees who scored high in the test answer more of the items
correctly than the examinees who got low scores. If not, the item is difficult.
5. Which is used when the scores of the two variables measured by a test taken at two different times by the same
participants are correlated?
a. Pearson r correlation
b. Linear regression
c. Significance of the correlation
d. Positive and negative correlation
P56.

Lesson 7: Organization of Test Data using Tables and Graphs

Test data are better appreciated and communicated if they are arranged, organized, and presented in a clear
and concise manner. Good presentation requires designing a table that can be read easily and quickly. Tables and
graphs are common tools that help readers better understand the test results that are conveyed to concerned
groups like teachers, students, parents, administrators, or researchers, which are used as basis in developing
programs to improve learning of students.

In this lesson, you are expected to:


 Organize test data using tables and graphs and
 Interpret frequency distribution of test data.

To begin the discussion in this lesson, consider a group of raw scores recorded from a summative test administered to
100 college students at a teacher education university.

Table 7.1. Scores of 100 College Students in a Final Examination

53 30 21 42 33 41 42 45 32 58
36 51 42 49 64 46 57 35 45 51
57 38 49 54 61 36 53 48 52 49
41 58 42 43 49 51 42 50 62 60
33 43 78 52 58 45 53 40 60 33
75 66 78 52 58 45 53 40 60 33
46 45 79 34 46 43 47 37 33 64
37 36 36 46 41 43 42 47 56 62
50 53 49 39 52 52 50 37 53 40
34 43 43 57 48 43 42 42 65 35
P57.
How do we organize and present ungrouped data through tables?
As you can see in Table 7.1, the test scores are presented as a simple list of raw scores. Raw scores are easy to get
because these are scores obtained from administering a test, a questionnaire, or any inventory rating scale to
measure knowledge, skills, or other attributes of interest. But as presented in the above table, how do these numbers
appeal to you? Most likely, they do not look interesting or meaningful.

Look at the following table.


Table 7.2. Frequency Distribution of Test Scores

Test Scores (X) Frequency (f) Percent Cumulative Percent


21.00 1 1.0 1.0
30.00 1 1.0 2.0
32.00 1 1.0 3.0
33.00 6 6.0 9.0
34.00 1 1.0 10.0

35.00 3 3.0 13.0


36.00 4 4.0 17.0
37.00 4 4.0 21.0
38.00 1 1.0 22.0
39.00 1 1.0 23.0
40.00 2 2.0 25.0
41.00 3 3.0 28.0
42.00 9 9.0 37.0
43.00 7 7.0 44.0
45.00 4 4.0 48.0
46.00 4 4.0 52.0
47.00 2 2.0 54.0
48.00 2 2.0 56.0
49.00 6 6.0 62.0
50.00 4 4.0 66.0
51.00 3 3.0 69.0
52.00 4 4.0 73.0
53.00 5 5.0 78.0
54.00 1 1.0 79.0
56.00 1 1.0 80.0
57.00 4 4.0 84.0
58.00 3 3.0 87.0
60.00 2 2.0 89.0
61.00 1 1.0 90.0
62.00 3 3.0 93.0
64.00 2 2.0 95.0
65.00 1 1.0 96.0
66.00 1 1.0 97.0
75.00 1 1.0 98.0
78.00 1 1.0 99.0
79.00 1 1.0 100.0
Total 100 100.0
P58.
Table 7.2 is a simple frequency distribution that shows an ordered arrangement of scores, which is better than
the random arrangement of raw scores in Table 7.1. The listing of scores can be in descending or ascending order. You
create this table by simply tallying the scores. There is no grouping of scores but a recording of the frequency in a single
test score. With this table, you know in a second what is the highest and lowest score, and the corresponding counts for
each score. The frequency and percent columns give more specific information on the number and percentage of
students who got a specific test score, which Table 7.1 does not provide. Moreover, you can roughly estimate the
concentration of scores with this frequency table. The cumulative percentage in the last column calculates the
percentage of the cumulative frequency up to a certain score in the dataset. For example, in the 6th row, the test score
of 35 has a corresponding cumulative percentage of 13. This means that 13 percent of the class obtained a score of 35
or lower. Conversely, you can also say that 87 percent of the scores are above 35.
While Table 7.2 appears more informative than Table 7.1, you may still find the presented data lengthy. Can you
imagine what your table will look like when the test has more items and examinees, or when scores are more spread
out?
Look at the next table:

Table 7.3. Frequency Distribution of Grouped Test Scores

Class Interval Midpoint (X1) f Cumulative Cumulative


Frequency (cf) Percentage
75-79 77 3 100 100
70-74 72 0 97 97
65-69 67 2 97 97
60-64 62 8 95 95
55-59 57 8 87 87
50-54 52 17 79 79
45-49 47 18 62 62
40-44 42 21 44 44
35-39 37 13 23 23
30-34 32 9 10 10
25-29 27 0 1 1
20-24 22 1 1 1
Total (N) 100

Apparently, the data presented in Tables 7.1 and 7.2 have been condensed as a result of grouping of scores.
Table 7.3 illustrates a grouped frequency distribution of test scores. The wide range of scores listed in Table 7.2 has
been reduced to 12 class intervals with an interval size of 5. Let us consider again cumulative percentage in the 5th row
of the class interval of 55–59, which is 87. We say that, 87 percent of the students got a score below 60.
The second column lists the midpoint of the test scores in each class interval. By the term itself, the midpoint
connotes the middle score, which is halfway between the exact limits of the interval. In Table 7.3, the midpoint of the
class interval 60-64 is 62. The exact limits of this interval are 59.5 (60 – 0.5) and 64.5 (64 + 0.5). While the data have
been condensed to appear simpler, there is a tradeoff. Looking at Table 7.3, how many students scored 48 in the test?
How many got a score of 37? You cannot tell. As such, while a grouped frequency distribution condenses the data, it
results in a loss of information on the individual scores themselves.

Following are some conventions in presenting test data grouped in frequency distribution:
1. As much as possible, the size of the class intervals should be equal. Class intervals that are multiples of 5, 10, 100,
etc. are often desirable. At times, when large gaps exist in the data and unequal class intervals are used, such
intervals may cause inconvenience in the preparation of graphs and computation of certain descriptive statistical
measures.
P59.
The following formula can be useful in estimating the necessary class interval size:

i = (H – L) / C

where i = size of the class intervals
H = highest test score
L = lowest test score
C = number of classes

For the data in Table 7.1, i = (79 – 21)/12 ≈ 4.8, which rounds to 5 (a computational sketch appears after this list). The
conventional number of classes to group the data generally varies from 7 to 20. As seen in Table 7.3, the size of
the class interval is 5, which is an odd number. If you look at the midpoints, these are whole numbers. If the class size is
an even number, then the midpoints will contain decimal numbers, which may add some difficulty to conventional
computations for some important statistical measures.
2. Start the class interval at a value which is a multiple of the class width. In Table 7.3, we used the class interval of 5
such that we start with the class value of 20, which is a multiple of 5 and where 20-24 includes the lowest test score
of 21, as seen in Table 7.1.
3. As much as possible, open-ended class intervals should be avoided, e.g., 100 and below or 150 and above. These will
cause some problems in graphing and computation of descriptive statistical measures.
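
The sketch referred to in convention 1 follows: a minimal Python example that estimates the class width with i = (H – L)/C and tallies a grouped frequency table (the short score list is hypothetical; applied to the full data of Table 7.1, the same steps should reproduce the intervals of Table 7.3):

import math

# A short hypothetical score list; the full data of Table 7.1 would go here.
scores = [21, 30, 33, 35, 42, 47, 49, 53, 58, 62, 66, 75, 79]

H, L, C = max(scores), min(scores), 12   # highest, lowest, number of classes
i = math.ceil((H - L) / C)               # class width: (79 - 21)/12 ≈ 4.8 → 5

# Convention 2: start at a multiple of the width at or below the lowest score.
start = (L // i) * i                     # 21 // 5 * 5 = 20
intervals = [(lo, lo + i - 1) for lo in range(start, H + 1, i)]

for lo, hi in intervals:                 # tally the frequency per interval
    f = sum(lo <= s <= hi for s in scores)
    print(f"{lo}-{hi}: {f}")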

How do we present test data graphically?


You must be familiar with the saying, “A picture is worth a thousand words.” In a similar vein, “a graph can be
worth a hundred or a thousand numbers.” The use of tables may not be enough to give a clear picture of the properties
of a group of test scores. If numbers presented in tables are transformed into visual models, then the reader becomes
more interested in reading the material. Consequently, understanding of the information and problems for discussion is
facilitated. Graphs are very useful for the comparison of test results of different groups of examinees.
There are many types of graphs, but the more common methods of graphing a frequency distribution are the
following:
1. Histogram. A histogram is a type of graph appropriate for quantitative data such as test scores. This graph consists
of columns: each has a base that represents one class interval, and its height represents the number of observations,
or simply the frequency, in that class interval. Statistical software is available to help construct
histograms and other forms of graphs. More important than long-hand procedures in constructing histograms is
how to cull information from the graph. Look at the graph in Figure 7.1.

Figure 7.1. Histogram of Test Scores of College Students (x-axis: Test Scores; y-axis: Frequency)

The graph was automatically generated with the use of statistical software. In this case, Statistical Package for
Social Sciences (SPSS) was used. Basic steps in SPSS application include the following:
Step 1. Open the Data Editor window. It is understood that the data has already been entered into the Data editor,
following the data entry process. The assumption here is that you already know the basics of entering data into
a statistical program.
P60.
Step 2. On the menu bar, click Analyze, then go to Descriptive Statistics, then to Frequencies. This brings up the
Frequencies dialog box as seen below.

The image (not reproduced here) shows the Data Editor with the menu bar: File – Edit – Data –
Transform – Analyze – Graphs.

Step 3. To make a histogram, do the following steps:


 Open the Data Editor
 On the menu bar, click on Graphs – Legacy Dialogs – Histogram
 Click OK
You will get this image below.

After you have clicked OK, the desired histogram will automatically be shown. The same process will be followed
in generating other types of graphs.
You may also try to organize the data using your knowledge of Excel. The following web references can be useful for
those who have never used SPSS: https://www.spss-tutorial.com/spss-data-analysis and www.statisticssolutions.com/spss-
statistics-help. More tutorials are available online.
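
If SPSS or Excel is not at hand, a histogram like Figure 7.1 can also be produced with Python’s matplotlib library; this is a minimal sketch, with a short score list standing in for the full data of Table 7.1:

import matplotlib.pyplot as plt

scores = [53, 30, 21, 42, 33, 41, 42, 45, 32, 58]  # first row of Table 7.1

# Class intervals of width 5 starting at 20, mirroring Table 7.3.
plt.hist(scores, bins=range(20, 85, 5), edgecolor="black")
plt.xlabel("Test Scores")
plt.ylabel("Frequency")
plt.title("Histogram of Test Scores")
plt.show()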
2. Frequency Polygon. This is also used for quantitative data, and it is one of the most commonly used methods in
presenting test scores. It is a line graph of a frequency distribution. It is very similar to a histogram, but instead of
bars, it uses lines, which makes it possible to compare sets of test data on the same axes. Figure 7.2 illustrates a frequency polygon.
P61.

Figure 7.2. Frequency Polygon in Reading Comprehension Test

In a frequency polygon, you have lines across the scores in the horizontal axis. Each point in the frequency
polygon represents two numbers, which are the score in the horizontal axis and the frequency of that class interval in
the vertical axis. Frequency polygons can also be superimposed to compare several frequency distributions, which
cannot be done with histograms.

You can construct a frequency polygon manually using the histogram in Figure 7.1 by following these simple steps:

1. Locate the midpoint on the top of each bar. Bear in mind that the height of each bar represents the frequency in
each class interval, and the width of the bar is the class interval. As such, that point in the middle of each bar is
actually the midpoint of that class interval. In the histogram on Figure 7.1, there are two spaces without bars. In
such a case, the midpoint falls on the line.
2. Draw a line to connect all the midpoints in consecutive order.
3. The line graph is an estimate of the frequency polygon of the test scores.

Following the above steps, we can draw a frequency polygon using the histogram presented earlier in Figure 7.1

Figure 7.3. Frequency Polygon of Test Scores of College Students

Frequency polygons can also be drawn independently without drawing histograms. From your algebra, you need
an ordered pair (x, y) to graph a point in the coordinate system. For this, the midpoints of the class intervals are used to
plot the points: the midpoints will be the x values, and the y values will be the respective frequencies in each class interval.
For the data in Table 7.3, the (x, y) values will be the (X1, f).
3. Cumulative Frequency Polygon. This graph is quite different from a frequency polygon because the cumulative
frequencies are plotted. In addition, you plot the point above the exact limits of the interval. As such, a cumulative
polygon gives a picture of the number of observations that fall below a certain score instead of the frequency within
a class interval.
P62.

In Table 7.3, the cumulative frequency is in the 4th column; in the 5th column is its conversion to cumulative
percentage. A cumulative percentage polygon is more useful when there is more than one frequency distribution with
an unequal number of observations.
The following figures show the cumulative frequency polygon and cumulative percentage polygon, respectively, of the
data in Table 7.3. These cumulative frequency polygons are useful for obtaining a number of summary measures. The graphs
display ogive (pronounced as “oh jive”) curves. Again, the images are computer-generated output using statistics
software.

Figure 7.3.1. Cumulative Frequency Polygon of Test Scores of College Students (x-axis: Scores)

Figure 7.3.2. Cumulative Percentage Polygon of Test Scores of College Students (x-axis: Scores)

4. Bar Graph. This graph is often used to present frequencies in categories of a qualitative variable. It looks very similar
to a histogram and is constructed in the same manner, but spaces are placed in between the consecutive bars. The
columns represent the categories, and the height of each bar, as in a histogram, represents the frequency. If
experimental data are graphed, the independent variable in categories is usually plotted on the x-axis, while the
dependent variable, the test score, is on the y-axis. However, the bars may also be drawn horizontally. Bar graphs are very useful in
comparing the test performance of groups categorized in two or more variables. Following are some examples of bar
graphs.
P63.

Simple Bar Graph

Figure 7.4. Mean Scores on Mathematics Test of Pre-Service Teachers in Different Countries (x-axis: Countries)

As you see in the graph above, actual numbers appear at the top of each bar. This is done for the
reader to see the actual values, especially in cases when values are too close to each other and there are many categories on
the baseline axis.

Double Bar Graph

Figure 7.5. Summative Test Scores of Grade 5 Pupils


Horizontal Bar Graph

Figure 7.6. Students’ Competency Level in Geometry Test by Majorship
P64.

5. Box-and-Whisker Plots. This is a very useful graph depicting the distribution of test scores through their quartiles.
The first quartile, Q1, is the point in the test scale below which 25% of the scores lie. The second quartile is the
median, which separates the upper 50% and lower 50% of the scores. The third quartile, Q3, is the point above which 25%
of the scores lie. The data on the test scores of 100 college students produced the image in Figure 7.7 using the box-plot
approach.

Figure 7.7. Box-and-Whisker Plots on Test Scores of College Students

Looking at the box-plot graph, the shaded rectangle represents the middle 50% of the test data. The line that
divides the rectangle is actually the median. The rectangle is referred to as the interquartile range box. The top side of
the rectangle is the 3rd quartile (Q3), and the bottom side is the 1st quartile (Q1). Looking at the scale on the left, more
or less, you can approximate the Q1 and Q3. As such, this type of graph will help readers easily see where the scores are
concentrated, how these scores are distributed and divided into quartiles, what score separates each quartile, as well as
the minimum and maximum values.
The whiskers are the lines that extend from either the top side or lower side of the box. These whiskers
represent the range of the bottom 25% and the top 25% of the data values, excluding outliers (i.e., the numbers you
see outside the whiskers). Outliers, which may be interpreted as “outcast” data, are the extreme scores. Note that
outliers are not necessarily “bad.” They can send an important message about a certain phenomenon. For example, if you
want to exempt students from a final examination, the outliers at the top will indicate who can be exempted. At the
same time, those on the other extreme might need more attention and assistance to perform in your class.
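
The quartiles that define the box and whiskers can also be computed directly. Here is a minimal Python sketch using the standard library (the score list is only the first two rows of Table 7.1, and the 1.5 × IQR outlier rule is a common convention rather than something this module prescribes):

import statistics

scores = [53, 30, 21, 42, 33, 41, 42, 45, 32, 58,
          36, 51, 42, 49, 64, 46, 57, 35, 45, 51]  # first two rows of Table 7.1

q1, median, q3 = statistics.quantiles(scores, n=4)  # the three quartile cut points
iqr = q3 - q1                                       # height of the interquartile box
print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")

# Common rule of thumb: points beyond 1.5 × IQR from the box are outliers.
outliers = [s for s in scores if s < q1 - 1.5 * iqr or s > q3 + 1.5 * iqr]
print("Outliers:", outliers)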
6. Pie Graph. One commonly used method to represent categorical data is the use of circle graph. You have learned in
basic mathematics that there are 360° in a full circle. As such, the categories can be represented by the slices of the
circle that appear like a pie; thus, the name pie graph. The size of the pie is determined by the percentage of
students who belong in each category. For example, in a class of 100 students, results were categorized according to
different levels which showed that 10 students (10%) scored above average, 40 students (40%) average; 30 (30%)
below average; or and 2-0 (20%) poor. These percentages will be indicative of the size of the slices in the full circle. A
simple calculation will show that 10% of 360° is 36° , 40% of 360° is 144° , 30% of 360° is 108° , and 20% of 360° is 72
° . You will note that the sum of the percentages is equal to 360 ° , that is, the measure of the whole circle. Making a
pie chart is very easy. You may use an ordinary protractor or compass. Also, with the use of statistical software, you
can produce an attractive chart. You need to label each portion of the pie with different shades to indicate the
categories and label the whole chart as shown below.
P65.

Percentage Distribution of Test scores by Performance Level
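
A minimal Python sketch of the slice-angle computation described above, using the same hypothetical class of 100 students:

# Converting category percentages to pie-slice angles (360° in a full circle).
levels = {"Above average": 10, "Average": 40, "Below average": 30, "Poor": 20}

for level, pct in levels.items():
    angle = pct / 100 * 360
    print(f"{level}: {pct}% -> {angle:.0f}°")
# Above average: 36°, Average: 144°, Below average: 108°, Poor: 72°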

Which graph is best?


No single graph is best in all situations. The histogram is the easiest for many kinds of quantitative data, but it
may not be appealing if you want to compare the performance of two or more groups. Bar graphs work well with
qualitative data and when you want to compare the performance of subgroups of examinees. Frequency and percentage
polygons are useful for treating quantitative data. The cumulative frequency and percentage polygons are valuable for
determining the percentage of a distribution that falls below or above a given point. The cumulative percentage polygon
becomes more useful for comparing groups of unequal size because the frequencies have been converted into
percentages. Box-and-whisker plots are not very popular because they can be difficult to construct without statistical
software. However, they can provide interesting and salient information about test data that other graphs cannot. In
sum, the choice will depend on your purpose and what you want to convey.
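If a statistical package is not available, the landmarks of a box-and-whisker plot can still be computed directly. Here is a minimal sketch in Python, assuming numpy is installed; the scores are an illustrative subset of the 100 test scores, and the 1.5 × IQR fence is a common convention for flagging outliers, not a rule stated in this module.

import numpy as np

scores = np.array([53, 30, 21, 42, 33, 41, 42, 45, 32, 58,
                   36, 51, 42, 49, 64, 46, 57, 35, 45, 51])

q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1                    # the shaded interquartile range box
lower_fence = q1 - 1.5 * iqr     # points beyond the fences are
upper_fence = q3 + 1.5 * iqr     # flagged as outliers

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("outliers:", scores[(scores < lower_fence) | (scores > upper_fence)])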
What are the variations on the shapes of frequency distributions?
As discussed earlier, a frequency distribution is an arrangement of a set of observations. These observations in
the field of education or other sciences are empirical data that illustrate situations in the real world. With the world
population reaching 7.6 billion, you can imagine hundreds of possible frequency distributions representing different
groups and subgroups taken from an infinitely large population. It is reasonable to expect that there will be variations in
the shapes of frequency distributions. Researchers, scientists, and educators have found that empirical data, when
recorded, fit the following shapes of frequency distributions.

What is skewness?
Examine the graphs below.

Figure 7.8. Symmetrical Distribution of Test Scores



Figure 7.9. Negatively Skewed Distribution Figure 7.10. Positively Skewed Distribution

Figure 7.11. Rectangular Distribution

Figure 7.8 is labeled as a normal distribution. Note that half the area of the curve is a mirror reflection of the
other half. In other words, it is a symmetrical distribution, which is also referred to as a bell-shaped distribution. The
higher frequencies are concentrated in the middle of the distribution. A number of experiments have shown that IQ
scores, height, and weight of human beings follow a normal distribution.
The graphs in Figures 7.9 and 7.10 are asymmetrical in shape. The degree of asymmetry of a graph is its
skewness. A basic principle of a coordinate system tells you that, as you move toward the right of the x-axis, the numerical
value increases. Likewise, as you move up the y-axis, the scale value becomes higher. Thus, in a negatively-skewed
distribution, there are more who get higher scores, and the tail, indicating the lower frequencies, points to the left or
toward the lower scores. In a positively-skewed distribution, there are more who get lower scores, and the tail indicates
that the lower frequencies are on the right or toward the higher scores.
The graph in Figure 7.11 is a rectangular distribution. It occurs when the frequency of each score or class interval
of scores is the same or almost the same, which is why it is also called a uniform distribution.
We have differentiated the four graphs in terms of skewness, which refers to their symmetry or asymmetry.
Another way of characterizing a frequency distribution is with respect to the number of “peaks” seen on the curve.
Refer to the following graphs.

Figure 7.12. A Unimodal Frequency Distribution

You see that the curve has only one peak. We refer to the shape of this distribution as unimodal. Now look at
the graph below. There are two peaks appearing at the highest frequencies.

Figure 7.13. A Bimodal Distribution

We call this a bimodal distribution. Those with more than two peaks are called multimodal distributions. In
addition, unimodal, bimodal, or multimodal distributions may or may not be symmetric. Look back at the negatively-skewed
and positively-skewed distributions in Figures 7.9 and 7.10. Both have one peak; hence, they are also unimodal distributions.

What is kurtosis?
Another way of differentiating frequency distributions is shown below. Consider now the graphs of three
frequency distributions in Figure 7.14.

Figure 7.14. Frequency Distributions with Different Kurtosis



What is common among the three distributions?


What difference can you observe among the three distributions of test scores?
It is the flatness of the distributions, which is the flip side of how high or peaked each distribution is. This
property is referred to as kurtosis.
X is the flattest distribution. It has a platykurtic (platy, meaning broad or flat) distribution. Y is the normal
distribution and it is a mesokurtic (meso, meaning intermediate) distribution. Z is the steepest or slimmest, and is called
leptokurtic (lepto, meaning narrow) distribution.
What curve has more extreme scores than the normal distribution?
What curve has more scores that are far from the central value (or average) than does the normal distribution?
For the time being, these characteristics are simply described visually. The next lesson will connect the visual
characteristics to important statistical measures.
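These two ideas can also be put into numbers. Here is a minimal sketch in Python, assuming numpy and scipy are available; the simulated scores are illustrative only, not data from this module.

import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
symmetric = rng.normal(50, 10, 1000)          # bell-shaped scores
easy_test = 100 - rng.exponential(10, 1000)   # many high scores -> left tail

print(f"symmetric: skew={skew(symmetric):.2f}, kurtosis={kurtosis(symmetric):.2f}")
print(f"easy test: skew={skew(easy_test):.2f}, kurtosis={kurtosis(easy_test):.2f}")
# Skewness near 0 and excess kurtosis near 0 describe a symmetric, mesokurtic
# curve; a negative skew value matches Figure 7.9, while a large positive
# kurtosis value describes a leptokurtic curve.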

Week 7

Name: __________________________________
Subject/Time: ___________________________
Instructor’s name: ________________________

 DEVELOP

At this point, let us see how well you understood what has been presented in the preceding sections.
1. Consider the table showing the results of a reading examination of a set of students.
Frequency Distribution of Scores in Mid-Term Examination in Reading
Class Interval   Midpoint   f    Cumulative Frequency   Cumulative Percentage
140-144          142        2    ______50______         ______100______
135-139          137        7    ______48______         ______96______
130-134          132        9    ______41______         ______82______
125-129          127        14   ______32______         ______64______
120-124          122        10   ______18______         ______36______
115-119          117        6    ______8______          ______16______
110-114          112        2    2                      ______4______
a. What is being described in the table?

b. How many students are there in the class?

c. What is the class width?

d. How did we get the midpoints from the given class interval?

e. What is the lower limit of the class with the highest frequency?

f. What is the upper limit of the class with the lowest frequency?
Quiz #7
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________

 SUSTAIN

The following is a frequency distribution of examination marks:

Class Interval F
90-94 6
85-89 9
80-84 7
75-79 13
70-74 14
65-69 19
60-64 11
55-59 11
50-54 9
45-49 8
40-44 8

Answer the following questions:


a. What is the size of the class interval?

b. What is the exact limit of the class interval with an observed frequency of 13? How did you determine it?

c. Without graphing, how do you see the shape of the graph? Is it symmetrical or skewed? Is it unimodal or
bimodal? Give a statement or two to support your answer.

d. Sketch the graph of the frequency distribution using the data on the table.

Lesson 8: Analysis, Interpretation, and Use of Test Data

In this lesson, you are expected to:


 Analyze, interpret, and use test data applying (a) measures of central tendency, (b) measures of
variability, (c) measures of position, and (d) measures of covariability.

 Prepare
What are measures of central tendency?
The phrase “measures of central tendency” refers to the central location or point of convergence of a set of
values. Test scores have a tendency to converge at a central value. This value is the average of the set of scores. In
other words, a measure of central tendency gives a single value that represents a given set of scores. The three
commonly-used measures of central tendency, or measures of central location, are the mean, the median, and the mode.
Mean. This is the most preferred measure of central tendency for use with test scores and is also referred to as the
arithmetic mean. The computation is very simple. When a student adds up the examination scores he/she made in
a subject during the grading period and divides the total by the number of examinations taken, then he/she has computed
the arithmetic mean. That is,

X̅ = ƩX / N

where X̅ = the mean, ƩX = the sum of all the scores, and N = the number of scores in the set.
Consider again the test scores of students given in Table 8.1, which is the same set of test scores used in the
previous lesson.

Table 8.1. Scores of 100 College Students in a Final Examination


53 30 21 42 33 41 42 45 32 58
36 51 42 49 64 46 57 35 45 51
57 38 49 54 61 36 53 48 52 49
41 58 42 43 49 51 42 50 62 60
33 43 37 57 35 33 50 42 62 49
75 66 78 52 58 45 53 40 60 33
46 45 79 33 46 43 47 37 33 64
37 36 36 46 41 43 42 47 56 62
50 53 49 39 52 52 50 37 53 40
34 43 43 57 43 43 42 42 65 35
The mean is the sum of all the scores, from 53 down to the last score (which is 35), divided by the number of scores.
That is,

X̅ = ƩX / N = (53 + 36 + 57 + … + 60 + 49 + 35) / 100

There are many ways of computing the mean. The traditional long computation techniques have outlived their relevance
with the advancement of technology and the emergence of statistical software. On your scientific calculator, you will
see the symbols X̅ and ƩX; just follow the simple steps indicated in its guide. There are also simple steps in Excel. Different
versions of the statistical software SPSS offer the fastest way of obtaining the mean, even with hundreds of scores in a
set. There is no loss of original information because you are dealing with the original individual scores. The use of statistical
software will be explained later.
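If neither SPSS nor Excel is at hand, even Python's standard library will do. A minimal sketch, using only the first column of scores from Table 8.1 for brevity:

from statistics import mean

scores = [53, 36, 57, 41, 33, 75, 46, 37, 50, 34]  # first column of Table 8.1
print(mean(scores))  # the sum of the scores divided by their number: 46.2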

While we recognize the power of technology, some information goes unappreciated because of the short-hand
processing of data through mechanical computation. Look at the conventional way of presenting data in a frequency
distribution table, as done in Lesson 7:

Table 8.2. Frequency Distribution of Grouped Test Scores


Class Interval   Midpoint (X)   f     Xif    Cumulative Frequency (cf)   Cumulative Percentage
75-79            77             3     231    100                         100
70-74            72             0     0      97                          97
65-69            67             2     134    97                          97
60-64            62             8     496    95                          95
55-59            57             8     456    87                          87
50-54            52             17    884    79                          79
45-49            47             18    846    62                          62
40-44            42             21    882    44                          44
35-39            37             13    481    23                          23
30-34            32             9     288    10                          10
25-29            27             0     0      1                           1
20-24            22             1     22     1                           1
Total (N)                       100   ƩXif = 4720
In the traditional presentation, it cannot be denied that you can see at a glance how the scores are distributed
across the range of values in a condensed manner. You can even estimate the average of the scores by looking at the
frequency in each class interval. In the absence of a statistical program, the mean can be computed with the following formula:

Ʃ Xif
X̅ =
N
Where Xi = midpoint of the class interval
f = frequency of each class interval
N = total frequency
Thus, the mean of the test scores in Table 7.1 is calculated as follows:
Ʃ X i f 4720
X̅ = = = 47.2
N 100
Looking at the table, do you find the value reasonable? Why?
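To verify the hand computation, here is a minimal sketch in Python of the grouped-data formula X̅ = ƩXif / N, using the midpoints and frequencies from Table 8.2; the variable names are illustrative.

midpoints   = [77, 72, 67, 62, 57, 52, 47, 42, 37, 32, 27, 22]
frequencies = [ 3,  0,  2,  8,  8, 17, 18, 21, 13,  9,  0,  1]

n = sum(frequencies)                                          # N = 100
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
print(mean)  # 4720 / 100 = 47.2, matching the computation above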
The easiest way is to use SPSS by simply following these steps:
1. Open the Data Editor window. It is understood you have prepared the dataset earlier.
2. On the menu bar click Analyze, then Descriptive Statistics, then Frequencies. This opens the Frequencies dialog box.

Press Continue on the Descriptives Options box, then press OK on the Descriptives dialog box, and you will finally
see the following output.
DESCRIPTIVE VARIABLES = scores
/STATISTIC = MEAN STDDEV MIN MAX.

Descriptives

Descriptive Statistics
N Minimum Maximum Mean Std. Deviation
Scores 100 21.00 79.00 47.1500 10.57954
Valid N (listwise) 100

Look again at the earlier computation of the mean for the test scores presented in Table 8.2; it is 47.2.

Compare it with the computed mean indicated in the 5th column.

Round it off to the nearest tenth. What do you find out about the mean?

Median. The median is the value that divides the ranked scores into halves, or the middle value of the ranked scores.
If the number of scores is odd, then there is only one middle value, which gives the median. However, if the number of
scores in the set is even, then there are two middle values; in this case, the median is the average of these two middle
values. If there are more than 50 scores, arranging the scores and finding the middle value will take time, and a
scientific calculator will not give you the median. Again, statistical software can do this for you with simple steps
similar to finding the mean.

1. On the menu bar, click Analyze, then Descriptive Statistics, then Frequencies. This opens the Frequencies dialog
box.

2. Click on the desired variable name in the left box. In the dataset, let us consider the test scores in Table 8.1.
Move your cursor to Statistics and the Frequencies: Statistics box will pop up. Click Median.
3. You will see that you can use the same process to find the mean (earlier, we opted to use Descriptives
instead of Frequencies). Then click Continue and press OK.

Part of the output will give these data:


Frequencies
[Dataset1] C:\Users\evangeline\Desktop\Lesson 8\SPSS Master Data Total
Scores of 11.sav.2.sav
Statistic
scores
N Valid 100
Missing 1
Median 46.0000

Again, how do you work it out the conventional way? Either you rank the 100 scores, which takes time, or you
arrange the scores in a frequency distribution as shown here:

Table 8.2. Frequency Distribution of Grouped Test Scores

Class Interval   Midpoint (X)   f     Xif    Cumulative Frequency (cf)   Cumulative Percentage
75-79            77             3     231    100                         100
70-74            72             0     0      97                          97
65-69            67             2     134    97                          97
60-64            62             8     496    95                          95
55-59            57             8     456    87                          87
50-54            52             17    884    79                          79
45-49            47             18    846    62                          62    <- median class
40-44            42             21    882    44                          44
35-39            37             13    481    23                          23
30-34            32             9     288    10                          10
25-29            27             0     0      1                           1
20-24            22             1     22     1                           1
Total (N)                       100   ƩXif = 4720

This formula will help you determine the median:

Mdn = lower limit of median class + (size of the class interval) × [(n/2 − cumulative frequency below the median class) / (frequency of the median class)]

Applying the formula:
1. You need a column for cumulative frequency. This is shown in the 5th column for the data in Table 8.2.
2. Determine n/2, which is one-half of the number of scores or examinees.
3. Find the class interval of the 50th score. In this case, where there are 100 scores, the 50th score is in the class
interval 45-49. This class interval becomes the median class. We marked the median class in the table for easy
reference when computing the median value.
4. Find the exact limits of the median class. In this case, these are 44.5-49.5; the lower limit then is 44.5.
Summing up these steps as indicated in the formula:

Median = 44.5 + 5 × (100/2 − 44) / 18
       = 44.5 + 5(6) / 18
       = 46.17
You will see that this value is not too far from the value of 46.00 generated in the SPSS output. When rounded
off to a whole number, they give the same value.
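The same steps can be expressed as a short function. Here is a minimal sketch in Python; the function name and argument layout are illustrative, not from this module.

def grouped_median(lower_limits, frequencies, width):
    """Median from a frequency table; lists run from the lowest class upward."""
    n = sum(frequencies)
    cum = 0
    for lower, f in zip(lower_limits, frequencies):
        if cum + f >= n / 2:               # this class holds the (n/2)th score
            exact_lower = lower - 0.5      # exact lower limit, e.g., 44.5
            return exact_lower + width * (n / 2 - cum) / f
        cum += f

# Table 8.2, listed from the lowest class (20-24) to the highest (75-79):
lowers = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
freqs  = [ 1,  0,  9, 13, 21, 18, 17,  8,  8,  2,  0,  3]
print(grouped_median(lowers, freqs, 5))  # 46.166..., as computed by hand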
Mode. Mode is the easiest measure of central tendency to obtain. It is the score or value with the highest
frequency in the set of scores. If the scores are arranged in a frequency distribution, the mode is estimated as the
midpoint of the class interval which has the highest frequency. This class interval with the highest frequency is also
called the modal class. In a graphical representation of the frequency distribution, the mode is the value in the
horizontal axis at which the curve is at its highest point. If there are two highest points, then there are two modes, as
earlier discussed in Lesson 7. When all the scores in a group have the same frequency, the group of scores has no mode.
Considering the test data in Table 8.2, it can be seen that the highest frequency of 21 occurred in the class interval
40-44. The rough estimate of the mode is 42, which is the midpoint of that class interval. Using statistical software and
following the steps in finding the mean and the median, the following output will appear, which gives the value of the
mode computed directly from the raw data grouped in Table 8.2.

Frequencies
Statistics
Scores
Valid 100
Missing 0
Mode 42.00

You see that 42.00 is equal to the earlier estimate we obtained, that is, the midpoint of the modal class.
However, in some cases the value using the conventional method is not exactly equal to the value generated by
statistical programs.
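From raw scores, the mode is a one-liner in Python's standard library. A minimal sketch with an illustrative subset of the scores:

from statistics import mode

scores = [53, 30, 21, 42, 33, 41, 42, 45, 32, 58, 42, 49, 42]
print(mode(scores))  # 42, the most frequent score in this subset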

When are mean, median, and mode appropriately used?


To appreciate the comparison of the three measures of central tendency, a brief background on levels of
measurement is important. We make observations and perform assessments on many variables related to learning:
vocabulary, spelling ability, self-concept, birth order, socio-economic status, etc. The level of measurement helps you
decide how to interpret the data as measures of these attributes, and it serves as a guide in determining, in part, the
kind of descriptive statistics to apply in analyzing the test data.

Scale of Measurement. There are four levels of measurement that apply to the treatment of test data: nominal, ordinal,
interval, and ratio. In nominal measurement, the number is used for labeling or identification purposes only. An example
is the student’s identification number or section number. In data processing, instead of labeling gender as female or
male, a code “1” is used to denote Female and “2” to denote Male. While “2” is numerically greater than “1,” in this case
the difference of 1 has no meaning; it does not indicate that Male is better than Female. The purpose is simply to
differentiate or categorize the subjects by gender.
The ordinal level of measurement is used when the values can be ranked in some order of characteristics. The
numeric values indicate the rank order of the trait under consideration. Academic awards are made on the basis of an
order of performance: first honor, second honor, third honor, and so on. Some assessment tools require students to
rank their interests, hobbies, or even career choices. Percentile ranks in national assessment tests or entrance
examinations are examples of measurement on an ordinal scale. Percentile scores become more useful and meaningful
than simple raw scores in university entrance or division-wide examinations.
The interval level of measurement, which has the properties of both the nominal and ordinal scales, is attained
when the values can describe the magnitude of the differences between groups or when the intervals between the
numbers are equal. “Equal interval” means that the distance between the things represented by 3 and 4 is the same as
the distance represented by 4 and 5. The most common example of an interval scale is temperature readings. The difference
between the temperatures 30° and 40° is the same as that between 90° and 100°. However, there is no true zero point:
the zero degree on the Celsius thermometer does not mean zero or absence of heat; 0° is an arbitrary value, a
convenient starting point. With an arbitrary zero point, there is a restriction on interval data. You cannot say that an 80°
object is twice as hot as a 40° object. In the educational setting, a student who gets a score of 120 in a reading ability
test is not twice as good a reader as one who got a score of 60 in the same test.
The highest level of measurement is the ratio scale. As such, it carries the properties of the nominal, ordinal, and
interval scales. Its additional advantage is the presence of a true zero point, where zero indicates the total absence of
the trait being measured. A measure of 0 cm for width means no width, 0 km as a measure of distance means no
distance traveled, and 0 words spelled means no word was spelled at all. Test scores as measures of achievement in
many school subjects are often treated as interval scale. However, if achievement in a performance test in Physical
Education is measured by the number of push-ups one can do in a minute or the distance run in an hour, or in a typing
class by the words typed in a minute or the words spelled correctly, then these are all on a ratio scale.
Now, the questions that most likely cross your mind are: Which measure of central tendency should I use? Do I
have to use all three, since a statistical program can automatically give the three measures the easiest way?
Generally, the mean is the most used measure of central tendency because it is appropriate for interval and
ratio variables, which are higher levels of measurement. Its value takes every single score into account, so it
is regarded as the most accurate measure to represent a set of scores. In research, it is most used specifically when
you want to make an inference about population characteristics on the basis of an observed sample value.
For the median: in some cases, we could have one very high score (or very few high scores) and many low
scores. This is especially true when the test is difficult or when students are not well prepared for the test. This will
result in many low scores and a few high scores, leading to a positively-skewed distribution. In the same way, when
the test is too easy for the students, there will be many high scores, which leads to a negatively-skewed distribution. In both
cases, the mean can give an erroneous impression of central tendency because its value is pulled toward the extreme
values, which reduces its role as the representative value of the set of scores. Hence, the median is a better measure. It is
the value that occupies the middle position among the ranked values; thus, it is less likely to be drawn toward the direction
of extreme scores. It is an ordinal statistic but can also be used for interval or ratio data.
The mode is determined by the highest frequency of observations, which makes it a nominal statistic.

How do measures of central tendency determine skewness?


The mean, median, and mode may further be compared if they have been calculated from the same frequency
distribution. In a perfectly symmetrical unimodal distribution, the mean, median, and mode have the same value; in
skewed distributions, the value of the median lies between the mean and the mode. The symmetrical shape is illustrated
in the figure below.

Figure 8.1. Mean, Median, and Mode in a Symmetrical Distribution

When the distribution is positively-skewed, as shown in Figure 8.2, the three values differ. The mode
stays at the peak of the curve, and its value is the smallest. The mean is pulled away from the peak of the
distribution toward the direction of the few high scores; thus, the mean gets the largest value. The median lies between
the mode and the mean.

Figure 8.2. Mean, Median, and Mode in a Positively-Skewed Distribution

Figure 8.3. Mean, Median, and Mode in a Negatively-Skewed Distribution

What are measures of dispersion?


One important family of descriptive statistics in the area of assessment is the measures of dispersion, which indicate
“variability,” “spread,” or “scatter.” See Figure 8.4.
You can see that different distributions may be symmetrical and may have the same average values (mean, median,
mode), but how the scores in each distribution spread out around these measures differs. In A, as shown in
Figure 8.4, scores range between 40 and 60; in B, between 30 and 70; and in C, between about 20 and 80. Measures of
variability give us an estimate of how compressed or spread out the scores are, which contributes to the “flatness” or
“peakedness” of the distribution.

Figure 8.4. Measures of Variability of Sets of Test Scores

There are several indices of variability, and the most commonly used in the area of assessment are the following.

Range. It is the difference between the highest (XH) and the lowest (XL) scores in a distribution. It is the simplest measure
of variability but is also considered the least accurate measure of dispersion because its value is determined by just two
scores in a group. It does not take into consideration the spread of all the scores; its value depends simply on the highest
and lowest scores, and it could be drastically changed by a single value. Consider the following examples:
Determine the range for the following scores: 9, 9, 9, 12, 12, 13, 15, 15, 17, 17, 18, 18, 20, 20, 20.
Range = Highest Score (HS) – Lowest Score (LS)
= 20 – 9
= 11

Now, replace one of the scores with a high score; say, make the last score 50. The range becomes:
Range = HS – LS
= 50 – 9
= 41
You will see that, with just a single score, the range increased greatly, which could be interpreted as a large
dispersion of test scores; however, when you look at the individual scores, the dispersion is not actually large.
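A minimal sketch in Python of the same demonstration:

scores = [9, 9, 9, 12, 12, 13, 15, 15, 17, 17, 18, 18, 20, 20, 20]
print(max(scores) - min(scores))  # 20 - 9 = 11

scores[-1] = 50                   # replace the last score with 50
print(max(scores) - min(scores))  # 50 - 9 = 41, though only one score changed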
Variance and Standard Deviation. The standard deviation is the most widely used measure of variability and is
considered the most accurate representation of how individual scores deviate from the mean value of the
distribution.

Examine the following test score distributions:


Class A Class B Class C
22 16 12
18 15 12
16 15 12
14 14 12
12 12 12
11 11 12
9 11 12
7 9 12
6 9 12
5 8 12
ƩX = 120              ƩX = 120              ƩX = 120
X̅ = 120/10 = 12       X̅ = 120/10 = 12       X̅ = 120/10 = 12
The three classes have the same mean, but their scores spread around it differently. The population variance captures
this spread:

σ² = Ʃ(X – μ)² / N

where σ² = population variance, μ = population mean, and X = a score in the distribution.

Taking the square root gives us the formula for the standard deviation. That is,

σ = √[Ʃ(X – μ)² / N]

where σ = population standard deviation, μ = population mean, and X = a score in the distribution.
If we are dealing with sample data and wish to calculate an estimate of σ, the following formula is used for the
sample statistic:

s = √[Ʃ(X – X̅)² / (N – 1)]

where s = sample standard deviation, X = raw score, X̅ = mean score, and N = the number of scores in the distribution.

This formula is what statisticians term an “unbiased” estimate, and it is more often preferred considering that, in
both research and assessment studies, we deal with sample data rather than actual population data.
With the standard deviation, you can also see the differences between two or more distributions.
Using the scores in Class A and Class B in the above dataset, we can apply the formula:
Class A                          Class B
X    (X – X̅)   (X – X̅)²          X    (X – X̅)   (X – X̅)²
22 22-12 100 16 16-12 16
18 18-12 36 15 15-12 9
16 16-12 16 15 15-12 9
14 14-12 4 14 14-12 4
12 12-12 0 12 12-12 0
11 11-12 1 11 11-12 1
9 9-12 9 11 11-12 1
7 7-12 25 9 9-12 9
6 6-12 36 9 9-12 9
5 5-12 49 8 8-12 16
X̅ = 12 Ʃ(X – X̅)2 = 276 X̅ = 12 Ʃ(X – X̅)2 = 74
The values 276 and 74 are the sums of the squared deviations of scores in Class A and Class B, respectively.
Dividing each by the number of scores in the class minus one (N – 1) gives the variance (S²):

S²A = 276 / (10 – 1) = 30.67        S²B = 74 / (10 – 1) = 8.22

The values above are in squared units, while our original scores are not. Taking their square roots, we obtain
values on the same scale of units as the original set of scores. These give the respective standard deviation (S) of each
class, computed as follows:

SA = √S²A = √30.67 = 5.538        SB = √S²B = √8.22 = 2.867
You may be thinking that the process will be difficult if you are dealing with many scores in a distribution. This is
not really a problem if you have a scientific calculator. With the simple steps indicated in the User’s Guide, you can just
enter the scores and you will see the symbol σn, which stands for the population standard deviation. You will also see the
symbol σn-1, which is used when dealing with sample scores: σn-1 is applied to samples, while σn is applied when you have
the whole population. In the example earlier, we used only 10 scores to explain the concepts of variance and standard
deviation; thus, we used N – 1 as the denominator, taking into consideration that the 10 examinees are a sample of
students.
An alternative formula is what we call the raw score formula, although it does not reflect the concept
of “deviation,” which connotes “difference.” The mathematical equation is:

SD = √{[ƩX² – (ƩX)² / N] / N}

where:
ƩX² = sum of the squares of the raw scores
(ƩX)² = square of the sum of all the raw scores
N = number of scores or examinees
Again, your scientific calculator can be used to find these values; you will see the functions ƩX² and ƩX on your
calculator.
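As a check, here is a minimal sketch in Python of the raw-score formula, using the Class A scores and the population (divide-by-N) form of the equation above:

from math import sqrt

scores = [22, 18, 16, 14, 12, 11, 9, 7, 6, 5]  # Class A again
n = len(scores)
sum_x = sum(scores)                  # sum of the raw scores
sum_x2 = sum(x * x for x in scores)  # sum of the squares of the raw scores

sd = sqrt((sum_x2 - sum_x ** 2 / n) / n)
print(sd)  # about 5.253, identical to the square root of Ʃ(X – μ)² / N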
For a larger number of scores in a distribution, Microsoft Excel or SPSS will be most efficient in obtaining both the variance
and the standard deviation. This can be done in a few seconds if you have already entered and saved the data used to get the
measures of central tendency. To illustrate the simple steps in using SPSS, we refer you to the scores for Class A given
earlier.

Week 8

Name: __________________________________
Subject/Time: ___________________________
Instructor’s name: ________________________

 APPLY

1. Refer to the figure below, showing the frequency polygons that represent the entrance test scores of three groups
of students in different fields of specialization.

[Figure: three frequency polygons on one set of axes; y-axis: frequency; x-axis: test score, from 55 to 105;
legend: Education, Business, Engineering]

a. What is the mean score of Education students?

b. What is the mean score of Engineering students?

c. What is the mean score of Business students?

d. Which group of students had the most dispersed scores in the test? Why do you say so?

e. What distribution is symmetrical? What distribution is skewed? Why do you say so?
