
ASSESSMENT

PROFESSIONAL EDUCATION

Measurement and Evaluation

Competency: Apply Principles of Evaluation in Classroom Testing and Measurement

BASIC CONCEPTS
Test  An instrument designed to measure any quality, ability, skill, or knowledge. It is
composed of test items covering the area it is designed to measure.

Testing A strategy or a method employed to obtain information for evaluation purposes.

Measurement A broader term than test because there are other ways of measuring other than by
test (E.G. observation, use of checklists and rating scale.)
A process of quantifying the degree to which someone/something possesses a given trait,
quality, characteristic or feature.
Process of determining the degree and boundaries of specific traits and characteristics
being assessed.
Process of assigning numerical value to the trait or characteristic in question.
Aspect of evaluation that tells us “how much” or “how often”.

Assessment A broader term than measurement and it involves interpreting or placing such information
in context.
A process of gathering and organizing data into an interpretable form to have a basis for
decision-making.
It is a prerequisite to evaluation. It provides the information which enables evaluation to
take place.

Types of Assessment
1. Traditional Assessment – It refers to the pen-and-paper mode of assessing any quality, ability, skill,
or knowledge (e.g., standardized and teacher-made tests).

2. Performance Assessment – It is a mode of assessment that requires the students to do a
significant task that is relevant to school goals (e.g., practical tests, oral and aural tests, projects,
etc.).

3. Portfolio Assessment – A process of gathering multiple indicators of a student’s progress to
support course goals in a dynamic, ongoing, and collaborative process.

4. Authentic Assessment – A process of measuring important abilities using procedures that
simulate the application of these abilities to real-life problems.

Evaluation A process of systematic collection and analysis of both qualitative and quantitative data in
order to make some judgment or decision.
It involves judgment about the desirability of changes in students as a result or
manifestation of learning that has taken place.
Process of measuring a range of student attributes, abilities, and interests and of making
professional judgments based on the results of measurements.
Involves collecting data from a variety of sources, forming opinions and making
comparisons with which to guide students and others in educational and career decisions.
Process of summing up the results of measurements or tests and giving them some
meaning based on value judgment (Hopkins, 1981).

Purposes of Classroom Assessment
1. Assessment FOR Learning – this includes three types of assessment done before or during
instruction.
a. Placement – done prior to instruction
• Its purpose is to assess the needs of the learners to have a basis in planning for relevant
instruction.
• Teachers use this assessment to know what their students are bringing into the learning
situation and use this as a starting point for instruction.
• The results of this assessment put students in specific learning groups to facilitate teaching
and learning.

b. Formative – done during instruction


• It is in this assessment where teachers continuously monitor the students’ level of
attainment of the learning objectives.
• The results of this assessment are communicated clearly and promptly to the students for
them to know their strengths and weaknesses and the progress of their learning.

c. Diagnostic – done during instruction


• This is used to determine students’ recurring or persistent difficulties.
• It searches for the underlying causes of students’ learning problems that do not respond to
first-aid treatment.
• It helps in the formulation of a plan for detailed remedial instruction.

2. Assessment OF Learning – this is done after instruction. This is usually referred to as the
SUMMATIVE ASSESSMENT.
It is used to certify what students know and can do within a level of proficiency or competency.
Its results reveal whether or not instruction has successfully achieved the curricular outcomes.
The information from assessment of learning is usually expressed as marks or letter grades.
Its results are communicated to the students, parents, and other stakeholders for
decision making.
It is also a powerful factor that could pave the way for educational reforms.

3. Assessment AS Learning – this is done for teachers to understand and perform well their role in
assessing FOR and OF learning. It requires teachers to undergo training on how to assess learning
and to be equipped with the competencies needed in performing their work as assessors.

TYPES OF TESTS

According to what it measures:
• Educational test – measures the results of instruction.
  Example: Achievement test – measures what the students have achieved at the end of the
  instruction.
• Psychological test – measures the intangible aspects of an individual.
  Examples: Aptitude test – measures the area where the students will likely succeed.
  Personality test – measures the students’ personality traits.
  Intelligence test – measures the students’ mental ability/capacity.

According to how it is interpreted (see the sketch after this list):
• Norm-referenced test – a student is compared to other students.
• Criterion-referenced test – a student is compared against a set of criteria.

According to its scope:
• Survey – measures a broad range of objectives.
• Mastery test – measures specific learning objectives.

According to its level of difficulty and time allotment:
• Power – items are of increasing level of difficulty but taken with ample time.
• Speed – items are of the same level of difficulty but taken with limited time.

According to how it is given:
• Individual – given to one examinee at a time, one after the other.
• Group – given to many individuals at the same time.

According to its language mode:
• Verbal – uses words in written or oral form.
• Non-verbal – uses pictures or symbols.

According to who constructed it and who can take it:
• Standardized – made by an expert and tried out, so it can be used with a wider group.
• Informal – made by a classroom teacher and not tried out, so it can be used only with his/her
  own students.

According to the degree of influence of the rater on the outcome:
• Objective – unaffected by personal biases.
• Subjective – affected by the personal biases of the one doing the judgment.
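
The difference in interpretation can be made concrete with a minimal sketch (the class scores, test
length, and 75% mastery cutoff below are hypothetical, not from this reviewer): the same raw score is
reported as a percentile rank against the group (norm-referenced) and as mastery against a fixed
criterion (criterion-referenced).

```python
# Minimal sketch: one raw score, two modes of interpretation.
# Class scores, test length, and mastery cutoff are hypothetical.

def percentile_rank(score, group):
    """Norm-referenced: percent of the group scoring below the given score."""
    below = sum(1 for s in group if s < score)
    return 100.0 * below / len(group)

def mastery(score, total_items, cutoff=0.75):
    """Criterion-referenced: pass/fail against a fixed proportion correct."""
    return "mastered" if score / total_items >= cutoff else "not yet mastered"

class_scores = [12, 15, 18, 20, 22, 25, 27, 28, 30, 33]
raw = 25

print(f"Norm-referenced: percentile rank = {percentile_rank(raw, class_scores):.0f}")
print(f"Criterion-referenced (40 items, 75% cutoff): {mastery(raw, 40)}")
```

The same score of 25 ranks at the 50th percentile of this class yet falls short of the 75% mastery
criterion, which is exactly why the two interpretations can disagree.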

Other Objective Instruments:


Diagnostic test. It measures student’s strengths and weaknesses in a specific area of study.
Formative test. It measures student’s progress that occurs over a short period of time.
Summative test. It measures the extent to which the students have attained the desired outcomes for
a given chapter.
Placement test. It determines the grade or year level where the student should enroll after having
stopped schooling.
Prognostic test. It predicts the student’s future achievement in a specific subject area.
Preference test. It measures both interest and aesthetic judgment by requiring the students to make
forced choices between members of paired or grouped items.
Accomplishment test. It measures an individual student’s achievement in the school curriculum.
Omnibus test. It measures a variety of mental operations combined into a single sequence from which only
a single score is taken.
Readiness test. It measures the extent to which an individual has achieved certain skills needed for
beginning some new learning activities.

Types of Tests According to Format


1. Selective Test – provides choices for the answer.
a. Multiple-Choice – consists of a stem which describes the problem and three or more
alternatives which give the suggested solutions. One of the alternatives is the correct answer
while the other alternatives are the distracters.

b. Alternative Response – consists of declarative statements that one has to mark true or false,
right or wrong, correct or incorrect, yes or no, fact or opinion, and the like.

c. Matching Type – consists of two or more parallel columns: Column A, the column of premises
for which a match is sought, and Column B (and possibly Column C), the column(s) of responses
from which the selection is made.

2. Supply Test
a. Short Answer – uses a direct question that can be answered by a word, a phrase, a number, or
a symbol.
b. Completion Test – consists of an incomplete statement.

3. Essay Test
a. Restricted Response – limits the content of the response by restricting the scope of the topic.
b. Extended Response – allows the students to select any factual information that they think is
pertinent and to organize their answers in accordance with their best judgment.

PERFORMANCE-BASED ASSESSMENT
Performance-based assessment is a process of gathering information about student’s learning
through actual demonstration of essential and observable skills and creation of products that are grounded
in real world contexts and constraints. It is an assessment that is open to many possible answers and
judged using multiple criteria or standards of excellence that are pre-specified and public.

Seven Criteria in Selecting a Good Performance Assessment Task

• Authenticity – the task is similar to what the students might encounter in the real world, as
opposed to encountering it only in school.
• Feasibility – the task is realistically implementable in relation to its cost, space, time, and
equipment requirements.
• Generalizability – the likelihood that the students’ performance on the task will generalize to
comparable tasks.
• Fairness – the task is fair to all students regardless of their social status or gender.
• Teachability – the task allows one to master the skill that one should be proficient in.
• Multiple foci – the task measures multiple instructional outcomes.
• Scorability – the task can be reliably and accurately evaluated.

After selecting the task, develop a scoring rubric reflecting the criteria, the levels of performance,
and the scores.

PORTFOLIO ASSESSMENT

Portfolio Assessment is also an alternative tool to the pen-and-paper objective test. It is a
purposeful, ongoing, dynamic, and collaborative process of gathering multiple indicators of the learner’s
growth and development. Portfolio assessment is also performance-based.

Principles Underlying Portfolio Assessment


1. Content principle suggests that portfolios should reflect the subject matter that is important for the
students to learn.
2. Learning principle suggests that portfolios should enable the students to become active and
thoughtful learners.
3. Equity principle explains that portfolios should allow students to demonstrate their learning styles
and multiple intelligences.

Types of Portfolios
1. The working portfolio is a collection of a student’s day-to-day works which reflect his/her learning.
2. The show portfolio is a collection of a student’s best works.
3. The documentary portfolio is a combination of a working and a show portfolio.

DEVELOPING RUBRICS
Rubric is a measuring instrument used in rating performance-based tasks. It is the “key to
corrections” for assessment tasks designed to measure the attainment of learning competencies that
require demonstration of skills or creation of products of learning. It offers a set of guidelines or
descriptions in scoring different levels of performance or qualities of products of learning. It can be used in
scoring both the process and the products of learning.

Similarity of Rubric with Other Scoring Instruments

A rubric is a modified checklist and rating scale.

1. Checklist
• Presents the observed characteristics of a desirable performance or product
• The rater checks the trait/s that has/have been observed in one’s performance or product

2. Rating Scale

• Measures the extent or degree to which a trait has been satisfied by one’s work or
performance
• Offers an overall description of the different levels of quality of a work or a performance
• Uses 3 or more levels to describe the work or performance, although the most common
rating scales have 4 or 5 performance levels.

In a Venn diagram comparison of the three instruments: a checklist shows the observed traits of a
work or performance; a rating scale shows the degree of quality of a work or performance; and a
rubric combines the features of both.

Types of Rubrics

1. Holistic Rubric
Description: It describes the overall quality of a performance or product. In this rubric, there is
only one rating given to the entire work or performance.
Advantages:
• It allows fast assessment.
• It provides one score to describe the overall performance or quality of work.
• It can indicate the general strengths and weaknesses of the work or performance.
Disadvantages:
• It does not clearly describe the degree to which each criterion is satisfied or not by the
performance or product.
• It does not permit differential weighting of the qualities of a product or a performance.

2. Analytic Rubric
Description: It describes the quality of a performance or product in terms of identified
dimensions and/or criteria which are rated independently to give a better picture of the
quality of the work or performance.
Advantages:
• It clearly describes the degree to which each criterion is satisfied or not by the performance
or product.
• It permits differential weighting of the qualities of a product or a performance (illustrated in
the sketch after this list).
• It helps raters pinpoint specific areas of strength and weakness.
Disadvantages:
• It is more time-consuming to use.
• It is more difficult to construct.
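
A minimal sketch of the differential weighting that an analytic rubric permits (the criteria, weights, and
ratings below are hypothetical, not taken from this reviewer): each criterion is rated independently on
the same scale and combined into a weighted score, whereas a holistic rubric records a single overall
rating.

```python
# Minimal sketch of analytic-rubric scoring with differential weighting.
# Criteria, weights, and ratings are hypothetical.

weights = {            # criterion -> weight; weights sum to 1.0
    "Content":      0.40,
    "Organization": 0.30,
    "Delivery":     0.20,
    "Visual aids":  0.10,
}

ratings = {            # independent rating per criterion on a 1-4 scale
    "Content":      4,
    "Organization": 3,
    "Delivery":     3,
    "Visual aids":  2,
}

# Analytic: weighted combination of the independent ratings.
analytic_score = sum(weights[c] * ratings[c] for c in weights)
print(f"Analytic (weighted) score: {analytic_score:.2f} / 4")   # 3.30 / 4

# Holistic: the rater would instead assign one overall rating directly.
holistic_score = 3
print(f"Holistic score: {holistic_score} / 4")
```

Because Content carries the heaviest weight here, a strong content rating lifts the analytic score above
the single holistic rating a rater might otherwise give.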

Essential Characteristics of Good Measuring Instruments


1. VALIDITY
Validity refers to the degree to which a test measures what it intends to measure. It is the
usefulness of the test for a given purpose. A valid test is always reliable.

1.1. Rational Validity – This depends upon professional judgment alone, i.e., the judgment of
competent teachers, usually three or more experts in the field.
1.1.1. Content / Curricular Validity – Validity established by comparing the content of the
test with a particular type of curriculum, textbook, course of study, or outline.
Example:
A teacher made a test in biology. Her test has curricular validity if the content is biology and
not history or geography.
1.1.2. Concept / Construct Validity – Validity established by analyzing the activities and
processes that correspond to a particular concept.
Example:
Analysis of the scientific method, of critical thinking, and efficient skill in writing.
1.2. Statistical / Empirical / Criterion-Related Validity – Validity established by correlating the
results of the test against an outside, valid criterion.
1.2.1. Congruent Validity – Validity which is established when a test is correlated with an
existing measure which has a similar function.
Example:
A group intelligence test is valid if it correlates reasonably well with another intelligence test
of known high validity, such as the Otis intelligence test.

1.2.2. Concurrent Validity – Validity established by correlating the test with some other
measure obtained at the same time.
Example:
Relate the reading test result with pupils’ average grades in reading given by the teacher.

1.2.3. Predictive Validity – Validity established by correlating the test with another measure
which can foretell later success in school, in one’s job, or in life.
Example:
The entrance examination scores of a freshman class at the beginning of the school year
are correlated with their average grades at the end of the school year.

1.3. Logical and Psychological Validity – Validity is established through subjective analysis of the
test by experts in the field. This is usually done if the test cannot be statistically measured.
Example: Artistic works.
2. RELIABILITY
Reliability refers to consistency and accuracy of test results, the degree to which two or more forms
of the test will yield the same results under uniform conditions.
Increasing the length of the test may raise the reliability of the test. Clear and concise directions
would also increase the reliability of the test.

Methods of Determining Reliability

1. Test-Retest
   Type of reliability measure: Measure of Stability
   Procedure: Give a test twice to the same group, with any time interval between tests from
   several minutes to several years.
   Statistical measure: Pearson r

2. Equivalent Forms
   Type of reliability measure: Measure of Equivalence
   Procedure: Give parallel forms of the test to the same group with a close time interval
   between forms.
   Statistical measure: Pearson r

3. Test-Retest with Equivalent Forms
   Type of reliability measure: Measure of Stability and Equivalence
   Procedure: Give parallel forms of the test to the same group with an increased time interval
   between forms.
   Statistical measure: Pearson r

4. Split-Half
   Type of reliability measure: Measure of Internal Consistency
   Procedure: Give a test once, then score equivalent halves of the test.
   Statistical measure: Pearson r and the Spearman-Brown Formula

5. Kuder-Richardson
   Type of reliability measure: Measure of Internal Consistency
   Procedure: Give the test once, then correlate the proportion/percentage of the students
   passing and not passing a given item.
   Statistical measure: Kuder-Richardson Formula 20 and 21
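
For reference, the statistical measures named above take the following standard forms (not printed in
this reviewer): the Spearman-Brown formula steps the half-test correlation r_hh up to full-test length,
and the Kuder-Richardson formulas use k (number of items), p (proportion of students passing an
item), q = 1 − p, X̄ (mean of total scores), and σ² (variance of total scores):

```latex
% Spearman-Brown: full-test reliability from the split-half correlation
r_{full} = \frac{2\,r_{hh}}{1 + r_{hh}}

% Kuder-Richardson Formula 20
KR_{20} = \frac{k}{k - 1}\left(1 - \frac{\sum pq}{\sigma^{2}}\right)

% Kuder-Richardson Formula 21 (assumes items of roughly equal difficulty)
KR_{21} = \frac{k}{k - 1}\left(1 - \frac{\bar{X}\,(k - \bar{X})}{k\,\sigma^{2}}\right)
```

As a hypothetical illustration, a 40-item test with mean 30 and variance 25 gives
KR21 = (40/39)(1 − 300/1000) ≈ 0.72.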
3. OBJECTIVITY
The degree to which no personal judgment, opinion, or bias will affect the scoring of the test. This
can be secured by wording the statements of items in the test in such a way that only one answer is
possible.
* A test should be such that different teachers can similarly score the test and arrive at the same
scores. In other words, the more objective the test is, the greater is its reliability.

4. USABILITY / PRACTICABILITY
The degree to which a test can be used by teachers and administrators without unnecessary waste
of time, money, and effort.

Specific Suggestions

A. Supply Types of Tests


1. Word the item/s so that the required answer is both brief and specific.
2. Do not take statements directly from textbooks as a basis for short answer items.
3. A direct question is generally more desirable than an incomplete statement.
4. If the item is to be expressed in numerical units, indicate the type of answer wanted.
5. Blanks for answers should be equal in length.
6. Answers should be written before the item number for easy checking.
7. When completion items are to be used, do not have too many blanks. Blanks should be at the
center or at the end of the sentences and not at the beginning.

B. Selective Type of Tests


1. Alternative Response
a. Avoid broad statements.
b. Avoid trivial statements.
c. Avoid the use of negative statements especially double negatives.
d. Avoid long and complex sentences.
e. Avoid including two ideas in one statement unless cause-effect relationships are being
measured.
f. If opinion is used, attribute it to some source unless the ability to identify opinion is
being specifically measured.
g. The number of true statements and false statements should be approximately equal.
h. Start with a false statement since it is a common observation that the first statement in
this type of test is always positive.

2. Matching Type
a. Use only homogenous material in a single matching exercise.
b. Include an unequal number of responses and premises, and instruct the pupils that
responses may be used once, more than once or not at all.
c. Keep the list of items to be matched brief, and place the shorter responses at the right.
d. Arrange the list of responses in logical order.
e. Indicate in the directions the basis for matching the responses and premises.
f. Place all the items for one matching exercise on the same page.

3. Multiple-Choice
a. The stem of the item should be meaningful by itself and should present a definite
problem.
b. The stem should be free from irrelevant material.
c. Use a negatively stated stem only when significant learning outcomes require it.
d. Highlight negative words in the stem for emphasis.
e. All the alternatives should be grammatically consistent with the stem of the item.
f. An item should have only one correct or clearly best answer.
g. Items used to measure understanding should contain novelty, but beware of too much.
h. All distracters should be plausible.
i. Verbal associations between the stem and the correct answer should be avoided.
j. The relative length of the alternatives should not provide a clue to the answer.
k. The alternatives should be arranged logically.
l. The correct answer should appear in each alternative position approximately an equal
number of times, but in random order.
m. Use of special alternatives such as “none of the above” or “all of the above” should be
done sparingly.
n. Do not use multiple choice items when other types are more appropriate.
o. Always have the stem and alternatives on the same page.
p. Break any of these rules when you have a good reason for doing so.

4. Essay Type of Tests
a. Restrict the use of essay questions to those learning outcomes that cannot be
satisfactorily measured by objective items.
b. Formulate questions that will call for the behavior specified in the learning outcomes.
c. Avoid the use of optional questions.
d. Indicate the approximate time limit or the number of points for each question.
