ASSESSMENT of LEARNING

MEASUREMENT
An educational process that checks the specificity of an individual which is expressed quantitatively.
The quantification of what students learned through the use of tests, questionnaires, rating scales, checklists, and other
devices. MEASUREMENT answers the question, how much does a student learn or know?

EVALUATION
An educational process that checks the personality of an individual which is expressed qualitatively.
A process of making judgements, assigning value, or deciding on the worth of a student’s performance. EVALUATION answers the
question, how good, adequate, or desirable is it?

ASSESSMENT
The full range of information gathered and synthesized by teachers about their students and their classrooms. Gathered
through observation, verbal exchange, written reports, or outputs. ASSESSMENT looks into how much change has occurred on the
student’s acquisition of a skill, knowledge or value before and after a given learning experience.

PURPOSES OF M-E-A
- Appraisal of the school, curriculum, instructional materials, physical plant, equipment
- Appraisal of the teacher
- Appraisal of the school child

FUNCTIONS OF M-E-A
- Improvement of student learning
- Identification of students’ strengths and weaknesses
- Assessment of the effectiveness of a particular teaching strategy
- Appraisal of the effectiveness of the curriculum
- Assessment and improvement of teaching effectiveness
- Communication with and involvement of parents in their children’s learning

METHODS OF COLLECTING ASSESSMENT DATA


1. Paper-and-pen
Supply Type – requires the student to produce or construct an answer to the question
Selection Type – requires the student to choose the correct answer from a list of options
2. Observation
Involves watching the students as they perform certain learning tasks like reading, speaking....

TYPES OF EVALUATION
1. DIAGNOSTIC EVALUATION
Undertaken before instruction, in order to assess student’s prior knowledge of a particular topic or lesson. Done to determine strengths
and weaknesses of students as bases for remedial instruction.
2. FORMATIVE EVALUATION
Administered during the instructional process to provide feedback to teachers and students on how well the latter are learning the
lesson being taught. Frequently done to determine who has reached mastery of the lesson.
3. SUMMATIVE EVALUATION
Undertaken to determine student achievement for grading purposes. Usually done at the end of a unit, which summarizes the student’s
accomplishments.

APPROACHES TO EVALUATION
CRITERION-REFERENCED MEASURE (CRM)
- A student’s performance is compared against a predetermined or agreed upon standard.
- Designed to measure students’ performance with respect to some particular criterion or standard. It is used to evaluate
performance against performance objective.

NORM-REFERENCED MEASURE (NRM)


- A student’s performance is compared with the performance of other students.
- Designed to measure the ability of one student compared to the abilities of other students in the same class.

TYPES OF TESTS AND THEIR USES (Manarang, 1983; Louisell & Descamps, 1992)

1. Mode of Response
• Oral
• Written
• Performance

2. Ease of Quantification of Response


• Objective – with definite/ exact answer
• Subjective – divergent answers

3. Mode of Administration
• Individual – one student at a time
• Group – simultaneous

4. Test Constructor
• Standardized – prepared by an expert or specialist; follow uniform procedure
• Teacher-Made – prepared by classroom teacher with no established norm for scoring and interpretation

5. Mode of Interpreting Results


• Norm-Referenced – comparing performances of students
• Criterion-Referenced – comparing an individual performance with a specific goal

6. Nature of Answer
• Personality – emotion, social adjustment, dominance & submission, value orientation, disposition, emotional stability,
frustration level, degree of introversion or extroversion
• Intelligence – mental ability (I.Q.)
• Aptitude – predicting the likelihood of success in a learning area
• Achievement – to determine what a student has learned from formal instruction
• Accomplishment – to determine what a student has learned from a broader area
• Socio-metric (Preference) – discovering learner’s likes and dislikes; social acceptance; social relationships
• Trade – to measure an individual’s skill or competence in an occupation or vocation
• Speed – to determine ability and accuracy bounded with time
• Diagnostic – to identify specific strengths and weaknesses in past and present learning
• Formative – to improve teaching and learning while it is going on
• Summative – given at the end of instruction to determine student’s learning and assign grades

TEST CONSTRUCTION: TEACHER MADE TESTS


Why Assess?
• To determine entry knowledge and skills
• To check status of learners’ learning
• To determine if targets are met

STEPS IN WRITING TEST ITEMS


1. Identification of instructional objectives and learning outcomes.
2. Listing the topics to be covered in the test.
3. Preparation of the Table of Specification (TOS).
4. Selection of the appropriate type/s of test.
5. Writing the test items.
6. Sequencing the test items.
7. Writing the directions or instructions.
8. Preparation of the answer sheet (if necessary) and scoring key.
9. Administering the test.
10. Analyzing the test results.
11. Interpreting the test results.

GENERAL GUIDELINES IN WRITING TEST ITEMS


• Avoid wording that is ambiguous and confusing.
• Use appropriate vocabulary and sentence structure.
• Keep questions short and to the point.
• Avoid using negative and double negative statements.
• Avoid using abbreviations/ acronyms especially if not used/presented in the class.
• There should be clear instruction.
• There should be specified number of points.
• There should be no patterns provided.
• There should be proper mechanical make-up.
• Do not provide clues/hints to the answer.
• Use vocabulary suited to the maturity of the students.
• Use language that even the poorest readers will understand.
• Items should not be directly lifted from book/reference.

TYPES OF TEACHER-MADE TEST


1. Objective
- Multiple Choices
- Matching Type
- Alternative Response
- Completion/Augmentation
- Analogy
- Rearrangement
- Identification
- Labeling
2. Subjective (Essay)
- Extended
- Restricted

MULTIPLE CHOICES
Stem – question or problem in each item; can be presented in 2 ways:
– Incomplete statement – all the options end with a period or only the last option ends with a period.
– Direct question – options do not end with a period but stem ends with a question mark.
Options - alternatives where student selects the correct answer
- there is only one correct/best answer from the options, the less appropriate are foils or distracters (maximum no. of options is
5 and the minimum is 4)

ADVANTAGES
• great versatility in measuring objectives - from the level of rote memorization to the most complex level
• the teacher can cover a substantial amount of course material in relatively short time
• scoring is objective
• teachers can construct options that require students to discriminate among them - vary in the degree of correctness
• effects of guessing are largely reduced since there are more options
• items are more amenable to item analysis
DISADVANTAGES
• more time-consuming in terms of looking for options that are plausible
• an item may end up with more than one defensible correct answer

RULES
• essence of the problem should be in the stem; all options should measure the same objective
• when the incomplete statement format is used, the options should come at the end of the statement
• there should be coherence in stems and options
• there should be consistency in the length/presentation of choices
• avoid repetition of words in the options
• the choices should be arranged ascendingly/descendingly
• the choices should be arranged in vertical/columnar order
• stems and options should be stated positively whenever possible
• avoid negative statements or double-negative statements in the stem
• options should be plausible and homogeneous
• items should have defensible correct or best option
• vary the placement of correct options (to avoid pattern)
• avoid overlapping options
• options for complex type must be clear
• make sure there is only one correct/best answer to an item
• stem and options should be in a single page
• avoid using none of the above
• use none of the above option only if there is an absolute right answer
• avoid using all of the above
• It is a poor distracter since it has very little discriminating power to identify knowledgeable from
non-knowledgeable students.
• do not have combination of all of the above and none of the above in the options
• use four or five options
• there should be uniformity in the number of choices for all the items
• there should be no articles a/an at the end of the stem
• stem should be clear and grammatically correct and should contain elements common to each option (multiple-choice items obey
Standard English rules of punctuation and grammar; a question requires a question mark)
TYPES OF ESSAY
1. EXTENDED RESPONSE QUESTIONS
- Leave students free to determine the content and to organize the format of their answer
- Opinionated or open-ended answers are solicited from students
2. RESTRICTED RESPONSE QUESTIONS
- Limit both the content and the format of the students’ answers
- Certain parameters are used in the questions/problems

ADVANTAGES OF ESSAY
• No guessing, assesses factual information
• Allows divergent thinkers to demonstrate higher order thinking skills (HOTS)
• Reduces lead time required to produce
• Less work to administer for smaller number of students
• Can be rich in diagnostic information

DISADVANTAGES OF ESSAY
• Subjectivity in scoring
• Even different times of day make a difference
• First paper to be read/checked often sets standard
• Time consuming in checking
• Can result in student rambling, confusion or inability to find a focus

HOW TO WRITE ESSAY QUESTIONS (Dr. Nacrim)


• Define the task clearly to the student
• When testing for content, make each item relatively short and increase the number of items
• Do NOT provide a choice of questions
• Devise answer key as you write question
• Give students the criteria for evaluating the answers
• Present material to get higher thinking skills

To be effective, essay questions need to….


• Be related to classroom and/or homework learning
• Be clearly articulated
• Be unambiguous
• Cover larger segments of material, rather than have a very limited scope
• Provide sufficient time for the quality of answers expected
• Require incorporation of factual knowledge
• Require students to provide reasoning for their answers
• Include clear directions as to length and structure

INCREASING OBJECTIVITY OF ESSAY SCORING


• Score blind
• Read one question at a time
• Guard against halo effects
• Have a policy on irrelevant answers, errors

TEST INTERPRETATION

CENTRAL TENDENCY

MEAN
• Most common measure of central tendency
• Best for making predictions
• The best measure of central tendency if the distribution is normal
• The arithmetic average, computed simply by adding together all scores and dividing by the number of scores.
• It uses information from every single score.
MEAN (Ungrouped)
$$\bar{X} = \frac{\sum X}{n}$$
Example
If X = {3, 5, 10, 4, 3}, then
$$\bar{X} = \frac{3 + 5 + 10 + 4 + 3}{5} = \frac{25}{5} = 5$$

MEDIAN
• Divides a distribution of scores exactly in half.
• The middle-most value.
• Better than mode because only one score can be median and the median will usually be around where most scores fall.
• If data are perfectly normal, the mode is the median.
• The median is computed when data are on an ordinal scale or when they are highly skewed.
Finding the Median
• First, rank order the values of X from low to high or vice versa.
• Count the number of observations and add 1.
• Divide by 2 to get the position of the middle score. If the result is not a whole number, the median is the average of the
two scores on either side of that position.
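The ranking steps above can be sketched in Python. This is a minimal illustration, not part of the original notes; the function name `median_by_position` is hypothetical, and averaging the two middle scores for an even number of observations is the usual convention rather than something stated in the source.

```python
import math

def median_by_position(scores):
    """Median found by ranking the scores and locating position (n + 1) / 2."""
    ranked = sorted(scores)                  # rank order from low to high
    n = len(ranked)
    if n == 0:
        raise ValueError("at least one score is required")
    position = (n + 1) / 2                   # 1-based position of the middle score
    low = ranked[math.floor(position) - 1]   # score at or just below that position
    high = ranked[math.ceil(position) - 1]   # score at or just above that position
    return (low + high) / 2                  # equals the middle score when n is odd

print(median_by_position([3, 5, 10, 4, 3]))  # 4.0
print(median_by_position([3, 5, 10, 4]))     # 4.5
```

With five scores the position is 3, so the third ranked score is the median; with four scores the position is 2.5, so the second and third ranked scores are averaged.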

MODE
• The most common observation in a group of scores.
• Distributions can be unimodal, bimodal, or multimodal.
• If the data is categorical (measured on the nominal scale) then only the mode can be calculated.
• The most frequently occurring score.
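As a rough sketch of how the mode can be found and a distribution labelled unimodal, bimodal, or multimodal, the example below uses `multimode` from Python's standard `statistics` module. The helper name `describe_modes` is hypothetical. One caveat: when every score occurs exactly once, `multimode` returns all of them, so the label is only meaningful when some score repeats.

```python
from statistics import multimode

def describe_modes(scores):
    """Return the mode(s) and a label for how many modes there are."""
    modes = multimode(scores)            # every value tied for the highest frequency
    labels = {1: "unimodal", 2: "bimodal"}
    return modes, labels.get(len(modes), "multimodal")

print(describe_modes([3, 5, 10, 4, 3]))            # ([3], 'unimodal')
print(describe_modes(["A", "B", "B", "A", "C"]))   # (['A', 'B'], 'bimodal')
```

The second call uses nominal (categorical) data, where the mode is the only measure of central tendency that can be calculated.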

The Shape of Distributions


With perfectly bell shaped distributions, the mean, median, and mode are identical.

With positively skewed data, the mode is lowest, followed by the median and mean.

With negatively skewed data, the mean is lowest, followed by the median and mode.

Measure: Mean (sum of all values / no. of values)
Advantages:
• Best known average
• Exactly calculable
• Makes use of all data
• Useful for statistical analysis
Disadvantages:
• Affected by extreme values
• Can be absurd for discrete data (e.g. family size = 4.5 persons)
• Cannot be obtained graphically

Measure: Median (middle value)
Advantages:
• Not influenced by extreme values
• Obtainable even if data distribution unknown (e.g. group/aggregate data)
• Unaffected by irregular class width
• Unaffected by open-ended class
Disadvantages:
• Needs interpolation for group/aggregate data (cumulative frequency curve)
• May not be characteristic of group when: (1) items are only few; (2) distribution irregular
• Very limited statistical use

Measure: Mode (most frequent value)
Advantages:
• Unaffected by extreme values
• Easy to obtain from histogram
• Determinable from only values near the modal class
Disadvantages:
• Cannot be determined exactly in group data
• Very limited statistical use

SKEWNESS
1. Positive skew
2. Normal distribution
3. Negative skew

KURTOSIS
1. Platykurtic
2. Mesokurtic
3. Leptokurtic

The Z statistic will allow you to standardize a normal distribution
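The notes do not give the formula, so the sketch below assumes the standard definition z = (X - mean) / SD. The function name `z_scores` is hypothetical, and the population standard deviation (`pstdev`) is an assumption; a sample standard deviation could be used instead.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Convert raw scores to z scores: distance from the mean in SD units."""
    m = mean(scores)
    sd = pstdev(scores)   # population SD (assumption; the source does not specify)
    if sd == 0:
        raise ValueError("all scores are identical, so z scores are undefined")
    return [(x - m) / sd for x in scores]

print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
```

After standardizing, the scores have a mean of 0 and a standard deviation of 1, which makes scores from different tests comparable.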

General Rules of Item Difficulty

• The item difficulty index (p) has a range of 0.00 to 1.00.
• If no one answers the item correctly, p = 0.00.
• If everyone answers the item correctly, p = 1.00.
• p low (.20 and below): difficult item
• p moderate (.21 to .80): moderately difficult item
• p high (.81 and above): easy item
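A small sketch of the difficulty index, assuming the usual definition of p as the proportion of test takers who answered the item correctly (consistent with p = 0.00 when no one is correct and p = 1.00 when everyone is). The function names are hypothetical, and how a value falling exactly on a boundary is classified is an assumption.

```python
def item_difficulty(responses):
    """responses: one entry per test taker, 1 (or True) if correct, else 0."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses) / len(responses)

def classify_difficulty(p):
    """Label an item using the cutoffs in the notes (boundary handling assumed)."""
    if p <= 0.20:
        return "difficult"
    if p <= 0.80:
        return "moderately difficult"
    return "easy"

p = item_difficulty([1, 1, 0, 0, 1])    # 3 of 5 test takers answered correctly
print(p, classify_difficulty(p))        # 0.6 moderately difficult
```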

Guidelines for Interpretation of d Value

d value            Interpretation
d ≥ .40            the item is functioning quite satisfactorily
.30 ≤ d ≤ .39      little or no revision is required
.20 ≤ d ≤ .29      the item is marginal and needs revision
d ≤ .19            the item should be eliminated or completely revised/rejected

The higher the value of d, the more effective the item is.
• When d is 1.00, all test takers in the upper group and no test takers in the lower group answered the item correctly.
• When d is -1.00, none of the upper group but all of the lower group answered an item correctly.
• GENERALLY, an item is considered acceptable if its d index is 0.30 or higher.
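One common way to compute d, consistent with the two extreme cases above, is the difference between the proportions of the upper and lower groups who answered correctly. The sketch below assumes upper and lower groups of equal size; the function names are hypothetical.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """d = (correct in upper group - correct in lower group) / group size.

    Assumes the upper and lower groups each contain group_size test takers.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (upper_correct - lower_correct) / group_size

def interpret_d(d):
    """Map d to the interpretation guidelines given in the notes."""
    if d >= 0.40:
        return "functioning quite satisfactorily"
    if d >= 0.30:
        return "little or no revision is required"
    if d >= 0.20:
        return "marginal, needs revision"
    return "eliminate or completely revise"

d = discrimination_index(upper_correct=9, lower_correct=4, group_size=10)
print(d, interpret_d(d))   # 0.5 functioning quite satisfactorily
```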

MEASURES of ATTRACTIVENESS

To measure the attractiveness of the incorrect options (distracters) in multiple-choice tests, we count the number of students who
selected each incorrect option in both the upper and lower groups. An incorrect option is said to be an effective distracter if
more students in the lower group than in the upper group chose it.
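The counting rule above can be sketched as follows. The function name, the option letters, and the sample responses are all hypothetical; the only rule taken from the notes is that a distracter is effective when more lower-group than upper-group students choose it.

```python
from collections import Counter

def effective_distracters(upper_choices, lower_choices, options, key):
    """Return {distracter: True/False}, True when the lower group chose it more often."""
    upper = Counter(upper_choices)   # how often each option was picked by the upper group
    lower = Counter(lower_choices)   # how often each option was picked by the lower group
    return {opt: lower[opt] > upper[opt] for opt in options if opt != key}

# Hypothetical item with options A-D, where A is the correct answer.
upper_group = ["A", "A", "A", "B", "D", "D"]
lower_group = ["A", "B", "B", "C", "C", "D"]
print(effective_distracters(upper_group, lower_group, "ABCD", key="A"))
# {'B': True, 'C': True, 'D': False}
```

Here option D attracts more upper-group than lower-group students, so it would be a candidate for revision.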

INTERPRETING TEST SCORES

Measures of Central Tendency


MEAN
Characteristics:
• The average of a group of scores.
• Can be affected by extreme scores.
When to use:
• The most reliable measure is desired.

MEDIAN
Characteristics:
• The midpoint or middle of a distribution of scores.
• 50% of the scores fall above it and 50% fall below it.
• 50th percentile
• Not affected by extreme scores
When to use:
• The distribution of scores is skewed.

MODE
Characteristics:
• The score that occurs most frequently in the distribution.
When to use:
• The quickest estimate of typical performance is wanted.

MEASURE OF VARIABILITY/ DISPERSION

- A numerical index that shows the extent to which the scores of a group scatter/ disperse/spread below and above a central
point in a distribution.
1) RANGE- the difference between the highest and the lowest values in a dataset.
2) INTER-QUARTILE RANGE- indicates the extent to which the central 50% of values within the data set are dispersed. It is
based upon and related to median.
3) STANDARD DEVIATION- summarizes the amount by which every value within a data set varies from the mean. It indicates
how tightly the values in the data set are bunched around the mean value.

• The range is simple to compute and is useful when you wish to evaluate the whole of a data set. It is useful for showing the
spread within a dataset.
• The inter-quartile range provides a clearer picture of the overall data set than the range does.
• The SD is the most widely used measure of dispersion. It takes into account every value in the dataset and is usually
presented in conjunction with the mean.
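The three measures can be computed with Python's standard `statistics` module, as in the sketch below. The function name `dispersion_summary` is hypothetical. Two choices here are assumptions: the population standard deviation (`pstdev`) is used, and quartiles come from `statistics.quantiles` with its default method; other quartile conventions give slightly different inter-quartile ranges.

```python
from statistics import pstdev, quantiles

def dispersion_summary(scores):
    """Range, inter-quartile range, and standard deviation of a set of scores."""
    q1, _, q3 = quantiles(scores, n=4)           # the three quartile cut points
    return {
        "range": max(scores) - min(scores),      # highest minus lowest value
        "iqr": q3 - q1,                          # spread of the central 50% of values
        "sd": pstdev(scores),                    # typical distance from the mean
    }

print(dispersion_summary([3, 5, 10, 4, 3]))
```

For these scores the range is 7, driven entirely by the single extreme score of 10, which illustrates why the range alone can give a misleading picture of spread.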

PERCENTILES and PERCENTILE RANKS

-provide information about how a person or thing relates to a larger group.


PERCENTILES- used to determine where to draw the line between observed values within the distribution.
EX.: If a teacher wishes to determine the exam score that divides his class in half, with 50% scoring above and 50% scoring
below, he determines the point that marks the 50th percentile.
*Special names for various percentiles:
DECILES- ten equal sections.
QUARTILES- four equal sections.

PERCENTILE RANK- used to determine where a particular score or value fits within a broader distribution.
- refers to the percentage of scores that are equal to or less than a given score.
EX.: A percentile rank of 35 indicates that 35% of the scores in a distribution of scores fall at or below the score at the 35th percentile.
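Following the definition above (the percentage of scores equal to or less than a given score), a percentile rank can be sketched as below. The function name and the sample scores are hypothetical.

```python
def percentile_rank(scores, score):
    """Percentage of scores in the distribution that are at or below `score`."""
    if not scores:
        raise ValueError("at least one score is required")
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

class_scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile_rank(class_scores, 30))   # 30.0
```

In this hypothetical class, three of the ten scores are at or below 30, so a score of 30 has a percentile rank of 30.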

SKEWNESS

is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The
skewness value can be positive or negative.

SKEWED DATA

Data can be "skewed", meaning it tends to have a long tail on one side or the other:

[Figure: three distribution curves labeled NEGATIVELY SKEWED (skewed to the left), NORMAL, and POSITIVELY SKEWED (skewed to the right)]

Negative Skew/Skewed to the left

Why is it called negative skew? Because the long "tail" is on the negative side of the peak.
People sometimes say it is "skewed to the left" (the long tail is on the left hand side).
It indicates the presence of a high proportion of
NORMAL DISTRIBUTION

A Normal Distribution is not skewed.
It is perfectly symmetrical.
And the Mean is exactly at the peak.
Mean = Median = Mode

Positive skew is when the long tail is on the positive side of the peak, and some people say it is "skewed to
the right". It indicates the presence of a small proportion of relatively large extreme values; most of the scores of
the students are very low.
The mean is on the right of the peak value.
Mode < Median < Mean

KURTOSIS

Kurtosis- measure of the “peakedness” of the probability distribution.
- essentially a property of symmetric distributions.
MESOKURTIC – distributions with zero kurtosis, example is the normal distribution.
LEPTOKURTIC- positive kurtosis, has a more acute peak around the mean and fatter tails, which generally means that the distribution
is more clustered around the mean and has a smaller SD.

PLATYKURTIC – negative kurtosis, has a lower, wider peak around the mean and thinner tails.

SOCIOMETRY

Sociometry is a quantitative method for measuring social relationships. Sociometric technique shows the interpersonal relationships
among the members of a group.
PERSONALITY ASSESSMENTS- used to determine one’s emotional adjustment, interpersonal relations, motivation, interest,
attitudes/feelings toward self, others and events or activities.
ATTITUDE TEST- used to determine student beliefs, perceptions, or feelings; can be measured through use of a rating scale.

• Draw-a-Person Test- a psychological projective personality or cognitive test used to evaluate children and adolescents for a
variety of purposes.
• Leiter International Performance Scale, or simply Leiter scale, is an intelligence test in the form of a strict performance
scale. It was designed for children and adolescents ages 2 to 18, although it can yield an intelligence quotient (IQ) and a
measure of logical ability for all ages.
• Raven's Progressive Matrices (often referred to simply as Raven's Matrices) or RPM is a nonverbal group test typically
used in educational settings. It is the most common and popular test administered to groups ranging from 5-year-olds to the
elderly.
• GAT is a standardized assessment of an applicant's general reasoning ability, and it measures learning capacity,
observational skills and problem solving ability. It provides an objective measure that is free from language and cultural bias.
• The Otis–Lennon School Ability Test (OLSAT) is a test of abstract thinking and reasoning ability of children pre-K to 18.

• Thematic Apperception Test (TAT) is a projective psychological test. Proponents of this technique assert that a person's
responses reveal underlying motives, concerns, and the way they see the social world through the stories they make up about
ambiguous pictures of people.
• Rorschach test (also known as the Rorschach inkblot test, the Rorschach technique, or simply the inkblot test) is
a psychological test in which subjects' perceptions of inkblots are recorded and then analyzed
using psychological interpretation, complex algorithms, or both. Some psychologists use this test to examine a person's
personality characteristics and emotional functioning.

Psychometrics

The branch of psychology that deals with the design, administration, and interpretation of quantitative tests for the measurement of
psychological variables such as intelligence, aptitude, and personality traits.

Psychometric Tests

1. Ability or Aptitude Tests
• Verbal
• Numerical
• Logical

2. Personality or Interest Inventories

Likert scale

- A psychometric scale commonly involved in research that employs questionnaires. It is the most widely used approach to
scaling responses in survey research, such that the term (or more accurately the Likert-type scale) is often used interchangeably with
rating scale.
- Respondents specify their level of agreement or disagreement on a symmetric agree-disagree scale for a series of statements.
Thus, the range captures the intensity of their feelings for a given item.
- The format of a typical five-level Likert item, for example, could be:
- Strongly disagree
- Disagree
- Neither agree nor disagree
- Agree
- Strongly agree

Semantic differential

• A type of rating scale designed to measure the connotative meaning of objects, events, and concepts. The connotations
are used to derive the attitude towards the given object, event or concept.

The respondent is asked to choose where his or her position lies, on a scale between two bipolar adjectives (for example: "Adequate-
Inadequate", "Good-Evil" or "Valuable-Worthless"). Semantic differentials can be used to measure opinions, attitudes and values on a
psychometrically controlled scale.

Thurstone scale

• A means of measuring attitudes, originally developed for measuring attitudes towards religion. It is made up of statements
about a particular issue, and each statement has a numerical value indicating how favorable or unfavorable it is judged to be.
People check each of the statements to which they agree, and a mean score is computed, indicating their attitude.
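The scoring rule described above (the mean of the values attached to the statements a person agrees with) can be sketched as follows. The statements and their scale values are invented for illustration only, and the function name `thurstone_score` is hypothetical.

```python
def thurstone_score(scale_values, agreed_statements):
    """Mean scale value of the statements the respondent checked."""
    if not agreed_statements:
        raise ValueError("the respondent must agree with at least one statement")
    total = sum(scale_values[s] for s in agreed_statements)
    return total / len(agreed_statements)

# Hypothetical statements with judged favorability values (higher = more favorable).
scale_values = {
    "Homework is a waste of time.": 2.0,
    "Homework is sometimes useful.": 6.5,
    "Homework is essential for learning.": 9.5,
}
agreed = ["Homework is sometimes useful.", "Homework is essential for learning."]
print(thurstone_score(scale_values, agreed))   # 8.0
```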
What is the grading system?

The K to 12 Basic Education Program uses a standards-and-competency-based grading system. These are found in the
curriculum guides.

All grades shall be based on the weighted raw score of the learners’ summative assessments.

The minimum grade needed to pass a specific learning area is 60, which is transmuted to 75 in the report card.

The lowest mark that can appear on the report card is 60 for Quarterly Grades and Final Grades

Weight of the Components for Grades 1 to 10

Components              Languages/AP/EsP    Science/Math    MAPEH/TLE
Written Work            30%                 40%             20%
Performance Tasks       50%                 40%             60%
Quarterly Assessment    20%                 20%             20%
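A minimal sketch of how the component weights above combine into a single grade before transmutation. It assumes each component has already been converted to a percentage score (learner's total raw score divided by the highest possible score, times 100); the dictionary keys, the function name `initial_grade`, and the sample scores are hypothetical. The transmutation step (for example, 60 becoming 75) is not reproduced because the notes do not give the full table.

```python
# Component weights for Grades 1 to 10, taken from the table in the notes.
WEIGHTS = {
    "languages_ap_esp": {"written_work": 0.30, "performance_tasks": 0.50, "quarterly_assessment": 0.20},
    "science_math":     {"written_work": 0.40, "performance_tasks": 0.40, "quarterly_assessment": 0.20},
    "mapeh_tle":        {"written_work": 0.20, "performance_tasks": 0.60, "quarterly_assessment": 0.20},
}

def initial_grade(subject_group, percentage_scores):
    """Weighted sum of the three component percentage scores (before transmutation)."""
    weights = WEIGHTS[subject_group]
    return sum(weights[c] * percentage_scores[c] for c in weights)

# Hypothetical learner in Math: 80% Written Work, 90% Performance Tasks, 70% Quarterly Assessment.
scores = {"written_work": 80, "performance_tasks": 90, "quarterly_assessment": 70}
grade = initial_grade("science_math", scores)
print(round(grade, 2), "passing" if grade >= 60 else "failing")
```

The comparison with 60 reflects the statement in the notes that 60 is the minimum grade needed to pass before it is transmuted to 75 on the report card.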

How are learners promoted or retained at the end of the school year?

For Grades 4 to 10 Learners

1. Requirement: Final Grade of at least 75 in all learning areas
Decision: Promoted to the next grade level

2. Requirement: Did not meet expectations in not more than two learning areas
Decision: Must pass remedial classes for learning areas with failing mark to be promoted to the next grade level.
Otherwise the learner is retained in the same grade level.

3. Requirement: Did not meet expectations in three or more learning areas
Decision: Retained in the same grade level
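The promotion rules for Grades 4 to 10 can be expressed as a small decision function. This is only a sketch: the function name, the argument names, and the returned labels are hypothetical, and it assumes that "did not meet expectations" means a final grade below 75 in a learning area.

```python
def promotion_decision(final_grades, passed_remedial=None):
    """Apply the Grades 4 to 10 promotion rules to a dict of learning area -> final grade.

    passed_remedial: None if remedial classes have not been taken yet,
    otherwise True/False for whether the learner passed them.
    """
    failed = [area for area, grade in final_grades.items() if grade < 75]
    if not failed:
        return "promoted"                      # at least 75 in all learning areas
    if len(failed) <= 2:
        if passed_remedial is None:
            return "remedial classes required"
        return "promoted" if passed_remedial else "retained"
    return "retained"                          # three or more failing learning areas

print(promotion_decision({"Math": 80, "Science": 78, "English": 85}))   # promoted
print(promotion_decision({"Math": 70, "Science": 78, "English": 85}))   # remedial classes required
print(promotion_decision({"Math": 70, "Science": 72, "English": 74}))   # retained
```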

For Grades 1 to 10, a learner who Did Not Meet Expectations in at most two learning areas must take remedial classes.

Remedial classes are conducted after the Final Grades have been computed.

The learner must pass the remedial classes to be promoted to the next grade level.

However, teachers should ensure that learners receive remediation when they earn raw scores which are consistently below
expectations in Written Work and Performance Tasks by the fifth week of any quarter.

This will prevent a student from failing in any learning area at the end of the year.
