PADLA, CHRISTINE MARIE B.

EDU 533 (A-COE- 12)

OUTLINE

MODULE 18: ORGANIZATION OF DATA USING DRAFTS

X=test scores (write all scores)

f=frequency (count the number of students with the same X)

Percent = divide the frequency by the number of examinees, then multiply by 100 (f ÷ N × 100)

cumulative percent = add the percent of each score to the cumulative percent of the score before it
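
The four columns described above can be computed directly from a list of scores. The following is a minimal sketch in Python, not a prescribed procedure: the scores are hypothetical, the function name is made up for illustration, and the cumulative percent is accumulated from the lowest score upward (an assumption about how the table is ordered).

```python
from collections import Counter


def frequency_table(scores):
    """Return rows of (X, f, percent, cumulative percent), lowest score first."""
    n = len(scores)
    counts = Counter(scores)
    rows = []
    cumulative_count = 0
    for x in sorted(counts):
        f = counts[x]
        cumulative_count += f
        # percent = f / N * 100; cumulative percent uses the running count
        rows.append((x, f, 100 * f / n, 100 * cumulative_count / n))
    return rows


# Hypothetical scores of 10 examinees (illustrative only).
scores = [21, 23, 23, 25, 25, 25, 27, 27, 30, 30]
for x, f, percent, cumulative_percent in frequency_table(scores):
    print(f"{x:>3} {f:>3} {percent:>6.1f} {cumulative_percent:>6.1f}")
```

The last row should always reach a cumulative percent of 100, which is a quick check that no score was missed.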

Conventions in presenting test data grouped in frequency:

1. As much as possible, the size of the class intervals should be equal. Class intervals that are multiples of
5, 10, 100 etc are often desirable. At times, when large gaps exist in the data and unequal class intervals
are used, such intervals cause inconvenience in the preparation of graphs and computation of certain
descriptive statistical measures. Use this formula in estimating necessary class intervals:

i = (H − L) ÷ C

where:

i= size of the class intervals

H= highest test score

L=lowest test score

C=number of classes

The conventional number of classes used to group the data generally varies from 7 to 20. As seen in Table 3,
the size of the class interval is 5, which is an odd number. If you look at the midpoints, these are all whole
numbers. If the class size is an even number, the midpoints will contain decimals, which may
add some difficulty to the conventional computation of some important measures.

2. Start the class interval at a value which is a multiple of the class width. In Table 3, the class width
is 5, so the first class starts at 20, which is a multiple of 5, and the class 20-24 includes the lowest test
score of 21, as seen in Table 1.

3. As much as possible, open-ended class intervals should be avoided (e.g., 100 and below or 100 and
above). These will cause some problems in graphing and in the computation of descriptive statistical measures.
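
The conventions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class size is taken as the score range divided by the number of classes and rounded up to a whole number, the highest score of 58 and the choice of 8 classes are hypothetical, and only the lowest score of 21 and the class width of 5 come from the text.

```python
import math


def class_width(highest, lowest, num_classes):
    """Estimate the class size as (H - L) / C, rounded up (rounding is an assumption)."""
    return math.ceil((highest - lowest) / num_classes)


def class_intervals(highest, lowest, width):
    """Build (lower, upper, midpoint) classes, starting at a multiple of the width."""
    lower = (lowest // width) * width
    intervals = []
    while lower <= highest:
        upper = lower + width - 1
        intervals.append((lower, upper, (lower + upper) / 2))
        lower += width
    return intervals


width = class_width(highest=58, lowest=21, num_classes=8)
for lower, upper, midpoint in class_intervals(58, 21, width):
    print(f"{lower}-{upper}  midpoint {midpoint}")
```

With an odd width such as 5, every midpoint printed is a whole number, which matches the remark about odd class sizes.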

MODULE 19: Part 2: Organization of Data Using Graphs

The most common type of graph used to evaluate behavioral data is the line graph. A line graph shows
individual data points connected by lines, creating a path. Over time, this path can show a visual pattern
that helps you evaluate the overall direction of a behavior.

Another common graph used is referred to as a bar graph. A bar graph is often used when portions of a
whole are being represented or when reporting a percentage. The bar graph focuses on the height of
the data rather than the trend in the data, and is most often used when non-consecutive data points are
being evaluated. This is a particularly useful method when comparing information across individuals,
settings, or situations.

• Histogram is a type of graph appropriate for quantitative data such as test scores. This graph
consists of columns. Each column has a base that represents one class interval, and its height
represents the number of observations, or simply the frequency, in that class interval.
• Frequency polygon is also used for quantitative data, and it is one of the most commonly used
methods of presenting test scores. It is the line graph of a frequency distribution. It is very similar
to a histogram, but it uses lines instead of bars, which makes it easy to compare sets of test data on
the same axes.
• Cumulative frequency polygon is quite different from a frequency polygon because cumulative
frequencies are plotted. In addition, you plot each point above the exact upper limit of the interval. As
such, a cumulative polygon gives a picture of the number of observations that fall below a
certain score instead of the frequency within a class interval.
• Pie graph may be useful when representing portions of a whole. For instance, it might be helpful
to create a pie chart indicating the amount of time a student spends actively engaged in
activities.
• Skewness is the degree of asymmetry of a graph. A basic principle of a coordinate system tells you
that as you move toward the right of the x-axis, the numerical value increases. Likewise, as you
move up the y-axis, the scale value becomes higher. Thus, in a negatively skewed distribution,
more examinees get higher scores, and the tail, which indicates lower frequencies, points
to the left, toward the lower scores. On the other hand, in a positively skewed distribution,
scores are clustered on the left side. This means that more examinees get lower scores,
and the tail, which indicates lower frequencies, points to the right, toward the higher scores.
• Kurtosis is a statistical measure used to describe the degree to which scores cluster in the tails
or the peak of a frequency distribution. The peak is the tallest part of the distribution, and the
tails are the ends of the distribution.

THREE TYPES:

• MESOKURTIC, LEPTOKURTIC, PLATYKURTIC
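
Skewness and kurtosis can also be expressed as numbers. The sketch below uses the common moment-based coefficients, which is an assumption because the outline does not say which formula the module uses. A negative skewness value means the tail points to the lower scores, and excess kurtosis near zero is usually called mesokurtic, above zero leptokurtic, and below zero platykurtic. The score lists and the tolerance value are hypothetical.

```python
def central_moment(scores, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** k for x in scores) / len(scores)


def skewness(scores):
    """Moment coefficient of skewness (negative means the tail is on the left)."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5


def excess_kurtosis(scores):
    """Moment coefficient of kurtosis minus 3, so a normal curve is near 0."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3


def kurtosis_type(scores, tolerance=0.1):
    """Label the distribution; the tolerance is an illustrative choice."""
    k = excess_kurtosis(scores)
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"


mostly_high = [10, 18, 19, 19, 20, 20, 20]  # hypothetical: tail toward low scores
mostly_low = [10, 10, 10, 11, 11, 12, 20]   # hypothetical: tail toward high scores
print(skewness(mostly_high), skewness(mostly_low))
print(kurtosis_type([1, 2, 3, 4, 5]))
```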

MODULE 20: Analysis, Interpretation and use of Test Data

Measures of Central Tendency provide a summary measure that attempts to describe a whole set of
data with a single value that represents the middle or center of its distribution. There are three main
measures of central tendency: the mean, the median and the mode.
When data is normally distributed, the mean, median and mode should be identical, and are all effective
in showing the most typical value of a data set. It's important to look at the dispersion of a data set
when interpreting the measures of central tendency.

• The mean of a data set is also known as the average value. It is calculated by dividing the sum of
all values in a data set by the number of values.
• The median of a data set is the value that is at the middle of a data set arranged from smallest
to largest. The median is appropriate to use with ordinal variables, and with interval variables
with a skewed distribution.
• The mode is the most common observation of a data set, or the value in the data set that occurs
most frequently. The mode is an appropriate measure to use with categorical data. One
disadvantage of the mode is that a single data set can have more than one mode (e.g., in 1,
2, 2, 3, 4, 5, 5, both 2 and 5 are modes).
• Range is the difference between the highest (XH) and the lowest (XL) scores in a distribution.
It is the simplest measure of variability but is also considered the least accurate measure of
dispersion because its value is determined by just two scores in a group. It does not take into
consideration the spread of all scores, so its value could be drastically changed by a single value.
• Standard deviation is the most widely used measure of variability and is considered the most
accurate in representing the deviations of individual scores from the mean value of the
distribution. It is a statistic that measures the amount of dispersion or variation in a set of
values, computed from the differences between each value and the mean.

S = √[ Σ(xi − x̅)² ÷ (N − 1) ]

Where,

S: Sample standard deviation.

N: Number of observations.

xi: Observed value of the sample item.

x̅: Mean value of the observation.
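
As an illustration, the measures in this module can be computed with Python's standard `statistics` module. The data set is the one used in the mode example above. The hand-written function assumes the usual sample formula with N − 1 in the denominator, which is consistent with the symbols S, N, xi, and x̅ defined here.

```python
import math
import statistics

scores = [1, 2, 2, 3, 4, 5, 5]  # data set from the mode example

mean = statistics.mean(scores)          # sum of values / number of values
median = statistics.median(scores)      # middle value of the sorted data
modes = statistics.multimode(scores)    # every value tied for most frequent
score_range = max(scores) - min(scores)  # XH - XL


def sample_standard_deviation(values):
    """Square root of the sum of squared deviations divided by N - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


print(mean, median, modes, score_range)
print(sample_standard_deviation(scores), statistics.stdev(scores))
```

The two standard deviation values printed on the last line should agree, since `statistics.stdev` also computes the sample standard deviation.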

MODULE 21: Part 2 Analysis, Interpretation and use of test data

• Quartile. In measures of central tendency, you learned that the median of a distribution divides
the data into two equal groups. In a similar way, the quartiles are the three values that divide a
set of scores into four equal parts, with one-fourth of the data values in each part. This means
about 25% of the data falls at or below the first quartile (Q1), 50% falls at or below
the second quartile (Q2), and 75% falls at or below the third quartile (Q3). Notice that Q2 is also the
median. We can say that Q1 is the median of the first half of the values and Q3 is the median of
the second half of the values. Thus, the upper quartile represents, on average, the mark of the
top half of the class, while the lower quartile represents that of the bottom half of the class.
• Decile. It divides the distribution into 10 equal parts. There are 9 deciles such that 10% of the
scores are equal to or less than decile 1, 20% of the scores are equal to or less than decile 2, and
so on. A student whose mark is below the first decile is said to belong to decile 1. A student
whose mark is between the first and second deciles is in decile 2, and one whose mark is above
the ninth decile belongs to decile 10. If there are only a small number of data values, deciles are not
appropriate.
• Percentiles indicate the percentage of scores that fall below a particular value. They tell you
where a score stands relative to other scores. For example, a person with an IQ of 120 is at the
91st percentile, which indicates that their IQ is higher than 91 percent of other scores.
Percentiles are a great tool to use when you need to know the relative standing of a value.
• The normal distribution is the most important probability distribution in statistics because it fits
many natural phenomena. The normal distribution is a probability function that describes how
the values of a variable are distributed. It is a symmetric distribution where most of the
observations cluster around the central peak and the probabilities for values further away from
the mean taper off equally in both directions. Extreme values in both tails of the distribution are
similarly unlikely.
• Measures of Covariability tell you the extent of the relationship between two tests or two
factors. Admittedly, a score one gets may be due not to a single factor but to several factors,
directly or indirectly observable, which are also related to one another.
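
The measures of relative position above can be illustrated with a short Python sketch. The scores are hypothetical. `statistics.quantiles` uses one of several accepted interpolation methods, so other tools may give slightly different cut points. The outline does not name a specific measure of covariability, so the Pearson correlation coefficient is used here as an assumed example.

```python
import statistics

# Hypothetical scores of 11 examinees.
scores = [12, 15, 17, 18, 20, 21, 23, 25, 26, 28, 30]

q1, q2, q3 = statistics.quantiles(scores, n=4)   # three quartiles
deciles = statistics.quantiles(scores, n=10)     # nine deciles


def percentile_rank(all_scores, value):
    """Percentage of scores that fall below the given value."""
    below = sum(1 for s in all_scores if s < value)
    return 100 * below / len(all_scores)


def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / (spread_x * spread_y) ** 0.5


print(q1, q2, q3, statistics.median(scores))
print(percentile_rank(scores, 26))
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Here Q2 equals the median, as noted in the quartile description.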

MODULE 22: Part 1: Grading and Reporting of Test Results Purpose of Grading and Reporting,
Methods in Scoring Performance Tasks

Grades are alphabetical or numerical symbols/marks that indicate the degree to which learners are able
to achieve their learning objectives. They are part of the instructional process and serve as feedback on
which specific topic/s learners have mastered and what they need to focus on more when
preparing for summative assessments. Sometimes, grades may serve as motivators for some learners to
maintain or improve their performance. They give parents information about their children’s
achievements. They are also useful for administrators who want to evaluate the effectiveness of the
instructional programs in developing the needed skills and competencies of the learners.

Traditional Methods in scoring performance tasks

1. Number right scoring (NR) entails assigning positive values only to correct answers while giving a
score of zero to incorrect answers. The test score is the sum of the scores for correct responses. One
major concern with this scoring method is that learners may get the correct answer by guessing,
which affects the reliability and validity of the test.

2. Negative marking (NM) entails assigning positive values to correct answers while penalizing
learners for incorrect responses (the right-minus-wrong method). In this model, a fraction of the number of
wrong answers is subtracted from the number of correct answers. The fraction is 1/(n − 1), where n stands
for the number of choices. Other models for this type of scoring include:

a. giving a positive score to correct answers while assigning no mark to omitted items

b. rewarding learners for not guessing by awarding points rather than penalizing incorrect answers
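
The two traditional methods can be compared on the same answer sheet. The sketch below is illustrative only: the answer key and responses are hypothetical, omitted items are marked with `None`, and the negative-marking function assumes the penalty is the number of wrong answers multiplied by 1/(n − 1).

```python
def number_right_score(answers, key):
    """NR: one point per correct answer; wrong and omitted items score zero."""
    return sum(1 for answer, correct in zip(answers, key) if answer == correct)


def negative_marking_score(answers, key, num_choices):
    """NM: right answers minus wrong answers / (n - 1); omitted items are ignored."""
    right = number_right_score(answers, key)
    wrong = sum(
        1
        for answer, correct in zip(answers, key)
        if answer is not None and answer != correct
    )
    return right - wrong / (num_choices - 1)


# Hypothetical six-item test with five choices per item.
key = ["A", "C", "B", "D", "A", "E"]
answers = ["A", "C", "D", None, "A", "C"]  # 3 right, 2 wrong, 1 omitted

print(number_right_score(answers, key))
print(negative_marking_score(answers, key, num_choices=5))
```

The omitted item costs nothing under negative marking, while each wrong guess reduces the score, which is why this method discourages blind guessing.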

Non-conventional Methods in scoring performance tasks

1. Partial Credit scoring methods attempt to determine a learner’s degree or level of knowledge with
respect to each response option given. This method of scoring takes into account learners’ partial
knowledge or mastery. It acknowledges that, while learners cannot always recognize the correct answer,
they can discern that some response options are clearly incorrect.

a. Liberal Choice Test - allows learners to select more than one answer to a question if they feel
uncertain which option or alternative is correct

b. Elimination Testing (ET) - instructs learners to cross out all alternatives they consider to be incorrect

c. Confidence Weighting (CW) - asks learners to indicate what they believe is the correct answer and how
confident they are about their choice

2. Retrospective Correcting for Guessing considers omitted or no-answer items as incorrect, forcing
learners to give an answer for every item even if they do not know the answer. The correction for
guessing is implemented later or retroactively. This can be done by comparing learners’ answers in
multiple-choice items with their answers in other test formats, such as short-answer tests.

3. Standard-setting entails using standards when scoring multiple-choice items particularly standards set
through norm-referenced or criterion-referenced assessments. Standards based on norm-referenced
assessments are derived from the test performance of a certain group of learners, while standards from
criterion-referenced assessment are based on preset standards specified from the very start by the
teacher or school in general.

4. Holistic Scoring involves giving a single, overall assessment score for an essay, writing composition, or
other performance-type assessment as a whole. Although the scoring rubric for holistic scoring lays out
specific criteria for evaluating a task, raters do not assign a score for each criterion. Instead, as they read
a writing task or observe a performance task, they balance strengths and weaknesses among the various
criteria to arrive at an overall assessment. Holistic scoring is considered efficient in terms of time and
cost. It also does not penalize poor performance based on only one aspect (e.g., content, delivery,
organization). However, holistic scoring does not provide sufficient diagnostic information
about the students’ ability, as it does not identify the areas for improvement, and it is difficult to
interpret, as it does not detail the basis for evaluation.

5. Analytic Scoring involves assessing each aspect of a performance task and assigning a score for each
criterion. Sometimes, an overall score is given by averaging the scores in all criteria. One advantage of
analytic scoring is its reliability. It also provides diagnostic information, as it presents learners’
strengths and weaknesses and the area/s in which they occur, and it can eventually serve as a basis for
remedial instruction. However, it is more time-consuming and therefore more expensive. It is also prone
to the halo effect, wherein scores on one scale may influence the ratings on the others. It is also
difficult to create.

6. Primary Trait Scoring focuses on only one aspect or criterion of a task, and a learner’s performance is
evaluated based on a trait. This scoring system defines a primary trait in the task that will then be
scored. For example, if a teacher in a political science class asks his students to write an essay on the
advantages and disadvantages of Martial Law, the basic question addressed in scoring is, “Did the writer
successfully accomplish the purpose of this task?” With this focus, the teacher would ignore errors in
the conventions of written language and instead focus on the overall rhetorical effectiveness. One
disadvantage of this scoring scheme is that it is often difficult to focus exclusively on one trait, such that
other traits may be included when scoring. Thus, it is important that a very detailed scoring guide is used
for each specific task.

7. Multiple-trait scoring requires that an essay test or performance task be scored on more than one
aspect, with scoring criteria in place so that they are consistent with the prompt. Multiple-trait scoring
is task-specific, and the features to be scored vary from task to task, thus requiring separate scores for
different criteria. Multiple-trait scoring is similar to analytic scoring because of its focus on several
categories of criteria. However, while analytic scoring evaluates more traditional and generic dimensions of
language production, multiple-trait scoring focuses on specific features of performance required to fulfill
the given task or tasks. For example, in a PE class on basketball, one may be scored on different skills
such as dribbling, passing, rebounding, blocking, stealing, etc.
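
Analytic and multiple-trait scoring both produce one score per criterion, and the overall score is sometimes taken as the average. The sketch below shows that arithmetic only. The 1-to-5 rating scale and the ratings themselves are hypothetical, and the criteria reuse the basketball skills from the example above.

```python
def overall_score(criterion_scores):
    """Average the per-criterion ratings into a single overall score."""
    return sum(criterion_scores.values()) / len(criterion_scores)


def weakest_criterion(criterion_scores):
    """Return the criterion with the lowest rating, for diagnostic feedback."""
    return min(criterion_scores, key=criterion_scores.get)


# Hypothetical ratings on an assumed 1-to-5 scale.
ratings = {"dribbling": 4, "passing": 3, "rebounding": 5, "blocking": 2, "stealing": 3}

print(overall_score(ratings))
print(weakest_criterion(ratings))
```

Keeping the per-criterion ratings alongside the average preserves the diagnostic information that holistic scoring does not provide.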

MODULE 23: Part 2: Grading and Reporting Test Results Types of Test Scores Guidelines in Test
Grading or Performance Tasks

Types of Test Scores

1. Raw Score is simply the number of items answered correctly in a test. A raw score provides an
indication of the variability in the performance of students in the class. However, a raw score has no
meaning unless you know what the test is measuring and how many items it contains. A raw score also
does not mean much because it cannot be compared with a standard or with the performance of
another learner or of the class as a whole.

2. Percentage Score refers to the percent of items answered correctly in a test. The number of items
answered correctly is typically converted to a percent based on the total possible score. The percentage
score is interpreted as the percent of the content, skills, or knowledge that the learner has a solid
grasp of. Just like the raw score, the percentage score has limitations because there is no way of
comparing the percentage correct obtained in one test with the percentage correct in another test with a
different difficulty level. Percentage score is most appropriate to use in a teacher-made or
criterion-referenced test, particularly one that is administered commonly to a class
or to students taking the same course with the same course syllabus. In this way, the students’ test
performances can be compared with one another in the class or with those of their peers in another
section. In the same manner, percentage score is suitable to use in subjects where a standard score has
been set.

3. Criterion-referenced grading system is a grading system wherein learners’ test scores or achievement
levels are based on their performance in specific learning goals and outcomes or standards.

a. Pass or fail grade is most appropriate if the test or assessment is used primarily or entirely to make
a pass or fail decision. In this type of scoring, a standard or cutoff score is preset, and a learner is
given a score of pass if he or she surpasses the expected level of performance or cutoff score. This is most
appropriate for comprehensive or licensure exams because there is no limit to the number of examinees
who can pass or fail. Each individual examinee’s performance is compared to an absolute standard and
not to the performance of others. Advantages:

i. takes pressure off the learners in getting a high numerical score

ii. gives learners a clear-cut idea of their strengths and weaknesses

iii. allows learners to focus on true understanding or learning of the course content rather than on
specific details that will help them receive a high letter or numerical score

b. Letter grade is one of the most commonly used grading systems. Letter grades are usually composed
of a five-level grading scale labeled from A to E or F, with A representing the highest level of
performance.

c. Plus (+) and minus (-) letter grades provide more detailed descriptions of the level of learners’
achievement or task performance by dividing each grade category into three levels, such that a grade of
A can be assigned as A+, A, or A-, and so on.

d. Categorical grades are generally more descriptive than letter grades, especially if coupled with verbal
labels. Verbal labels eliminate the need for a key or legend to explain what each grade category means.

4. Norm-referenced grading system compares learners’ test scores with their peers’ test scores. This
involves ranking to express the learner’s score in relation to the achievement of the group. This allows
teachers to:

a. compare learners’ test performance with that of other students;

b. compare learners’ performance in one test with another test (subtest);

c. compare learners’ performance in one form of the test with another form of the test administered at an
earlier date

Types:

a. Developmental scores are scores transformed from raw scores and reflect the average performance at
age and grade levels

i. grade-equivalent score is described as both a growth score and a status score. The grade-equivalent
score of a given raw score in any test indicates the grade level at which the typical learner earns this raw
score. A decimal point is used between the grade and the month in grade equivalence. Ex. a score of 6.5
means that the learner did as well as a typical Grade 6 learner taking the test at the end of the fifth
month of the school year.

ii. age-equivalent score indicates the age level at which a typical learner obtains such a raw score. It
reflects a learner’s performance in terms of chronological age as compared to those in the norm
group. These scores are written with a hyphen between the years and months. If a learner’s score is 12-3,
his or her age equivalence is 12 years and 3 months, indicating a test performance similar to that
of learners aged 12 years and 3 months in the norm group.

b. Percentile rank indicates the percentage of scores that fall at or below a given score. Percentile ranks
range from 1 to 99. For example, if a student obtained a percentile rank of 85 in a standardized
achievement test, it means that the learner was able to get a higher score than 85% of the learners in
the norm group.

c. Stanine score expresses test results in nine equal steps which range from one (lowest) to nine
(highest). A stanine score of 5 is interpreted as the average stanine. Percentile ranks are grouped into
stanines, and each stanine has its own interpretation.

d. Standard scores (discussed in the previous modules)

i. Z-score

ii. T-score
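
Several of the score types above are simple conversions, sketched below in Python. The formulas are the standard textbook ones rather than anything specific to this outline: z = (raw score − mean) ÷ SD and T = 50 + 10z. The stanine cut points follow the widely published 4-7-12-17-20-17-12-7-4 percent grouping, and the handling of ranks that fall exactly on a cut point is an assumption, since published tables differ. All input values are hypothetical.

```python
def percentage_score(raw_score, total_items):
    """Percent of items answered correctly."""
    return 100 * raw_score / total_items


def z_score(raw_score, mean, standard_deviation):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw_score - mean) / standard_deviation


def t_score(z):
    """Rescale a z-score to a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z


# Percentile ranks below each bound fall in stanines 1 to 8; the rest are stanine 9.
STANINE_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine(percentile_rank):
    """Map a percentile rank (1 to 99) to a stanine from 1 to 9."""
    for level, bound in enumerate(STANINE_BOUNDS, start=1):
        if percentile_rank < bound:
            return level
    return 9


print(percentage_score(45, 60))
z = z_score(85, mean=70, standard_deviation=10)
print(z, t_score(z))
print(stanine(50), stanine(85))
```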

Guidelines in grading tests/performance tasks

1. Stick to the purpose of the assessment.

2. Be guided by the desired learning outcomes

3. Develop grading criteria

4. Inform learners what scoring methods are to be used

5. Decide on what type of test scores to use

Guidelines in grading essay tests

1. Identify the criteria for rating the essay

2. Determine the type of rubric to be used

3. Prepare the rubric

4. Evaluate the essay anonymously

5. Score one essay at a time

6. Be conscious of your own biases when evaluating a paper

7. Review initial scores and comments before giving the final rating

8. Get two or more raters

9. Write comments (feedback)

MODULE 24: Grading System of the K-12 Program

The K to 12 Basic Education Program uses a standards- and competency-based grading system. These
are found in the curriculum guides. All grades will be based on the weighted raw score of the learners’
summative assessments. The minimum grade needed to pass a specific learning area is 60, which is
transmuted to 75 in the report card. The lowest mark that can appear on the report card is 60 for
Quarterly Grades and Final Grades. For these guidelines, the Department will use a floor grade
considered as the lowest possible grade that will appear in a learner’s report card. Learners from Grades
1 to 12 are graded on Written Work, Performance Tasks, and Quarterly Assessment every quarter. These
three are given specific percentage weights that vary according to the nature of the learning area.

Guidelines specific to the assessment of Kindergarten learners will be issued in a different memorandum
or order. However, for Kindergarten, checklists and anecdotal records are used instead of numerical
grades. These are based on learning standards found in the Kindergarten curriculum guide. It is
important for teachers to keep a portfolio, which is a record or compilation of the learner’s output, such
as writing samples, accomplished activity sheets, and artwork. The portfolio can provide concrete
evidence of how much or how well the learner is able to accomplish the skills and competencies.
Through checklists, the teacher will be able to indicate whether or not the child is able to demonstrate
knowledge and/or perform the tasks expected of Kindergarten learners. Through anecdotal records or
narrative reports, teachers will be able to describe learners’ behavior, attitude, and effort in school
work.

For grades 1-12, in a grading period, there is one Quarterly Assessment but there should be instances for
students to produce Written Work and to demonstrate what they know and can do through
Performance Tasks. There is no required number of Written Work and Performance Tasks, but these
must be spread out over the quarter and used to assess learners’ skills after each unit has been taught.

For MAPEH, individual grades are given to each area (i.e., Music, Arts, PE, and Health). The quarterly grade
for MAPEH is the average grade across the four areas:

QG for MAPEH= (QG for Music + QG for Arts + Quarter Grade for PE + Quarter Grade for Health) ÷ 4

The final grade for each subject is then computed by getting the average of the four quarterly grades, as
seen below:

Final Grade for each learning area= (1QG + 2QG + 3QG + 4QG) ÷ 4

The General Average, on the other hand, is computed by getting the average of the Final Grades for all
subject areas. Each subject area has equal weight.

General Average= sum of all learning areas ÷ total number of learning areas in a grade level

All grades reflected on the report card are reported as a whole number.
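
The three averaging formulas above can be sketched in Python as follows. The grades used are hypothetical. Rounding halves upward to get a whole number is an assumption, since the text only says that report card grades are whole numbers and does not state the rounding rule.

```python
import math


def round_half_up(value):
    """Round to the nearest whole number, with .5 going up (assumed rule)."""
    return math.floor(value + 0.5)


def mapeh_quarterly_grade(music, arts, pe, health):
    """Average of the quarterly grades of the four MAPEH areas."""
    return round_half_up((music + arts + pe + health) / 4)


def final_grade(quarterly_grades):
    """Average of the four quarterly grades of one learning area."""
    return round_half_up(sum(quarterly_grades) / len(quarterly_grades))


def general_average(final_grades):
    """Average of the final grades of all learning areas, equally weighted."""
    return round_half_up(sum(final_grades) / len(final_grades))


print(mapeh_quarterly_grade(music=85, arts=88, pe=90, health=87))
print(final_grade([83, 85, 86, 88]))
print(general_average([86, 88, 90]))
```

Python's built-in `round` is avoided here because it rounds halves to the nearest even number, which would turn 86.5 into 86.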

MODULE 25: Communicating Test Results to Stakeholders

In education, the term stakeholder typically refers to anyone who is invested in the welfare and success
of a school and its students, including administrators, teachers, staff members, students, parents,
families, community members, local business leaders, and elected officials such as school board
members, city councilors, and state representatives. Stakeholders may also be collective entities, such as
local businesses, organizations, advocacy groups, committees, media outlets, and cultural institutions, in
addition to organizations that represent specific groups, such as teachers unions, parent-teacher
organizations, and associations representing superintendents, principals, school boards, or teachers in
specific academic disciplines.

The idea of a “stakeholder” intersects with many school-reform concepts and strategies—such as
leadership teams, shared leadership, and voice—that generally seek to expand the number of people
involved in making important decisions related to a school’s organization, operation, and academics. For
example, shared leadership entails the creation of leadership roles and decision-making opportunities
for teachers, staff members, students, parents, and community members, while voice refers to the
degree to which schools include and act upon the values, opinions, beliefs, perspectives, and cultural
backgrounds of the people in their community. Stakeholders may participate on a leadership team, take
on leadership responsibilities in a school, or give “voice” to their ideas, perspectives, and opinions
during community forums or school-board meetings, for example.
