
CHARACTERISTICS OF A GOOD TEST

a. Valid -- validity refers to the extent to which a test measures what it purports to measure.
b. If a test item is congruent with the behavior to be tested, it is valid.
Types of Evidence

CONTENT VALIDITY
-- refers to the ADEQUACY and REPRESENTATIVENESS of the learning outcomes to be measured. It can be assured with the use of a Table of Specifications (T.O.S.).

CRITERION-RELATED VALIDITY
1. PREDICTIVE VALIDITY
-- involves the use of a criterion and a predictor. Example: correlating the results of a college entrance test (CET) with the students' GWA at some future time (predictor = CET; criterion = GWA).
2. CONCURRENT VALIDITY
-- the criterion is already available. Example: the CET is correlated with an available criterion, such as the students' 4th-year high school grades (predictor = CET; criterion = 4th-year high school grades).

CONSTRUCT-RELATED VALIDITY
-- refers to how well performance on a particular set of tasks can be explained by some PSYCHOLOGICAL CONSTRUCT or TRAIT.
THEORETICAL CONSTRUCT
-- described by determining the components of the psychological trait being measured.
CRITICAL THINKING CONSTRUCT
-- components: predictors, conclusions, assumptions, inferences, interpretations, and relevance of evidence.
CHARACTERISTICS OF A GOOD TEST
RELIABILITY
--refers to the "CONSISTENCY" of the test scores.
--ERRORS OF MEASUREMENT are factors or conditions that can lower the reliability of a test. If a test has low reliability, we can be sure that errors of measurement have affected the test scores to the point that the test is UNRELIABLE.

SOME ERRORS OF MEASUREMENT

1. Test Takers (INTRA TEST ERROR)
• What is happening within the individual? (fatigue, hunger, headache, emotional upset, anxiety, growth and learning acquired before the test). These tend to reduce the consistency of the scores over time.

2. Test Itself
• Test content (poorly constructed items, items with clues, very easy or very difficult items, very high vocabulary or reading level). These tend to encourage guessing, particularly when the test is long.

3. Test Administration
• Lighting of the room, room temperature (too hot or too cold), noise, seating arrangement, instructions, time allotment, attitude of the test examinee. These MAKE THE TEST UNRELIABLE.

4. Test Scoring
• MISKEY (providing the wrong answer key), mistakes in correcting a wrong answer, mistakes in the use of the required pencil, subjective scoring. These LOWER THE TEST SCORE.
TO DETERMINE THE CONSTRUCT VALIDITY OF A CRITICAL THINKING TEST
1. Each subtest is correlated with the whole test.
2. The correlation of each subtest, which measures a particular component, shows how much that component contributes to the measurement of the psychological trait, which is critical thinking.
Defined by:
X (subtest) | Y (correlation with the total score) | proportion of common variance (r^2, the square of the correlation)

DEGREES OF RELATIONSHIP BETWEEN TWO SETS OF SCORES

+1.00 ---- PERFECT POSITIVE RELATIONSHIP (for a discrimination index: the better; more students from the upper group got the item correct)
0.00 ---- NO RELATIONSHIP
-1.00 ---- PERFECT NEGATIVE RELATIONSHIP (for a discrimination index: more students from the lower group got the item correct)

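DISCRIMINANT VALIDITY --- DIFFERENT TRAITS (construct-related evidence)
--- Scores on a critical thinking test are correlated with scores on a measure of attitudes towards movies. Because the two traits are different, a low correlation is expected, and a low correlation supports the discriminant validity of the critical thinking test.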
METHODS OF ESTIMATING TEST RELIABILITY

TEST-RETEST METHOD
--determines how consistent scores are over a given period of time. The same test is administered twice to the same group, with a sufficient time interval between the two administrations (about 2 to 15 days). If the interval is too short (2-3 days), students may recall their answers; the longer the interval, the lower the reliability. (Observed score = true score + error of measurement.)
PARALLEL/ALTERNATE FORMS METHOD
--uses two different versions of the same test (Form A and Form B), administered to the same group close together in time, on the same day or the next day. The two forms differ only in how the items are worded or written; both should measure the same skills, so errors are significantly controlled.
TEST-RETEST WITH ALTERNATE FORMS METHOD
--administers the two versions of the same test on two different occasions. The time interval may be short (2 weeks) or longer (up to 6 months). It takes into account all possible sources of error and is the most useful indicator of the variation of a test score over a period of time.
INTERNAL CONSISTENCY METHOD
-- employs only one administration of the test, given to the same group or individual.
DIFFERENT METHODS
1. SPLIT-HALF / ODD-EVEN METHOD: score the odd-numbered items and the even-numbered items separately.
2. KUDER-RICHARDSON FORMULA 20 (KR-20): estimates internal consistency from a single administration using the proportion of examinees who pass and fail each item (for items scored right or wrong), without splitting the test into halves.
3. SPEARMAN-BROWN PROPHECY FORMULA: estimates the reliability of the WHOLE test from the correlation between its two halves.
4. PEARSON r (PRODUCT-MOMENT CORRELATION COEFFICIENT): used in the split-half method to correlate the two sets of scores (odd and even).
If the reliability coefficient is high, the test is said to be homogeneous. Internal consistency is the consistency of test scores determined over different parts of the entire test.

RELIABILITY ESTIMATE                 WHAT IT MEASURES (SOURCES OF ERROR)
TEST-RETEST                          TEST ADMINISTRATION, TEST TAKERS
ALTERNATE FORMS                      TEST ADMINISTRATION, TEST ITSELF
TEST-RETEST WITH ALTERNATE FORMS     TEST ADMINISTRATION, TEST ITSELF, TEST TAKERS
INTERNAL CONSISTENCY                 TEST ADMINISTRATION, TEST ITSELF

NOTE: a reliability coefficient of +.86 means that 86/100 of the variance of the obtained scores is true-score variance, and 14/100 can be attributed to errors of measurement.
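The note can be made precise under classical test theory, an assumption these notes do not state: an observed score X is a true score T plus an error E, with T and E uncorrelated, so the reliability coefficient is the share of observed-score variance that is true-score variance.

```latex
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2 \quad (\text{T and E uncorrelated})

r_{xx} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}

r_{xx} = .86 \;\Rightarrow\; \frac{\sigma_T^2}{\sigma_X^2} = .86, \qquad \frac{\sigma_E^2}{\sigma_X^2} = .14
```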
IMPROVING THE TEST ITEMS

Item Analysis

• Index of difficulty: the proportion of students who answered the item correctly.
• Index of discrimination: the extent to which a test item differentiates good performers from poor performers.
METHOD TO EMPLOY IN ITEM ANALYSIS
-USING THE UPPER AND LOWER 27/100 INDEX METHOD
1. After scoring the test, arrange the papers from lowest to highest score.
2. Segregate the top and bottom 27/100 of the papers.
3. Tally the correct answers to each item by the students in the upper 27/100 group.
4. Repeat step 3 for the lower 27/100 group.
5. Get the proportion of the upper group that obtained the correct answer (U/NU).
6. Repeat step 5 for the lower group (L/NL).
7. Get the average of the two proportions. This is the index of difficulty.
8. Get the difference between the two proportions. This is the index of discrimination.

U, L = NO. OF PUPILS IN THE UPPER (U) AND LOWER (L) GROUPS WHO GOT THE ITEM CORRECT
NU, NL = NO. OF PUPILS IN THE UPPER (NU) AND LOWER (NL) GROUPS
Difficulty index: Df = ((U/NU) + (L/NL)) / 2
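The eight steps can be sketched in Python as follows, assuming each student has a total test score and a 0/1 result on the item being analyzed. Both function names are hypothetical, and rounding 27% of the class to the nearest whole student is an assumption of this sketch.

```python
def upper_lower_groups(totals, fraction=0.27):
    """Hypothetical helper: indices of the top and bottom 27% of students by total score."""
    n_group = max(1, round(len(totals) * fraction))
    # Step 1: order students from lowest to highest total score.
    order = sorted(range(len(totals)), key=lambda s: totals[s])
    return order[-n_group:], order[:n_group]   # step 2: (upper group, lower group)


def item_indices(item_correct, upper, lower):
    """item_correct[s] is 1 if student s got the item right, else 0.
    Returns (difficulty index, discrimination index) for that item."""
    U = sum(item_correct[s] for s in upper)   # step 3: tally for the upper group
    L = sum(item_correct[s] for s in lower)   # step 4: tally for the lower group
    NU, NL = len(upper), len(lower)
    p_upper, p_lower = U / NU, L / NL         # steps 5-6: proportions correct
    difficulty = (p_upper + p_lower) / 2      # step 7: average of the proportions
    discrimination = p_upper - p_lower        # step 8: difference of the proportions
    return difficulty, discrimination


# Made-up totals for ten students and their results on one item.
totals = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
item = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
upper, lower = upper_lower_groups(totals)
print(item_indices(item, upper, lower))
```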
TABLE INTERPRETING DIFFICULTY INDEX

Range          Description
0.00 – 0.20    Very difficult
0.21 – 0.40    Difficult
0.41 – 0.60    Moderately difficult
0.61 – 0.80    Easy
0.81 – 1.00    Very easy

The higher the difficulty index, the easier the item is.
TABLE INTERPRETING INDEX OF DISCRIMINATION

RANGE             DESCRIPTION
-1.00 – -0.60     Questionable item
-0.59 – -0.20     Not discriminating
-0.19 – 0.20      Moderately discriminating
0.21 – 0.60       Discriminating
0.61 – 1.00       Very discriminating

A good test item separates the bright performers from the poor performers. The higher the index of discrimination, the greater the discriminating power of the item.

Formula: Ds = (U/NU) - (L/NL)
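A small Python sketch that applies the two interpretation tables to computed indices. The function names are hypothetical, and treating each upper bound as inclusive (so a value such as 0.205 falls in the lower band) is an assumption this sketch makes to close the gaps between the printed ranges.

```python
def describe_difficulty(p):
    """Label a difficulty index using the difficulty table (upper bounds inclusive)."""
    if p <= 0.20:
        return "Very difficult"
    if p <= 0.40:
        return "Difficult"
    if p <= 0.60:
        return "Moderately difficult"
    if p <= 0.80:
        return "Easy"
    return "Very easy"


def describe_discrimination(d):
    """Label a discrimination index using the discrimination table (upper bounds inclusive)."""
    if d <= -0.60:
        return "Questionable item"
    if d <= -0.20:
        return "Not discriminating"
    if d <= 0.20:
        return "Moderately discriminating"
    if d <= 0.60:
        return "Discriminating"
    return "Very discriminating"


print(describe_difficulty(0.5), "/", describe_discrimination(0.45))
```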
WHEN WOULD YOU SAY YOUR ITEM IS "GOOD" OR "RETAINED"?
-It must have an ACCEPTABLE INDEX OF DIFFICULTY AND DISCRIMINATION.

-ACCEPTABLE INDEX OF DIFFICULTY RANGES FROM 0.41 TO 0.60.

-ACCEPTABLE INDEX OF DISCRIMINATION RANGES FROM +0.20 TO +1.00.

FAIR OR REVISED
-EITHER THE DIFFICULTY INDEX OR THE DISCRIMINATION INDEX IS UNACCEPTABLE.

POOR OR DISCARDED
-BOTH THE DIFFICULTY AND THE DISCRIMINATION INDICES ARE UNACCEPTABLE.
THEN THE ITEM NEEDS TO BE DISCARDED RIGHT AWAY.
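The retain/revise/discard rule reduces to two range checks. This Python sketch uses the acceptable ranges stated in the rule (difficulty 0.41 to 0.60, discrimination +0.20 to +1.00); the function name `item_decision` is hypothetical.

```python
def item_decision(difficulty, discrimination):
    """Hypothetical helper applying the good / fair / poor rule to one item."""
    ok_difficulty = 0.41 <= difficulty <= 0.60
    ok_discrimination = 0.20 <= discrimination <= 1.00
    if ok_difficulty and ok_discrimination:
        return "Good (retain)"
    if ok_difficulty or ok_discrimination:
        return "Fair (revise)"      # exactly one index is unacceptable
    return "Poor (discard)"         # both indices are unacceptable


print(item_decision(0.50, 0.40))
print(item_decision(0.90, -0.10))
```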
TABLE OF ACTION TO BE TAKEN
DIFFICULTY LEVEL       DISCRIMINATING LEVEL         ACTION
VERY DIFFICULT         QUESTIONABLE ITEM            DISCARD
VERY DIFFICULT         VERY DISCRIMINATING
VERY DIFFICULT         NOT DISCRIMINATING           DISCARD
DIFFICULT              MODERATELY DISCRIMINATING    REVISE
DIFFICULT              DISCRIMINATING               RETAIN
DIFFICULT              NOT DISCRIMINATING           REVISE
MODERATELY DIFFICULT   MODERATELY DISCRIMINATING    MAY NEED REVISION
MODERATELY DIFFICULT   DISCRIMINATING               ACCEPT
MODERATELY DIFFICULT   NOT DISCRIMINATING           DISCARD
EASY                   MODERATELY DISCRIMINATING    N.R.
EASY                   DISCRIMINATING               N.R.
EASY                   QUESTIONABLE                 SEE EXAMPLE
VERY EASY                                           DISCARD
TRADITIONAL ASSESSMENT

Discrete Point (Single Attribute Assessment)
-- example: language assessment in the form of multiple choice, matching type, true or false, or short answer.
Charles Spearman (1904) - Two-Factor Theory
-- postulates a general factor (g-factor) and specific factors (s-factors). Examples of tests of the g-factor are Raven's Progressive Matrices and Cattell's Culture Fair Intelligence Test.
Integrative or Global Assessment (Multiple Trait Assessment)
-- measures more than one point or objective at a time and is often pragmatic. An example is a writing composition.
Cloze Test
-- an innovative testing method in which words are deleted from a passage and the examinee restores them. The most common practice is to delete every 5th word. The acceptable range of cloze scores for judging the readability of a reading material is between 30 and 50 percent.
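A minimal Python sketch of building a cloze passage by deleting every 5th word and scoring the responses as a percentage. The function names are hypothetical; exact-word scoring (rather than accepting synonyms) and splitting on whitespace, so punctuation stays attached to words, are assumptions of this sketch.

```python
def make_cloze(passage, n=5, blank="______"):
    """Hypothetical helper: replace every nth word with a blank.
    Returns (cloze passage, list of deleted words)."""
    words = passage.split()
    out, answers = [], []
    for position, word in enumerate(words, start=1):
        if position % n == 0:
            answers.append(word)
            out.append(blank)
        else:
            out.append(word)
    return " ".join(out), answers


def cloze_score(responses, answers):
    """Exact-word scoring: percentage of blanks restored with the deleted word."""
    correct = sum(r.strip().lower() == a.lower() for r, a in zip(responses, answers))
    return 100 * correct / len(answers)


text, key = make_cloze("one two three four five six seven eight nine ten eleven")
print(text)
print(cloze_score(["five", "nine"], key))
```

The resulting percentage is what would be compared against the 30 to 50 percent range.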
C-Test
-- the second half of every second word is deleted, leaving the first and last sentences intact. A C-test commonly contains 100 deletions.
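The C-test deletion rule ("rule of two") can be sketched in Python as below. The function `c_test` is hypothetical; it assumes the passage is already split into sentences, deletes the larger half of words with an odd number of letters, skips one-letter words when counting, and keeps trailing punctuation visible.

```python
def c_test(sentences, gap="_"):
    """Hypothetical helper. First and last sentences stay intact; in the others,
    the second half of every second word is replaced by gap characters."""
    if len(sentences) < 3:
        raise ValueError("need at least three sentences")
    out = [sentences[0]]
    count = 0  # counts eligible words across all damaged sentences
    for sentence in sentences[1:-1]:
        words = []
        for token in sentence.split():
            core = token.rstrip(".,;:!?")
            tail = token[len(core):]              # trailing punctuation kept as-is
            if len(core) >= 2:                    # one-letter words are skipped
                count += 1
                if count % 2 == 0:                # every second word
                    keep = len(core) // 2         # larger half deleted for odd lengths
                    token = core[:keep] + gap * (len(core) - keep) + tail
            words.append(token)
        out.append(" ".join(words))
    out.append(sentences[-1])
    return out


for line in c_test(["Tests are useful.", "Teachers write many items today.", "The end."]):
    print(line)
```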
Dictation Test
-- primarily a test of listening and spelling. It is used to measure the ability to use capital letters and punctuation marks, spell words correctly, and write legibly and neatly.
ADMINISTERING A DICTATION TEST
Read each word once or twice as the students listen, then ask them to write the word. Read the word again for confirmation. For sentences, read each sentence slowly once, then once at normal speed, before the students are asked to write. Do not read the word while the students are writing.
Oral Interview
-- a kind of integrative assessment. It collects information through a face-to-face exchange between the interviewee and the interviewer. The interviewee is not at liberty to modify the questions or to ask follow-up questions. The questions should be prepared beforehand, and the objectives should be taken into consideration.
MEASURE OF CENTRAL TENDENCY
Raw scores -- the scores as obtained.
Tabulating raw scores
The steps in constructing a grouped frequency distribution are as follows:
1. Determine the range of scores: range = highest score - lowest score.
2. Determine the appropriate number of class intervals (ideally 10 to 15), making sure that the lowest limit is divisible by the interval width. The number of class intervals may be estimated by k = 1 + 3.3 log n (Sturges' rule), where n is the number of cases; for a sample drawn from a population of size N with margin of error e, n = N/(1 + Ne^2).
3. Compute the class width: i = range / k.
4. Determine the lowest limit (LL) of the first interval: divide the lowest score (LS) by i, take the whole-number quotient Q, and compute LL = Q * i.
5. Construct the frequency column (f) by tallying the number of scores that fall within each interval.
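A Python sketch of steps 1 to 5, assuming whole-number scores, a class width rounded up to a whole number, and k from the Sturges-type formula when no k is supplied. The function name `grouped_frequency` is hypothetical.

```python
import math


def grouped_frequency(scores, k=None):
    """Hypothetical helper. Returns a list of (lower limit, upper limit, frequency)."""
    lowest, highest = min(scores), max(scores)
    value_range = highest - lowest                    # step 1: range
    if k is None:
        k = round(1 + 3.3 * math.log10(len(scores)))  # step 2: number of intervals
    i = max(1, math.ceil(value_range / k))            # step 3: width, rounded up
    ll = (lowest // i) * i                            # step 4: LL = Q * i
    table = []
    lower = ll
    while lower <= highest:
        upper = lower + i - 1                         # whole-number scores assumed
        f = sum(lower <= s <= upper for s in scores)  # step 5: tally
        table.append((lower, upper, f))
        lower += i
    return table


for row in grouped_frequency([12, 15, 18, 22, 25, 27, 31, 33, 38, 40]):
    print(row)
```

For small classes the formula gives fewer than the ideal 10 to 15 intervals; passing `k` explicitly overrides it.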
Ranking
- another way to organize test scores. It is the process of arranging a group of scores from highest to lowest. The highest score is ranked first, and so on.
- Steps in ranking the scores:
1. Arrange the scores from highest to lowest; a score is written as many times as it occurs.
2. Put a serial number opposite each score: 1, 2, 3, 4, ...
3. Average the ranks of each score that appears more than once. Example: 45 appears three times and occupies ranks 7, 8, and 9; (7 + 8 + 9)/3 = 24/3 = 8, so each 45 is ranked 8.
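The three ranking steps map directly onto a short Python function. The name `rank_scores` is hypothetical; it returns each score paired with its rank, giving tied scores the average of the serial numbers they occupy.

```python
def rank_scores(scores):
    """Hypothetical helper: rank from highest (rank 1) to lowest, averaging ties."""
    ordered = sorted(scores, reverse=True)            # step 1: highest to lowest
    positions = {}
    for serial, s in enumerate(ordered, start=1):     # step 2: serial numbers
        positions.setdefault(s, []).append(serial)
    # Step 3: each repeated score gets the average of its serial numbers.
    avg_rank = {s: sum(p) / len(p) for s, p in positions.items()}
    return [(s, avg_rank[s]) for s in ordered]


# The 45s occupy serial numbers 7, 8 and 9, so each is ranked 8.
for score, rank in rank_scores([90, 85, 80, 70, 60, 50, 45, 45, 45, 30]):
    print(score, rank)
```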
GRAPHING OF DATA
1. Histogram
2. Polygon
3. Bar graph
[Figure: sample histogram, polygon, and bar graph of Series 1-3 by category]
MEASURES OF CENTRAL TENDENCY
MEAN, MEDIAN, MODE
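The MEAN, denoted by $\bar{X}$
-- simply the average of the group and the most widely accepted measure of central tendency.

For ungrouped data:
$$\bar{X} = \frac{\sum X}{N}$$

For grouped data (using an assumed mean):
$$\bar{X} = AM + \frac{\sum fd}{N}$$

where $\sum X$ = summation of the scores, $N$ = total number of scores in the distribution, $AM$ = assumed mean, $d$ = deviation of each class midpoint from the assumed mean, and $\sum fd$ = summation of frequency times deviation.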
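A Python sketch of both mean formulas. The function names are hypothetical, and grouped data are assumed to be given as class midpoints with their frequencies. The assumed-mean result does not depend on which AM is chosen, which the usage lines show.

```python
def mean_ungrouped(scores):
    """Mean = sum of X / N."""
    return sum(scores) / len(scores)


def mean_grouped(midpoints, freqs, assumed_mean):
    """Assumed-mean method: Mean = AM + sum(f * d) / N, with d = midpoint - AM."""
    N = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(midpoints, freqs))
    return assumed_mean + sum_fd / N


mids, freqs = [10, 15, 20, 25], [2, 3, 4, 1]   # made-up grouped data
print(mean_grouped(mids, freqs, assumed_mean=20))
print(mean_grouped(mids, freqs, assumed_mean=15))  # same mean, different AM
```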
The MEDIAN
-- the middlemost score in the distribution. It divides the distribution in half: 50% of the scores lie above the median and the other 50% lie below it.

For ungrouped data:
1. Arrange the scores from highest to lowest or vice versa.
2. If the number of scores is odd, the median is the middlemost score in the distribution.
3. If it is even, the median is the average of the two middle scores.

For grouped data:
$$\text{Mdn} = ll + \left(\frac{N/2 - cf}{f}\right) i$$
where $ll$ = lower limit of the class where the $N/2$th score lies (the median class), $N$ = number of cases, $cf$ = cumulative frequency below the median class, $f$ = frequency of the median class, and $i$ = class interval.
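A Python sketch of both median procedures. The function names are hypothetical; for grouped data, the classes are assumed to be listed from lowest to highest, and the lower limits passed in are taken as exact lower limits (class boundaries, e.g. 19.5 for the class 20-24).

```python
def median_ungrouped(scores):
    """Middle score for odd N; average of the two middle scores for even N."""
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def median_grouped(lower_limits, freqs, width):
    """Mdn = ll + ((N/2 - cf) / f) * i, classes ordered from lowest to highest."""
    N = sum(freqs)
    if N == 0:
        raise ValueError("empty distribution")
    half = N / 2
    cf = 0  # cumulative frequency below the current class
    for ll, f in zip(lower_limits, freqs):
        if cf + f >= half:              # this class contains the N/2th score
            return ll + (half - cf) / f * width
        cf += f
    raise ValueError("frequencies do not reach N/2")


print(median_ungrouped([4, 1, 3, 2]))
print(median_grouped([9.5, 14.5, 19.5, 24.5], [2, 2, 4, 2], width=5))
```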
The MODE
-- the most frequent (most repeated) score in the distribution. It is not affected by extreme scores.

For ungrouped data: the mode is the score that occurs most often.

For grouped data (empirical estimate, which holds approximately for moderately skewed distributions):
$$\text{Mode} \approx 3(\text{Median}) - 2(\text{Mean})$$
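A Python sketch of both ways of getting the mode. The function names are hypothetical; `modes_ungrouped` returns every score tied for the highest count, since a distribution can have more than one mode.

```python
from collections import Counter


def modes_ungrouped(scores):
    """All scores that occur most often (a list, because of possible ties)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top)


def mode_empirical(mean, median):
    """Empirical approximation for grouped data: Mode approx. 3(Median) - 2(Mean)."""
    return 3 * median - 2 * mean


print(modes_ungrouped([1, 2, 2, 3, 3, 4]))   # bimodal example
print(mode_empirical(mean=17.0, median=20.75))
```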


The measures of central tendency in different distributions

1. NORMAL DISTRIBUTION
2. POSITIVELY SKEWED DISTRIBUTION
3. NEGATIVELY SKEWED DISTRIBUTION

Normal distribution
- SYMMETRICAL: MEAN = MEDIAN = MODE

Positively skewed distribution
1. THERE ARE MORE LOW SCORES THAN HIGH SCORES.
2. IT SHOWS THAT THE TEST IS VERY DIFFICULT.
IT FORMS AN ASYMMETRICAL DISTRIBUTION:
MEAN > MEDIAN > MODE
The graph shows that the number of students who got high grades is relatively smaller than the number who got low grades.

Negatively skewed distribution
1. THERE ARE MORE HIGH SCORES THAN LOW SCORES.
2. IT SHOWS THAT THE TEST IS VERY EASY; THUS EVEN THE LOW-PERFORMING STUDENTS GOT GOOD GRADES.
3. IT IS THE INVERSE OF THE POSITIVELY SKEWED DISTRIBUTION.
IT FORMS AN ASYMMETRICAL DISTRIBUTION:
MODE > MEDIAN > MEAN
The graph shows that the number of students who got high grades is relatively larger than the number who got low grades.
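A Python sketch that checks which ordering of mean, median, and mode a set of scores shows. The function name `skew_direction` and the two score sets are hypothetical illustrations; the orderings are typical patterns rather than guarantees for every skewed distribution.

```python
from statistics import mean, median, mode


def skew_direction(scores):
    """Hypothetical helper: classify by the mean/median/mode ordering."""
    mn, md, mo = mean(scores), median(scores), mode(scores)
    if mn > md > mo:
        return "positively skewed"   # many low scores, as on a difficult test
    if mo > md > mn:
        return "negatively skewed"   # many high scores, as on an easy test
    return "neither ordering holds"


difficult_test = [2, 2, 2, 3, 3, 4, 5, 9, 10]   # made-up scores
easy_test = [10, 10, 10, 9, 9, 8, 7, 3, 2]      # made-up scores
print(skew_direction(difficult_test))
print(skew_direction(easy_test))
```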
Forms of Assessment
1. TRADITIONAL ASSESSMENT
- EXAMPLES: MULTIPLE CHOICE, MATCHING TYPE, TRUE OR FALSE, COMPLETION TEST
2. PERFORMANCE ASSESSMENT
- ENGAGES STUDENTS IN COMPLEX TASKS OR THE CREATION OF PRODUCTS, E.G., DANCE STEPS, DEMONSTRATIONS
3. PORTFOLIO ASSESSMENT
- ONGOING EVALUATION THAT INVOLVES GATHERING OR COLLECTING MANY DIFFERENT INDICATORS OF STUDENT PROGRESS
4. AUTHENTIC ASSESSMENT
- USES REAL-LIFE CRITERIA IN MAKING JUDGMENTS
THANK YOU!
