• Reliability
• Validity
• Adequacy
• Objectivity
• Usability
Reliability
• Reliability refers to how dependably or consistently a test measures a
characteristic.
• A test that yields similar scores each time a person takes it is said to
be reliable.
ii. Placement – classifies the test takers into groups. For example, test
takers may be classified as “Advanced,” “Proficient,” or “Not Proficient”
in a particular subject.
Conti…
iii. Prediction – information that does not depend on the test taker’s
relative position or placement.
(Say, a GRE Verbal score of 158: the score is meaningful to the people in
charge of admissions for a graduate program because they have seen how
previous students with GRE Verbal scores near 158 have performed in that
program.)
iv. Decision making – the score is the basis for a decision: whether the
test taker is awarded a degree, admitted to a training
program, or allowed to practice a profession.
Importance of Reliability
• Why is reliability important?
• Ask yourself whether a test score is useful if it does not indicate:
how the test taker would have performed on a different day;
how the test taker would have performed on a different set of
questions or problems designed to measure the same general skills or
knowledge; and
how the test taker’s responses would have been rated by a
different set of raters.
Types of Reliability
• Four types
1. Test-retest Method
2. Equivalent-form Method
3. Split-half Method
4. Kuder-Richardson Method
Test-retest Method
• Test-retest reliability evaluates reliability across time.
• Used when you are measuring something that you expect to stay constant
in your sample.
• The same test is administered twice to the same group of pupils with a given
time interval between the two administrations, and the resulting test scores
are correlated.
• This gives us a measure of stability – how stable are the test scores over a
given time interval?
• Highly stable results mean that a high performer on the first administration
will also be a high performer on the second administration.
Conti…
• Many factors influence reliability such as:
i. Moods
ii. Interruptions
iii. Time of day, etc.
A good test will largely cope with such factors and give relatively
little variation.
An unreliable test is highly sensitive to such factors and will
give widely varying results, even if the person re-takes the
same test half an hour later.
Conti…
The longer the delay between tests, the greater the likely variation.
Better tests will give less retest variation with longer delays.
Equivalent-form/Parallel-form Method
• Uses one set of questions divided into two equivalent sets (“forms”),
where both sets contain questions that measure the same construct,
knowledge or skill.
• The two sets of questions are given to the same sample of people
within a short period of time and an estimate of reliability is
calculated from the two sets.
• Steps:
• Step 1: Give test A to a group of 50 students on a Monday.
• Step 2: Give test B to the same group of students that Friday.
• Step 3: Correlate the scores from test A and test B.
Split-half Method
• The split-half method assesses internal consistency – how well the
test’s components contribute to the construct that’s being measured.
• It measures the extent to which all parts of the test contribute equally
to what is being measured.
• A test is split into two parts, and both parts are given to one group of
students at the same time. The scores from the two parts of the test are
then correlated.
• A reliable test will have high correlation, indicating that a
student would perform equally well (or as poorly) on both
halves of the test.
Conti…
• Steps
i. Administer the test to a large group of students (ideally, more than
about 30).
ii. Randomly divide the test questions into two parts. For example,
separate even-numbered questions from odd-numbered questions.
iii. Score each half of the test for each student.
iv. Find the correlation coefficient for the two halves.
Kuder-Richardson Method
• The Kuder-Richardson Formula (KR-20; KR-21) is a measure of reliability for
a test with binary items (i.e. answers that are right or wrong).
• Used for items that have varying difficulty.
For example, some items might be very easy, others more
challenging.
• It should only be used if there is a correct answer for each
question – it shouldn’t be used for questions where partial
credit is possible.
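A minimal sketch of the KR-20 computation, using the standard formula KR-20 = (k / (k − 1)) · (1 − Σpq / σ²), where k is the number of items, p and q are the proportions answering each item right and wrong, and σ² is the variance of the total scores. The right/wrong matrix below is hypothetical illustrative data.

```python
from statistics import pvariance

# Hypothetical right/wrong matrix: rows = students, columns = items
# (illustrative data only; entries must be 0 or 1).
items = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
]

k = len(items[0])                     # number of items
totals = [sum(row) for row in items]  # each student's total score

# p = proportion answering each item correctly; q = 1 - p.
p = [sum(row[i] for row in items) / len(items) for i in range(k)]
pq_sum = sum(pi * (1 - pi) for pi in p)

# KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)
kr20 = (k / (k - 1)) * (1 - pq_sum / pvariance(totals))
print(round(kr20, 3))
```

Note that every item is scored strictly 0 or 1, matching the restriction above: KR-20 is not defined for partial-credit items.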
Validity
• Validity refers to:
whether or not the test measures what it claims to measure;
the extent to which a measurement tool measures what it is
supposed to measure;
what characteristic the test measures and how well the test
measures that characteristic.
• It tells you how accurately a test measures something.
Conti…
• Validity also describes the degree to which you can make specific
conclusions or predictions about people based on their test scores.
• On a test with high validity the items will be closely linked to the
test's intended focus/objective.
Examples
• A test of intelligence should measure intelligence and not something
else (such as memory, achievement, aptitude etc.).
Then what…?
If the test does not measure achievement in mathematics, it is not a
valid test of mathematics achievement – whatever else it may measure.
Types of validity
• Content-Related Validity
• Construct Validity
• Face Validity
• Criterion-Related Validity (also called Empirical Validity)
• Concurrent Validity
• Predictive Validity
Content-Related Validity
• Content validity refers to the extent to which the items on a test are
representative of the entire domain the test seeks to measure.
• It assesses whether a test is representative of all aspects of the
construct.
• It measures knowledge of the content domain that it was designed to
measure.
• It concerns, primarily, how adequately and representatively the test
items sample the content area to be measured.
Conti…
So …
a. Having face validity does not mean that a test really measures what
the researcher intends to measure –
it means only that, in the judgment of raters, it appears to do so.
Criterion-Related Validity
• Criterion-related validity or Criterion validity measures how well one
measure predicts an outcome for another measure.