
Achievement Tests

Dr. Aftab Ahmad Khan


Definition/Description
• An achievement test is designed to measure a person's level of skill,
accomplishment, or knowledge in a specific area.

• It is developed to measure skills and knowledge learned in a given grade level, usually through planned instruction, such as training or classroom instruction.

• It is an assessment of developed knowledge or skill.


Conti…

• Achievement is the accomplishment or proficiency of performance in a given skill or body of knowledge.

• It is an examination administered to determine how much a person has learned, or how much knowledge a person has acquired.

• It is designed to measure the knowledge or proficiency of an individual in something that has been learned or taught, such as arithmetic or typing.
Conti…

• An achievement test is a standardized test that is designed to measure an individual's level of knowledge in a particular area.

• It focuses specifically on how much a person knows about a specific topic or area such as math, geography, or science.
Conti…
• Achievement tests refer to assessments whose scores are often used to determine the level of instruction for which a student is prepared.

• High achievement scores generally indicate that a mastery level has been reached, and that the student is prepared for advanced instruction.

• Low achievement scores can indicate the need for further remediation or repeating a course or grade level.
Classification of Achievement Tests
• Achievement tests have generally been categorized as:
i. Single-subject tests,
ii. Survey batteries,
iii. Diagnostic tests,
iv. Group administered tests,
v. Individually administered tests
vi. Modality Specific Achievement Tests (May be Individually or group
administered)
Commonly Used Achievement Tests
a. Group Administered
i. California Achievement Tests

ii. Iowa Test of Basic Skills (ITBS)

iii. Metropolitan Achievement Test

iv. Stanford Achievement Test

v. SRA Achievement Series


b. Individually Administered

i. Basic Achievement Skills Individual Screener (BASIS)

ii. Kaufman Test of Educational Achievement

iii. Peabody Individual Achievement Test

iv. Wide Range Achievement Test

v. Woodcock-Johnson Psychoeducational Battery


c. Modality Specific
i. Classroom Reading Inventory
ii. Diagnostic Reading Scales
iii. Durrell Analysis of Reading Difficulty
iv. New Sucher-Allred Reading Placement Survey
v. Gates-MacGinitie Reading Tests
vi. Gray Oral Reading Tests
vii. Nelson-Denny Reading Test
viii. Stanford Diagnostic Reading Test
ix. Woodcock Reading Mastery
Conti…
x. Enright Diagnostic Inventory of Basic Arithmetic Skills
xi. KeyMath Revised
xii. Sequential Assessment of Mathematics Inventories
xiii. Stanford Diagnostic Mathematics Test
xiv. Test of Mathematical Abilities
xv. Spellmaster
xvi. Test of Written Language-3
xvii. Woodcock Language Proficiency Battery
xviii. Written Language Assessment Test
Characteristics of a Good Achievement Test

• Reliability

• Validity

• Adequacy

• Objectivity

• Usability
Reliability
• Reliability refers to how dependably or consistently a test measures a
characteristic.

• A test that yields similar scores each time a person repeats it is said to be reliable.

• It is the extent to which test scores are not affected by chance factors—by the luck of the draw.
Conti…
• It is the extent to which the test taker’s score does not depend on:
i. the specific day and time of the test (as compared with other
possible days and times of testing)
ii. the specific questions or problems that were on the edition of the
test that the test taker took (as compared with those on other
editions), and
iii. the specific raters who rated the test taker’s responses
(if the scoring process involved any judgment).
Conti…
• We can also say that “Reliability Is Consistency”
• Test scores are reliable to the extent that they are consistent over:
i. different occasions of testing
ii. different editions of the test, containing different questions or
problems designed to measure the same general skills or types of
knowledge, and
iii. different scorings of the test takers’ responses, by
different raters.
Consistency of what Information?
• A test taker’s score can convey several kinds of information:

i. Relative position in some relevant group (large group or small group)

ii. Placement – used to classify the test takers into groups. For example, a test taker may be classified as “Advanced,” “Proficient,” or “Not Proficient” in a particular subject.
Conti…
iii. Information that does not depend on the test taker’s relative position or placement.
(Say, a GRE Verbal score of 158: this is meaningful to the people in charge of admissions for a graduate program, because they have had the opportunity to see how previous students with GRE Verbal scores near 158 performed in that program – Prediction.)
iv. Decision making – the score is the basis for a decision, such as whether the test taker is awarded a degree, admitted to a training program, or allowed to practice a profession.
Importance of Reliability
• Why is reliability important?
• Ask yourself whether a test score is useful if it does not indicate:
 How the test taker would have performed on a different day?
 How the test taker would have performed on a different set of
questions or problems designed to measure the same general skills or
knowledge? and
 How the test taker’s responses would have been rated by a
different set of raters?
Types of Reliability
• Two types

i. Internal reliability - assesses the consistency of results across items within a test.

ii. External reliability - refers to the extent to which a measure varies from one use to another.
Methods of Assessing Reliability

1. Test – retest Method

2. Equivalent-form Method

3. Test – retest with Equivalent-form Method

4. Split-half Method

5. Kuder-Richardson Method
Test – retest Method
• Test-retest reliability evaluates reliability across time
• Used when you are measuring something that you expect to stay constant
in your sample.
• The same test is administered twice to the same group of pupils with a given time interval between the two administrations, and the resulting test scores are correlated.
• This gives us a measure of stability: how stable are the test scores over a given time interval?
• Highly stable results mean that a high performer on the first administration will also be a high performer on the second administration (see the sketch below).
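A minimal sketch of the test-retest idea in Python, assuming the same pupils' scores from the two administrations are already at hand; the scores and variable names below are invented for illustration only.

```python
# Hypothetical sketch: estimating test-retest reliability (coefficient of stability).
import numpy as np

first_administration = np.array([78, 65, 90, 55, 82, 70, 60, 88])    # occasion 1
second_administration = np.array([75, 68, 88, 58, 80, 72, 63, 85])   # occasion 2, after the interval

# Correlate the two sets of scores; a value near 1 means pupils keep roughly
# the same relative standing across the two occasions.
r_stability = np.corrcoef(first_administration, second_administration)[0, 1]
print(f"Test-retest reliability (coefficient of stability): {r_stability:.2f}")
```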
Conti…
• Many factors influence reliability such as:
i. Moods
ii. Interruptions
iii. Time of day, etc.
 A good test will largely cope with such factors and give relatively
little variation.
 An unreliable test is highly sensitive to such factors and will
give widely varying results, even if the person re-takes the
same test half an hour later.
Conti…

The longer the delay between tests, the greater the likely variation.

Better tests will give less retest variation with longer delays.
Equivalent-form/Parallel-form Method
• Uses one set of questions divided into two equivalent sets (“forms”),
where both sets contain questions that measure the same construct,
knowledge or skill.
• The two sets of questions are given to the same sample of people
within a short period of time and an estimate of reliability is
calculated from the two sets.
• Steps:
• Step 1: Give test A to a group of 50 students on a Monday.
• Step 2: Give test B to the same group of students that Friday.
• Step 3: Correlate the scores from test A and test B (see the sketch below).
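A minimal sketch of these three steps in Python, assuming the Form A (Monday) and Form B (Friday) scores are already recorded; the ten score pairs are invented for illustration.

```python
# Hypothetical sketch of equivalent-form (parallel-form) reliability.
from scipy.stats import pearsonr

form_a_monday = [42, 37, 50, 29, 45, 33, 48, 40, 36, 44]   # Step 1: scores on test A
form_b_friday = [40, 35, 49, 31, 44, 36, 47, 41, 34, 45]   # Step 2: scores on test B

# Step 3: correlate the two forms; a high r indicates equivalent-form reliability.
r_equivalence, p_value = pearsonr(form_a_monday, form_b_friday)
print(f"Coefficient of equivalence: {r_equivalence:.2f}")
```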
Split-half Method
• The split-half method assesses the internal consistency - how well the
test components contribute to the construct that’s being measured.
• It measures the extent to which all parts of the test contribute equally to what is being measured.
• A test is split into two parts and then both parts given to one group of
students at the same time. And scores from both parts of the test are
correlated.
• A reliable test will have high correlation, indicating that a
student would perform equally well (or as poorly) on both
halves of the test.
Conti…
• Steps
i. Administer the test to a large group of students (ideally, over about 30).
ii. Randomly divide the test questions into two parts. For example,
separate even questions from odd questions.
iii. Score each half of the test for each student.
iv. Find the correlation coefficient for the two halves (see the sketch below).
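A minimal sketch of these steps in Python, assuming each student's scores on the odd-numbered and even-numbered halves have already been computed; the data are invented, and the Spearman-Brown correction at the end is a common extra step not covered in these slides.

```python
# Hypothetical sketch of split-half reliability with an odd/even split.
import numpy as np

# Step iii: each student's score on the two halves (invented data for 10 students).
odd_half  = np.array([8, 6, 9, 5, 7, 10, 4, 6, 8, 7])
even_half = np.array([7, 6, 9, 4, 8,  9, 5, 5, 8, 6])

# Step iv: correlate the two halves.
r_halves = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown correction: estimates the reliability of the full-length test,
# since each half is only half as long as the whole test.
r_full = (2 * r_halves) / (1 + r_halves)
print(f"Half-test r = {r_halves:.2f}, corrected full-test estimate = {r_full:.2f}")
```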
Kuder-Richardson Method
• The Kuder-Richardson Formula (KR-20; KR-21) is a measure of reliability for a test with binary variables (i.e. answers that are right or wrong); see the sketch below.
• Used for items that have varying difficulty.
For example, some items might be very easy, others more
challenging.
• It should only be used if there is a correct answer for each question — it shouldn’t be used for questions where partial credit is possible.
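A minimal sketch of the KR-20 computation in Python, assuming a small matrix of right/wrong (1/0) responses; the response matrix is invented for illustration.

```python
# Hypothetical sketch of the KR-20 reliability estimate for binary (right/wrong) items.
import numpy as np

# Invented responses: rows = students, columns = items (1 = correct, 0 = incorrect).
responses = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
])

k = responses.shape[1]                    # number of items
p = responses.mean(axis=0)                # proportion answering each item correctly
q = 1 - p                                 # proportion answering incorrectly
total_var = responses.sum(axis=1).var()   # variance of students' total scores

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(f"KR-20 reliability estimate: {kr20:.2f}")
```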
Validity
• Validity refers to:
 Whether or not the test measures what it claims to measure.
 the extent to which a measurement tool measures what it's
supposed to measure.
 what characteristic the test measures and how well the test
measures that characteristic.
• It tells you how accurately a test measures something.
Conti…

• Validity also describes the degree to which you can make specific
conclusions or predictions about people based on their test scores.

• It indicates the usefulness of the test.

• On a test with high validity the items will be closely linked to the
test's intended focus/objective.
Examples
• A test of intelligence should measure intelligence and not something
else (such as memory, achievement, aptitude etc.).

• A test of achievement should measure the achievement of the students and not their intelligence.

• A job-related test should measure only job-related qualifications/skills.

• An IQ test should only measure the IQ level.


Importance
• Vital…

 Suppose you developed a test for measuring the achievement of your students in the subject of mathematics.

Then what…?
If the test doesn’t measure achievement in mathematics, its scores are of little use.
Types of validity
• Content-Related Validity
• Construct Validity
• Face Validity
• Criterion-Related (Empirical) Validity
 Concurrent Validity
 Predictive Validity
Content-Related Validity
• Content validity refers to the extent to which the items on a test are
representative of the entire domain the test seeks to measure.
• It assesses whether a test is representative of all aspects of the
construct.
• A test has content validity if it measures knowledge of the content domain it was designed to cover.
• It concerns, primarily, how adequately and representatively the test items sample the content area to be measured.
Conti…

 To produce valid results, the content of a test, survey or measurement method must cover all relevant parts of the subject it aims to measure. If some aspects are missing from the measurement (or if irrelevant aspects are included), the validity is threatened.


Construct Validity
• Construct…? A proposed attribute of a person that often cannot be measured directly but can be assessed using a number of indicators or manifest variables.

• Constructs, theoretical constructs or latent variables are interchangeable terms.

• Examples of constructs: intelligence, motivation, anxiety, and fear are all constructs.
Conti…
• Is the test constructed in a way that it successfully measures what it claims
to measure?
• What psychological qualities does the test measure?
• The extent to which a test measures a specific theoretical construct or trait
• Usually verified by comparing the test to other tests that measure similar
qualities to see how highly correlated the two measures are.
• For example, one way to demonstrate the construct validity of a cognitive aptitude test is to correlate the outcomes on the test with those found on other widely accepted measures of cognitive aptitude.
Conti…
• To test for construct validity it must be demonstrated that the
phenomenon being measured actually exists.

So …

the construct validity of a test for intelligence, for example, is dependent on a model or theory of intelligence.
Face Validity
• Face validity is simply whether the test appears (at face value) to
measure what it claims to.
• This is the least sophisticated measure of validity.
• Tests wherein the purpose is clear, even to naïve respondents -- high
face validity.
• Tests wherein the purpose is unclear have low face validity.
Conti…
• Measurement of face validity is obtained by asking people to rate the
validity of a test as it appears to them. This rater could use a Likert
scale to assess face validity. For example:
 the test is extremely suitable for a given purpose
 the test is very suitable for that purpose
 the test is adequate
 the test is inadequate
 the test is irrelevant and therefore unsuitable
Important Considerations
a. Face validity should be avoided when the rating is done by EXPERTS, as content validity is more appropriate in that case.

b. Having face validity does not mean that a test really measures what the researcher intends to measure, but only that, in the judgment of raters, it appears to do so.
Criterion-Related Validity
• Criterion-related validity or Criterion validity measures how well one
measure predicts an outcome for another measure.

• A test has this type of validity if it is useful for predicting performance or behavior in another situation (past, present, or future).
Concurrent Validity
• The degree to which a test corresponds to an external criterion
(occurring at the same time).

• If the new test is validated by a comparison with a currently existing criterion, we have concurrent validity.

• A new IQ or personality test might be compared with an older but similar test known to have good validity already.
Conti…
Example:
Suppose…
a. we give a social studies class a test on knowledge of basic concepts in social studies, and at the same time
b. we obtain the teacher’s report on each student’s knowledge of basic concepts in social studies.
If the relationship between the test scores and the teacher’s report is high, the test has high concurrent validity (see the sketch below).
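A minimal sketch of this example in Python, assuming the test scores (a) and the teacher's ratings (b) have already been collected; both data sets are invented, and a rank correlation is used here on the assumption that the teacher's report is an ordinal rating.

```python
# Hypothetical sketch of concurrent validity: a new test vs. an existing criterion
# gathered at the same time (here, the teacher's 1-5 rating of each student).
from scipy.stats import spearmanr

test_scores     = [55, 72, 64, 81, 47, 90, 68, 59, 76, 85]   # (a) scores on the new test
teacher_ratings = [3, 4, 3, 5, 2, 5, 4, 3, 4, 5]              # (b) teacher's report, 1-5 scale

# A high correlation between the two suggests high concurrent validity.
rho, p_value = spearmanr(test_scores, teacher_ratings)
print(f"Concurrent validity coefficient (Spearman rho): {rho:.2f}")
```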
Predictive Validity
• Refers to the degree to which scores on a test are related to
performance on a criterion or standard that is administered at some
point in the future.
• It indicates whether the test accurately predicts what it is supposed to predict.
• It is often considered in conjunction with concurrent validity in
establishing the criterion-based validity of a test or measure.
• For example, the validity of a cognitive test for job performance is the
correlation between test scores and supervisor performance ratings.
