
ASSIGNMENT

Submitted by: Minahil Saif BSF1804754

Amna Nabeel Dar BSF1804898

Samia Shahbaz BSF1804883

Soniya Manzoor BSF1804879

Nabeela Majeed BSF1804799

Noor Fatima Naqvi BSF1807518

Atika Amjad BSF1804751

Maheen Zehra BSF1804693

Maria Saleem BSF1804895

Huma Ishaq BSF1804746

Program: B.Ed(Hons) 2018-22

MORNING A

Semester: 8

Subject: Test Development and Evaluation

Submitted to: Dr. Rizwan Ahmad

Test Administration and Analysis


Item analysis and modification

What is test administration?


Test administration is concerned with the physical and psychological setting in which
students take the test, so that students can do their best (Airasian, 1994).

Definition of test administration


A test administrator is responsible for observing and monitoring the students during
examinations and related assessments. Test administrators discuss the procedures and
rules before the test, responding to any inquiries and concerns the examinees may have.

Factors in test administration


Factors that impact or influence performance in a testing situation include
client/patient/student factors, clinician factors, environmental factors, and those involving
the actual testing process itself.

Administration process
Test administration procedures are developed for an exam program in order to help
reduce measurement error and to increase the likelihood of fair, valid, and reliable
assessment. Specifically, appropriate standardized procedures improve measurement by
increasing consistency and test security.

Furthermore, administration procedures that protect the security of the test help to
maintain the meaning and integrity of the score scale for all examinees.
Why is test administration important in the teaching/learning process?

Testing is important in a school because it helps teachers and pupils determine how much
they have taught and learnt, respectively. Tests are administered in school because the
teacher is trying to find areas of difficulty in order to take corrective measures.

Guidelines for test administration


1. Trial run-through, including how to exit during a test.
2. Testing environment and equipment.
3. Student preparation.
4. Supervision.
5. Giving encouragement, prompts and feedback.

Test administration in psychology


Psychological testing is the administration of psychological tests. Psychological tests are
administered by trained evaluators. A person's responses are evaluated according to
carefully prescribed guidelines. Scores are thought to reflect individual or group
differences in the construct the test purports to measure.

Working methods of test administration


Broadly speaking, test administration is divided into three categories based on medium:
live performance testing, paper-and-pencil testing (often abbreviated PPT, PNP, or P&P),
and computer-based testing.

What is test analysis?


A detailed statistical assessment of a test's psychometric properties, including an
evaluation of the quality of the test items and of the test as a whole.

Importance of test analysis


Item analyses are intended to assess and improve the reliability of your tests. If test
reliability is low, test validity will necessarily also be low. This is the ultimate reason you
do item analyses: to improve the validity of a test by improving its reliability.

Test Analysis and Interpretation


To accurately interpret test scores, the teacher needs to analyze the performance of the
test as a whole and of the individual test items, and to use these data to draw valid
inferences about student performance.

Steps in item analysis (relative criteria tests)
1. Award of a score to each student.
2. Ranking in order of merit.
3. Identification of groups: high and low.
4. Calculation of the difficulty index of a question.
5. Calculation of the discrimination index of a question.
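
A minimal Python sketch of steps 1-3 follows; the difficulty and discrimination indices
(steps 4 and 5) are illustrated in later sections. The answer key and student responses
here are hypothetical, purely for illustration.

# Sketch of steps 1-3: score each student, rank, and split into groups.
answer_key = ["B", "D", "A", "C", "B"]        # correct option for each item

responses = {                                 # each student's chosen options
    "S1": ["B", "D", "A", "C", "B"],
    "S2": ["B", "D", "A", "A", "C"],
    "S3": ["A", "D", "B", "C", "B"],
    "S4": ["C", "B", "A", "A", "C"],
    "S5": ["B", "D", "A", "C", "C"],
    "S6": ["A", "C", "B", "A", "B"],
}

# Step 1: award a score to each student (number of items answered correctly).
scores = {
    student: sum(chosen == key for chosen, key in zip(answers, answer_key))
    for student, answers in responses.items()
}

# Step 2: rank students in order of merit (highest score first).
ranked = sorted(scores, key=scores.get, reverse=True)

# Step 3: identify high and low groups (here, the top and bottom thirds).
n_group = max(1, len(ranked) // 3)
high_group = ranked[:n_group]
low_group = ranked[-n_group:]

print(scores, high_group, low_group)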

Administration and conducting the test


Administering the written test is perhaps the most important aspect of the
examining process. The atmosphere the test administrator creates in the test room
and the attitude the test administrator displays in performing his/her duties are
extremely important.

The paramount guiding principle in administering any classroom test is that all
examinees should be given a fair chance to demonstrate their achievement of the
intended or planned learning outcomes. This implies that the physical and psychological
environment in which the examination takes place has to be conducive for the
examinee, to facilitate the achievement of the testing outcomes. Factors that
might interfere with the validity of the measurement also have to be controlled.

The test administrator’s manner, bearing, and attitude may well inspire confidence
in examinees and put them at ease while participating in the testing process.

Test administration procedures are developed for an exam program in order to
help reduce measurement error and to increase the likelihood of fair, valid, and
reliable assessment. Specifically, appropriate standardized procedures improve
measurement by increasing consistency and test security. Consistent, standardized
administration of the exam allows you to make direct comparisons between
examinees’ scores, despite the fact that the examinees may have taken their tests on
different dates, at different sites, and with different proctors.

A teacher’s test administration procedures can have a great impact on student test
performance. Certain examiner and situational variables also influence test
performance. The degree of preparedness, the personality, and the behavior of an
examiner during the test are examiner variables affecting examinee performance.
Other situational variables affecting test scores are the place, time, and
environmental conditions.

Item analyses
Item analysis is the process of analyzing individual items to make sure that their difficulty level is
appropriate and that they discriminate well between students of different performance
levels, and then looking more closely at distractor rationales and response frequencies.

Validity

• Validity is the extent to which a test measures what it is supposed to measure.
• Validity is a crucial consideration in evaluating tests.
• Since new commercial tests cannot be published without validation studies, it is
reasonable to expect similar evidence of validity for tests that screen
individuals for high-stakes decisions such as promotion, graduation, or
certification.
• Cronbach (1949) said that validity was the extent to which a test measures
what it purports to measure and that a test is valid to the degree that what it
measures or predicts is known.

Categories of validity:

Cronbach (1949) identified two categories of validity:

Logical validity

Empirical validity

Logical validity:

Logical validity is a set of loosely organized, broadly defined approaches based on
content analysis that includes examination of operational issues and test-taking
processes.
Empirical validity:

Empirical validity (also called statistical or predictive validity) describes how
closely scores on a test correspond (correlate) with behavior as measured in other
contexts. Students’ scores on a test of academic aptitude, for example, may be
compared with their school grades (a commonly used criterion).

Improving test validity:


The following test development procedures will increase test validity:

• Specify the instructional objectives. Describe the behaviors that will be
tested.
• Describe the conditions under which the test will be given.
• Determine number of items required. Prepare domain specifications.
Determine testing time required. Select item formats.
• Write test items matched to specific objectives or sets of objectives.
• Use a number of items that proportionately reflects the amount of training
time devoted to objectives.
• Establish a vocabulary level appropriate for the intended trainees.
• Have experts independently review the items. Prepare clear, simple test
instructions. Establish standards for mastery.
• Use a specification table to match topics, objectives, and skill levels which
trainees are expected to attain for each item.
• Estimate what proportion of the curriculum addresses each competence.
• Prepare scoring keys.
• Prepare report and table formats.
• Administer the draft test to a group comparable to people who will be
trained.
• Interview examinees to determine if they felt the test was appropriate.
• Compute a difficulty index for each item for instructed and uninstructed
groups.
• Compute the discrimination index for each item. Summarize item statistics
for each item.
• Evaluate how items discriminate between masters and non-masters for each
objective.
• Analyze choice of distracters for questionable multiple-choice items.
• Revise test based on discussion with criterion group and item statistics.
• Assemble items into the final test.

Reliability:
• In classical test theory, an observed test score is the sum of a true score and random error.
• A true score is a theoretical, dependable measure of a person’s obtained
score uninfluenced by chance events or conditions.
• It is the average of identical tests administered repeatedly without limit.
Identical implies that a person is not affected by the testing procedure, which
is unlikely to occur.
• Reliability, which is the best single measure of test accuracy, is the extent to
which test results are consistent, stable, and free of error variance.
• It is the extent to which a test provides the same ranking of examinees
when it is re-administered.
• A reliable test may not be valid.

Types of reliability coefficients:


Stability (test-retest) is the correlation between two successive measurements
using the same test. The reliability may be spuriously high due to item recall
if the time between test administrations is too brief or too low if too much time
elapses between the pretest and posttest.

Equivalence (alternate forms) is the correlation between two administrations of
parallel forms of the same test. This is the best index of test reliability.

Internal consistency is calculated using the Spearman-Brown formula based on a
split-half technique that compares two equivalent halves of a test using odd- vs.
even-numbered items. Another method involves Cronbach’s alpha, which compares
the variance of each item to total test variance.

Rational equivalence uses the Kuder-Richardson (KR) formula 20, which provides
relatively conservative estimates of the coefficient of equivalence. KR formula 21 is
less accurate, but simpler to compute.
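
A minimal Python sketch of these internal-consistency estimates is given below, assuming
a hypothetical score matrix (one row per examinee, one column per dichotomously scored
item); variance conventions (n vs. n-1) vary slightly between textbooks.

import numpy as np

# Hypothetical 0/1 score matrix: rows are examinees, columns are items.
scores = np.array([
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 0],
])
k = scores.shape[1]                     # number of items
totals = scores.sum(axis=1)             # each examinee's total score

# Split-half reliability: correlate odd- and even-numbered halves, then
# step the correlation up to full test length with Spearman-Brown.
odd_half = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd_half, even_half)[0, 1]
spearman_brown = 2 * r_half / (1 + r_half)

# Cronbach's alpha: compares the sum of item variances to total test variance.
alpha = (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum() / totals.var(ddof=1))

# KR-20 uses p*q as the item variance for dichotomous items; KR-21 is the
# simpler, less accurate approximation based only on the mean and variance.
p = scores.mean(axis=0)
kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / totals.var())
m = totals.mean()
kr21 = (k / (k - 1)) * (1 - m * (k - m) / (k * totals.var()))

print(spearman_brown, alpha, kr20, kr21)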

Decision-consistency, which is the average of squared deviations from the
established mastery level, is used with criterion-referenced tests.

Item revision:
Analyzing the pattern of responses across distracters is an effective way to determine
how well the distracters are functioning. The following procedures will improve the process
of item revision.

• Ideally, trainees should answer all pretest questions incorrectly and all
posttest questions correctly.
• If a majority of students miss an item, it does not necessarily have to be
changed, but check it for accuracy and find out if the material was covered
during training.
• Eliminate or replace distracters that are not chosen by any examinees.
• Discrimination indexes should be positive for correct answers and negative
for incorrect answers.
• Calculate discrimination indexes for each distracter.
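
A minimal Python sketch of distractor analysis for a single multiple-choice item is shown
below; the option labels, the high/low group split, and the response data are hypothetical.

from collections import Counter

# Hypothetical responses to one item: (chosen option, examinee group),
# where the groups come from the high/low split described earlier.
responses = [
    ("B", "high"), ("B", "high"), ("A", "high"), ("B", "high"),
    ("C", "low"), ("B", "low"), ("D", "low"), ("A", "low"),
]
correct = "B"
options = ["A", "B", "C", "D"]

high = [opt for opt, grp in responses if grp == "high"]
low = [opt for opt, grp in responses if grp == "low"]
high_counts, low_counts = Counter(high), Counter(low)

for opt in options:
    chosen = high_counts[opt] + low_counts[opt]   # 0 means: replace the distracter
    # Per-option discrimination: proportion of the high group choosing the
    # option minus the proportion of the low group choosing it; it should be
    # positive for the correct answer and negative for distracters.
    disc = high_counts[opt] / len(high) - low_counts[opt] / len(low)
    label = "correct" if opt == correct else "distracter"
    print(f"{opt} ({label}): chosen {chosen} times, discrimination {disc:+.2f}")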

Item difficulty:
• Item difficulty is the percentage of people who answer an item correctly. It
is the relative frequency with which examinees choose the correct response.
• It has an index ranging from a low of 0 to a high of +1.00. Higher difficulty
indexes indicate easier items.
• An item answered correctly by 75% of the examinees has an item difficulty
level of .75. An item answered correctly by 35% of the examinees has an
item difficulty level of .35.
• Item difficulty is a characteristic of the item and the sample that takes the
test. For example, a vocabulary question that asks for synonyms or English
nouns will be easy for American graduate students in English literature, but
difficult for elementary children.
• Item difficulty has a powerful effect on both the variability of test scores and
the precision with which test scores discriminate among groups of
examinees.
• Item difficulty is calculated by using the following formula (Crocker &
Algina, 1986).

Difficulty = (number who answered the item correctly / total number tested) × 100
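
As a quick, hypothetical check on the formula in Python:

num_correct = 30    # examinees who answered the item correctly
num_tested = 40     # total examinees tested

difficulty = num_correct / num_tested   # 0.75 as a proportion
percent_correct = difficulty * 100      # 75% of examinees answered correctly

print(difficulty, percent_correct)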

Item discrimination:
• Item discrimination compares the number of high scorers and low scorers
who answer an item correctly.
• It is the extent to which items discriminate among trainees in the high
and low groups.
• The total test and each item should measure the same thing. High
performers should be more likely to answer a good item correctly, and
low performers more likely to answer incorrectly.
• To compute item discrimination, a test is scored, scores are rank-
ordered, and 27 percent of the highest and lowest scorers are selected.
The number of correct answers in the lowest 27 percent is subtracted
from the number of correct answers in the highest 27 percent. This result
is divided by the number of people in the larger of the two groups.
• The higher the discrimination index, the better the item because high
values indicate that the item discriminates in favor of the upper group,
which should answer more items correctly. If more low scorers answer an
item correctly, it will have a negative value and is probably flawed.
• A negative discrimination index occurs for items that are too hard or
poorly written, which makes it difficult to select the correct answer. On
these items poor students may guess correctly, while good students,
suspecting that a question is too easy, may answer incorrectly by
reading too much into the question.
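
A minimal Python sketch of the 27 percent procedure described above, using hypothetical
(total score, item correct) records:

# Hypothetical records: (total test score, 1 if this item was answered correctly).
examinees = [
    (48, 1), (45, 1), (44, 1), (41, 1), (40, 0), (38, 1), (35, 1),
    (33, 0), (31, 1), (28, 0), (26, 0), (24, 1), (22, 0), (19, 0),
]

# Rank-order by total score and take the top and bottom 27 percent.
ranked = sorted(examinees, key=lambda rec: rec[0], reverse=True)
n_group = max(1, round(0.27 * len(ranked)))
upper, lower = ranked[:n_group], ranked[-n_group:]

# Subtract the lower group's correct answers from the upper group's, then
# divide by the size of the larger group.
upper_correct = sum(item for _, item in upper)
lower_correct = sum(item for _, item in lower)
discrimination = (upper_correct - lower_correct) / max(len(upper), len(lower))

print(discrimination)   # positive values favor the upper group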

Differential item functioning:


• Test bias is a serious concern, particularly if test results determine
whether a person will be hired, retained, or promoted.
• DIF identifies test questions that give one group of test takers an
unfair advantage over another.
• It assumes that test takers who have approximately the same level of
knowledge, as measured by total test scores, should perform in a
similar manner on individual test questions regardless of sex, race or
ethnicity.
• DIF scores are presented as differences on a scale that ranks test
questions according to difficulty.
• A negative value means that the test question is more difficult for the
focal group, while a positive value indicates that it is more difficult
for the reference group (males or whites). The higher the number, the
greater the difference between matched groups.
• DIF is an important consideration for organizations that use tests to
select staff, such as departments of social services and education.
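
The matching idea behind DIF can be illustrated with a simplified Python sketch; this is
not a formal DIF statistic (such as Mantel-Haenszel), and the groups, score bands, and
records below are hypothetical.

from collections import defaultdict

# Hypothetical records: (group, total test score, 1 if the studied item was correct).
records = [
    ("reference", 42, 1), ("reference", 41, 1), ("reference", 35, 1),
    ("reference", 34, 0), ("reference", 27, 1), ("reference", 25, 0),
    ("focal", 43, 1), ("focal", 40, 0), ("focal", 36, 0),
    ("focal", 33, 0), ("focal", 28, 1), ("focal", 24, 0),
]

def band(score, width=10):
    # Match examinees of similar ability by banding their total scores.
    return score // width

stats = defaultdict(lambda: {"reference": [], "focal": []})
for group, total, item_correct in records:
    stats[band(total)][group].append(item_correct)

for b, groups in sorted(stats.items()):
    ref, foc = groups["reference"], groups["focal"]
    if ref and foc:
        # Within a band, both groups have roughly the same total score, so a
        # large gap in proportion correct on this item suggests possible DIF.
        diff = sum(ref) / len(ref) - sum(foc) / len(foc)
        print(f"score band {b}: reference minus focal proportion correct = {diff:+.2f}")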
