
Test Development
INTRODUCTION:
 Cullari (1998) has said, “A test is a standardized
procedure for sampling behavior and describing it with scores and
categories.”
 Anastasi and Urbina (1997) have defined a psychological test as
“essentially an objective and standardized measure of a sample of
behavior.”
 Kaplan and Saccuzzo (2001) have opined, “A psychological test or
educational test is a set of items designed to measure
characteristics of human beings that pertain to behavior.”
Characteristics of good test:
For a test to be scientifically sound, it must possess the following characteristics:
 Objectivity: A test must have the trait of objectivity, i.e., it must be free from
the subjectivity element so that there is complete interpersonal agreement
among experts regarding the meaning of the items and scoring of the test.
 Reliability: A test must also be reliable. Reliability here refers to the
self-correlation of the test. It shows the extent to which the results obtained are
consistent when the test is administered once or more than once to the
same sample with a reasonable time gap. For a test to be called sound, it must
be reliable because reliability indicates the extent to which the scores obtained
in the test are free from such internal defects of standardization as are
likely to produce errors of measurement.
 Validity: Validity indicates the extent to which the test measures what it
intends to measure, when compared with some outside and independent
criterion. It is the correlation of the test with some outside criterion. Validity
of the test is dependent on its reliability because a test which yields
inconsistent results (poor reliability) is ordinarily not expected to correlate
with some outside independent criterion.
 Norms: A test must be guided by certain norms. Norms refer to the average
performance of a representative sample on a given test. There are four
common types of norms: age norms, grade norms, percentile norms, and standard
score norms. Norms help in the interpretation of scores. In the absence of
norms, no meaning can be attached to the scores obtained on the test.
 Practicability/usability: A test must also be practicable/usable from
the point of view of the time taken in its completion, length, scoring,
etc. In other words, the test should not be lengthy, and the scoring
method must be neither difficult nor one which can only be done by highly
specialized persons.
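As a rough illustration of the reliability and validity characteristics listed above, both can be estimated as correlation coefficients. The sketch below is not taken from the source: the function name pearson_r and all scores are hypothetical, and the Pearson product-moment coefficient is assumed as the correlation measure.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of five examinees (illustrative data only).
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]  # same test after a time gap
criterion_ratings = [3, 4, 2, 5, 4]           # outside, independent criterion

# Reliability as self-correlation: the test correlated with itself over time.
reliability = pearson_r(first_administration, second_administration)

# Validity as correlation of the test with the outside criterion.
validity = pearson_r(first_administration, criterion_ratings)

print(f"reliability estimate: {reliability:.2f}")
print(f"validity estimate:    {validity:.2f}")
```

A coefficient close to 1 suggests consistent (or criterion-related) scores; with only five examinees the figures here are purely illustrative.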
Classification of Tests:
 On the basis of the criterion of administrative conditions: Tests
have been classified into two types on the basis of administrative conditions-
individual tests and group tests.
Individual tests are those tests that are administered to one person at a time.
Kohs Block Design Test is an example of an individual test. Individual tests, in
general, have two limitations, i.e., such tests are time consuming and require the
services of trained examiners.
Group tests are tests which can be administered to more than one person, or to a
group, at a time. Bell Adjustment Inventory is an example of a group test.
 On the basis of the criterion of scoring: Scoring is one of the
vital parts of a test. Based upon this criterion, tests are classified into two
types- objective test and subjective test.
Objective tests are those whose items are scored by competent examiners or
observers in such a way that no scope for subjective judgment or opinion
exists and thus, the scoring remains unambiguous. Tests having multiple-choice,
true-false and matching items are usually called objective tests.
Subjective tests are tests whose items are scored by competent
examiners or observers in a way that leaves some scope for subjective
judgment and opinion. These are also known as free-answer tests.
 On the basis of the criterion of time limit in producing the
response: On the basis of this criterion, tests are classified into power tests and speed
tests.
A power test is one which has a generous time limit so that most examinees are
able to attempt every item. Most of the intelligence tests and aptitude tests
belong to the category of power tests.
Speed tests are those that have severe time limits but the items are
comparatively easy and the difficulties involved therein are more or less of the
same degree.
 On the basis of the criterion of purpose or objective: Based upon
this criterion, tests are usually classified as intelligence tests, aptitude tests,
and personality tests. Intelligence tests, for example, are intended to assess the intelligence of the
examinees.
 On the basis of the criterion of the nature or contents of item:
The important types under this criterion are:
Verbal test
Nonverbal test
Performance test
Non-language test
1) A verbal test is one whose items emphasize reading and writing as the primary mode of
communication. Jalota Group General Intelligence Test and Mehta Group Test of
Intelligence are examples of verbal tests.
2) Nonverbal tests are those that de-emphasize, but do not altogether eliminate, the role of language
by using symbolic materials such as pictures and figures.
3) Performance tests are those that require the examinees to perform a task rather than answer
some questions.
4) Non-language tests are those which do not depend upon any form of written, spoken or reading
communication.
 On the basis of the criterion of standardization:
Based upon this criterion, tests are classified into standardized tests and teacher-made tests.
Standardized tests are those which have been subjected to the procedure of standardization. The
meaning of the term ‘standardization’ is controversial and includes at least the following
conditions:
1) The first condition for standardization is that there must be a standard manner of giving
instructions so that uniformity can be maintained in the evaluation of all those who take
the test.
2) The second condition for standardization is that there must be uniformity of scoring, and
an index of fairness of correct answers, obtained through the procedure of item analysis, should be
available.
3) The third condition is that reliability and validity of the test must be established and the
individuals for whom the test is intended should be explicitly mentioned.
4) The fourth condition, a controversial one, is that a standardized test should have
norms. However, according to Cronbach (1970, 27), a test even without norms may be
called a standardized test. But the majority of psychologists favor the idea that a
standardized test should have norms as well.
TEST CONCEPTUALIZATION:
 Step 1: Describe purpose and rationale for test – What will the test measure,
and for what purpose will it be used?
 Step 2: Describe the target population for the test
 Step 3: Clearly define the key variable of interest
 Step 4: Create item specifications
 Step 5: Choose item format
 Step 6: Specify administration and scoring procedures.
STEP 1: Purpose and Rationale
 What is the test designed to measure?
 What is the purpose for the test?
 Who will use the test?
 Who will take the test?
 How will the test be administered?
 What is the ideal format for the test given the test taker
characteristics and time/financial considerations?
 Is there any potential harm that can result from test administration?
 How will test scores be meaningful?
STEP 2: Target population for the test:
Relevant characteristics to consider:
 Age
 Educational Status
 Language (which language and level of language ability)
 Literacy Level
 Disabilities
STEP 3: Define variable:
 Develop by referencing:
 Theory
 Empirical Literature
 Cultural Definitions
 DSM-5 or other diagnostic criteria
STEP 4: Item specifications:
 List of major content areas to be included in the test
o Can also include the number of items
 For clinical tests, the DSM-5 (or other guidelines for diagnosis)
can provide guidelines for item specifications
 Otherwise, develop item specifications from theory or definitions
of the variable
STEP 5: Item format:
 Rating Scale: Grouping of words, statements, or symbols on which
judgments concerning the strength of a particular trait, attitude, or
emotion are indicated by the test-taker.
 Paired Comparisons: Test-takers are presented with pairs of
stimuli that they are asked to compare and select one.
 Useful to place test-takers into categories
 Avoids the problem of test-takers agreeing (or disagreeing) with
every category.
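To make the paired-comparison format above more concrete, the following is a minimal sketch of how the pairs might be generated and the selections tallied. The stimuli, the function names make_pairs and tally_choices, and the simulated test-taker are all hypothetical and serve only as an illustration.

```python
from collections import Counter
from itertools import combinations


def make_pairs(stimuli):
    """Return every unordered pair of stimuli, each pair appearing once."""
    return list(combinations(stimuli, 2))


def tally_choices(choices):
    """Count how often each stimulus was selected across all pairs."""
    return Counter(choices)


# Hypothetical stimuli for a values inventory.
stimuli = ["honesty", "ambition", "kindness", "loyalty"]
pairs = make_pairs(stimuli)  # n stimuli give n * (n - 1) / 2 pairs

# Simulated test-taker who always selects the alphabetically first stimulus.
choices = [min(pair) for pair in pairs]
preferences = tally_choices(choices)

for stimulus in stimuli:
    print(stimulus, preferences[stimulus])
```

Because the test-taker must choose one member of every pair, the tally always spreads across stimuli, which is one way to see why this format avoids uniform agreement or disagreement.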
STEP 6: Administration and scoring procedures:
 Standardized assessment requires very clear administration
guidelines so all participants complete the assessment under
similar conditions
• Critical for performance and observation assessments
 Scoring guidelines should provide specific score interpretations
based on empirical research
STEP 7: Test Try-Out
 Try items out on people similar in critical respects to the people
for whom the test was designed.
 Sample size: No fewer than 5 participants (preferably at least 10
participants) for every item on the test.
 Conduct under conditions as identical as possible to the conditions
under which the final test will be administered.
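The sample-size rule of thumb above is simple arithmetic, sketched here for convenience. The function name tryout_sample_size and the 40-item test are hypothetical; the ratios of 5 and 10 participants per item come from the guideline above.

```python
def tryout_sample_size(n_items, per_item_minimum=5, per_item_preferred=10):
    """Return (minimum, preferred) try-out sample sizes for a test of n_items."""
    if n_items < 1:
        raise ValueError("a test needs at least one item")
    return n_items * per_item_minimum, n_items * per_item_preferred


# Hypothetical 40-item test.
minimum, preferred = tryout_sample_size(40)
print(f"at least {minimum} participants, preferably {preferred}")
```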


TEST CONSTRUCTION:
 The set of activities involved in developing and evaluating a test of
some psychological function.
 Before the real work of test construction begins, certain broad
decisions are taken by the investigator.
 These preliminary decisions have far-reaching consequences. It is
at this stage that the test constructor outlines the major objectives
of the test in general terms and specifies the population for whom
the test is intended.
Steps of Test Construction:
 Planning of the test: The first step in the construction of a test is careful
planning. At this stage, the test constructor specifies the broad and specific
objectives of the test in clear terms and decides upon the nature of the content
or items to be included. Planning also covers the total number of
copies of the test to be reproduced and the preparation of a manual.
 Writing items of the test: Item writing follows from the planning done earlier. If the
test constructor has decided to prepare an essay test, essay items are written;
if an objective test is to be constructed, objective items are
written. There are some essential prerequisites which must be met if the item writer
wants to write good and appropriate items.
 Preliminary administration of the test: Once the items have been written
down and modified on the basis of experts’ comments, the test is ready for its
experimental try-out.
The main purposes of the experimental try-out of any psychological or
educational test are given below:
o Finding out the major weaknesses, omissions, ambiguities and inadequacies
of the items.
o Determining the difficulty value of each item, which in turn helps in selecting
items for their even and proper distribution in the final form.
o Determining the validity of each individual item.
o Determining a reasonable time limit for the test.
o Determining the appropriate length of the test.
o Determining the inter-correlations of items so that overlapping can be
avoided.
 There are some essential prerequisites, which must be met if the item
writer wants to write good and appropriate items.
o The item writer must have a thorough knowledge and complete mastery of the
subject matter.
o The item writer must be fully aware of the persons for whom the test is
meant and of their intelligence level, so that the difficulty level of
the items can be adjusted to their ability level.
o The item writer must have a large vocabulary.
o The item writer must be familiar with different types of items along with their
advantages and disadvantages.
 Reliability of the test: Once the experimental try-out is complete, the reliability
coefficient of the test is computed. Reliability is the self-correlation of the test and it
indicates the consistency of the scores. The size of the sample for this purpose should
not be less than 100. There are three common ways of calculating the reliability coefficient,
namely the test-retest method, the split-half method and the equivalent-form method.
 Validity of the test: Validity refers to what the test measures and how well it does so. If a test
measures well the trait that it intends to measure, we say that the test is a valid one.
Validity may also be defined as the correlation of the test with some outside
independent criterion. There are three main types of validity: content validity, construct
validity, and criterion-related validity.
 Norms: Norms are the average performance or score of a large sample
representative of a specified population. Norms are prepared so that the
scores obtained on the test can be interpreted meaningfully. By themselves, the
obtained scores convey no meaning regarding the
ability or trait being measured, but when they are compared with the norms,
a meaningful inference can immediately be drawn.
 Preparation of the manual and reproduction of the test: This is the
last step of test construction. In the manual, the test constructor reports the psychometric properties of the
test, the norms and the references. The manual also includes instructions as well as details of the
arrangement of material, i.e., whether items have been arranged in random order or
in any other order. Having considered the importance and
requirements of the test, the test constructor finally orders the printing of the test and the manual.
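The norms step above says that a raw score becomes meaningful only when compared with the norm sample. A minimal sketch of two such comparisons, a percentile rank and a standard (z) score, follows. The norm sample, the function names, the half-credit treatment of tied scores, and the use of the population standard deviation are assumptions made for illustration; a real norm sample would be far larger.

```python
from statistics import mean, pstdev


def percentile_rank(score, norm_scores):
    """Percentage of the norm sample scoring below `score`, counting half of the ties."""
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


def standard_score(score, norm_scores):
    """z score: distance from the norm-sample mean in standard-deviation units."""
    sd = pstdev(norm_scores)
    if sd == 0:
        raise ValueError("standard score is undefined when norm scores do not vary")
    return (score - mean(norm_scores)) / sd


# Hypothetical norm sample of raw scores (illustrative only).
norm_sample = [10, 12, 14, 15, 15, 16, 18, 20, 22, 28]
raw_score = 20

pr = percentile_rank(raw_score, norm_sample)
z = standard_score(raw_score, norm_sample)
print(f"raw score {raw_score}: percentile rank {pr:.1f}, z score {z:.2f}")
```

Age and grade norms would follow the same idea, with the norm sample restricted to examinees of a given age or grade.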
Uses of psychological test:
Psychological tests are widely used for many purposes.
1) In classification: Psychological tests are commonly used for classifying
persons, that is, for assigning a person to one category rather than to another.
Important types of classification are placement, screening, certification and selection.
2) In diagnosis and planning for treatment: Psychological tests play a
significant role in making a diagnosis and in planning for treatment. Diagnosis means
determining the nature of a person’s abnormal behavior and classifying the behavior
pattern within an accepted system.
3) In self-knowledge: Psychological tests are also useful in providing self-knowledge to
the test takers, sometimes to the extent that such knowledge changes their career path.
4) In evaluation of programs: Psychological tests are often used in the evaluation of
various types of educational and social programs. In schools and colleges, different types
of programs for the betterment of academic achievement are carried out, and those concerned want
to know about their impact.
Limitations of psychological tests:
1) Invasion of privacy: Psychological tests may be an invasion of privacy if they are
used without the permission of the testee to obtain personal and sensitive information.
2) Permanently categorize the persons: On the basis of their performance on
psychological tests, the testees or examinees are assigned to certain categories such as mentally
retarded, gifted, brain-damaged, etc., and the authority behaves accordingly,
disregarding evidence of any further change.
3) Limited and superficial aspects of behavior: It is said that psychological
tests cannot measure the most important human traits. They force the examinees to
take decisions based on superficial and relatively unimportant criteria.
4) Create anxiety: Generally, it has been reported that when the assessment is to be
done through psychological tests, the examinees feel anxious and this anxiety affects
their performance.
Item Analysis:
 Item analysis is a set of procedures used to evaluate the statistical merits of the individual
items comprising a psychological measure or test. These procedures may be
used to select items for a test from a larger pool of initial items or to evaluate
items on an established test.
 After the items have been written, reviewed and carefully edited, they are
subjected to a procedure called item analysis. Item analysis is a set of
procedures that is applied to know the indices for the truthfulness (or validity)
of items. In other words, item analysis is a technique through which those
items which are valid and suited to the purpose are selected and the rest are
either eliminated or modified to suit the purpose.
Main objectives of item analysis:
 Item analysis indicates which items are difficult, easy, moderately difficult or
moderately easy. In other words, it provides an index of the difficulty value to
each item.
 It also provides indices of the ability of the item to discriminate between
inferior and superior examinees. In other words, item analysis indicates the
discrimination value of each item. This is known as item validity.
 It indicates the effectiveness of the distractors in multiple choice items. A
thorough item analysis is done to indicate the extent to which the distractors
are effective in each item.
 It sometimes also indicates why a particular item in the test has not functioned
effectively and how this might be modified so that its functional significance
can be increased.
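The first three objectives above can be illustrated with a small sketch for a single multiple-choice item. Everything here is an assumption made for illustration: the data, the function names, the upper-lower group method for the discrimination index, and the conventional 27% group size. The source does not prescribe a particular formula.

```python
from collections import Counter


def item_difficulty(item_responses):
    """Proportion of examinees answering the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)


def item_discrimination(item_responses, total_scores, fraction=0.27):
    """Upper-lower index: proportion correct in the top group minus the bottom group."""
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Indices of examinees ordered from lowest to highest total test score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:group_size], order[-group_size:]
    p_upper = sum(item_responses[i] for i in upper) / group_size
    p_lower = sum(item_responses[i] for i in lower) / group_size
    return p_upper - p_lower


def distractor_counts(chosen_options):
    """How many examinees chose each option of a multiple-choice item."""
    return Counter(chosen_options)


# Hypothetical data for ten examinees on one item whose keyed answer is "A".
key = "A"
chosen = ["B", "C", "A", "B", "A", "A", "D", "A", "A", "A"]
total_scores = [5, 8, 12, 15, 18, 20, 22, 25, 27, 30]
item_responses = [1 if option == key else 0 for option in chosen]

difficulty = item_difficulty(item_responses)
discrimination = item_discrimination(item_responses, total_scores)
options = distractor_counts(chosen)

print(f"difficulty value: {difficulty:.2f}")
print(f"discrimination value: {discrimination:.2f}")
print(f"option counts: {dict(options)}")
```

A distractor that nobody chooses, or one chosen mostly by high scorers, would be a candidate for revision under the third objective.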
Reference: Singh, A. K. Tests, Measurements and Research Methods in Behavioural Sciences.

