Assessment, whether it is carried out with interviews, behavioral observations, physiological
measures, or tests, is intended to permit the evaluator to make meaningful, valid, and reliable
statements about individuals. What makes John Doe tick? What makes Mary Doe the unique
individual that she is? Whether these questions can be answered depends upon the reliability and
validity of the assessment methods used. The fact that a test is intended to measure a particular
attribute is in no way a guarantee that it really accomplishes this goal. Assessment techniques
must themselves be assessed.


A set of test questions is first administered to a small group of people deemed to be
representative of the population for which the final test is intended. The trial run is planned to
provide a check on instructions for administering and taking the test and for intended time
allowances, and it can also reveal ambiguities in the test content. After adjustments, surviving
items are administered to a larger, ostensibly representative group. The resulting data permit
computation of a difficulty index for each item (often taken as the percentage of the subjects who
respond correctly) and of an item-test or item-subtest discrimination index (e.g., a coefficient of
correlation specifying the relationship of each item with total test score or subtest score).
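
As a rough illustration of these two item statistics, the following Python sketch computes a difficulty index (percentage of subjects responding correctly) and an item-test discrimination index (the Pearson correlation between item score and total test score). The response data, the function names `pearson` and `item_statistics`, and the convention of reporting 0.0 when a correlation is undefined are assumptions made for this example only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant column has no defined correlation; report 0.0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def item_statistics(responses):
    """Return (difficulty, discrimination) for each item.

    responses: one row per subject, each row a list of 0/1 item scores.
    difficulty: percentage of subjects who answered the item correctly.
    discrimination: correlation of the item score with the total test score
    (the total here includes the item itself, i.e. it is uncorrected).
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        difficulty = 100.0 * sum(item) / len(item)
        discrimination = pearson(item, totals)
        stats.append((difficulty, discrimination))
    return stats


# Hypothetical trial data: 5 subjects, 4 items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]

for number, (difficulty, discrimination) in enumerate(item_statistics(responses), start=1):
    print(f"item {number}: difficulty={difficulty:.0f}%  discrimination={discrimination:.2f}")
```

In this made-up data set every subject answers the first item correctly, so its score is constant and it cannot separate higher from lower scorers; such an item would be a candidate for removal as too easy.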

If it is feasible to do so, measures of the relation of each item to independent criteria (e.g., grades
earned in school) are obtained to provide item validation. Items that are too easy or too difficult
are discarded; those within a desired range of difficulty are identified. If internal consistency is
sought, items that are found to be unrelated to either a total score or an appropriate subtest score
are ruled out, and items that are related to available external criterion measures are identified.
Those items that show the most efficiency in predicting an external criterion (highest validity)
usually are preferred over those that contribute only to internal consistency (reliability).
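
The selection logic in this paragraph can be sketched in Python as a filter followed by a ranking. This is only one possible reading: the cut-off values, the dictionary keys, the function name `select_items`, and the candidate data are all hypothetical, and in practice the thresholds would be set by the test constructor.

```python
def select_items(item_stats, min_difficulty=20.0, max_difficulty=80.0,
                 seek_internal_consistency=True, min_item_total=0.2):
    """Filter items by difficulty, optionally by item-total correlation,
    then rank the survivors by their correlation with an external criterion.

    item_stats: list of dicts with keys
        'id'         - item label
        'difficulty' - percentage of subjects answering correctly
        'item_total' - correlation of the item with the total (or subtest) score
        'criterion'  - correlation of the item with an external criterion
    All default thresholds are illustrative assumptions.
    """
    kept = []
    for stats in item_stats:
        # Discard items that are too easy or too difficult.
        if not (min_difficulty <= stats["difficulty"] <= max_difficulty):
            continue
        # If internal consistency is sought, rule out items unrelated to the total.
        if seek_internal_consistency and stats["item_total"] < min_item_total:
            continue
        kept.append(stats)
    # Prefer items that best predict the external criterion.
    return sorted(kept, key=lambda stats: stats["criterion"], reverse=True)


# Hypothetical item statistics from a trial administration.
candidates = [
    {"id": "q1", "difficulty": 95.0, "item_total": 0.30, "criterion": 0.40},
    {"id": "q2", "difficulty": 55.0, "item_total": 0.45, "criterion": 0.25},
    {"id": "q3", "difficulty": 60.0, "item_total": 0.05, "criterion": 0.50},
    {"id": "q4", "difficulty": 40.0, "item_total": 0.35, "criterion": 0.38},
]

print([item["id"] for item in select_items(candidates)])
print([item["id"] for item in select_items(candidates, seek_internal_consistency=False)])
```

The `seek_internal_consistency` switch reflects the conditional wording of the text: when internal consistency is not required, an item such as the hypothetical `q3`, which predicts the criterion well but is nearly unrelated to the total score, is retained and ranked first.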

Estimates of reliability for the entire set of items, as well as for those to be retained, commonly
are calculated. If the reliability estimate is deemed to be too low, items may be added. Each
alternative in multiple-choice items also may be examined statistically. Weak incorrect
alternatives can be replaced, and those that are unduly attractive to higher scoring subjects may
be modified.
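
The text does not name a particular reliability estimate. As one common choice, the Python sketch below computes coefficient alpha (Cronbach's alpha, which for items scored 0/1 with population variances coincides with the Kuder-Richardson formula 20) and uses the Spearman-Brown formula to project how reliability might change if the test were lengthened with comparable items. The data and the function names `cronbach_alpha` and `spearman_brown` are assumptions made for illustration.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Coefficient alpha for a subjects-by-items score matrix.

    responses: one row per subject, each row a list of item scores.
    Assumes at least two items and a nonzero variance of total scores.
    """
    k = len(responses[0])
    item_variances = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor,
    assuming the added items behave like the existing ones."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical trial data: 5 subjects, 4 items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]

alpha = cronbach_alpha(responses)
print(f"alpha for 4 items: {alpha:.2f}")
print(f"projected reliability for 8 comparable items: {spearman_brown(alpha, 2):.2f}")
```

The projection illustrates why adding items is a usual remedy for a low reliability estimate: under the stated assumption, doubling the length of the test raises the projected coefficient.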

Bell-Shaped Curve - Many phenomena, such as the distribution of IQs, approximate the classic
bell-shaped, or normal, curve (see normal distribution). The highest point on the curve indicates
the most common or modal value, which in most cases will be close to the average (mean) for
the population. A well-known example from physics is the Maxwell-Boltzmann distribution law,
which specifies the probability that a molecule of gas will be found with velocity components u,
v, and w in the x, y, and z directions. A distribution function may take into account as many
variables as one chooses to include.
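
For reference, the two distributions mentioned above can be written out in their standard forms. The symbols are labels chosen for this note: $\mu$ and $\sigma$ for the mean and standard deviation of the normal curve; $u$, $v$, $w$ for the velocity components along $x$, $y$, $z$; $m$ for the molecular mass; $k$ for the Boltzmann constant; and $T$ for the absolute temperature.

```latex
% Normal (bell-shaped) density: the peak lies at x = \mu, so for an exactly
% normal curve the mode and the mean coincide.
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left( -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right)

% Maxwell-Boltzmann distribution of velocity components: a distribution
% function of three variables, one per direction.
f(u, v, w) = \left( \frac{m}{2\pi k T} \right)^{3/2}
             \exp\!\left( -\frac{m\,(u^{2} + v^{2} + w^{2})}{2 k T} \right)
```

The second expression is the product of three normal densities, one for each velocity component, each with mean zero and variance $kT/m$.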
