Assignment Presentation Report

Published by: Schahyda Arley on Jul 24, 2013
Copyright:Attribution Non-commercial







NOR SAHIDAH BINTI MOHAMAD ALI MPF1243 JB MP121197 880522-01-5516

INDIVIDUAL ASSIGNMENT (PRESENTATION REPORT)
Presentation 1: Building a Test (writing and evaluating test items) / Selection and decision analysis / Test administration

Introduction
Writing test items is a matter of precision, perhaps more akin to computer programming than to writing prose. A test item must focus the attention of the examinee on the principle or construct upon which the item is based. Ideally, students who answer a test item incorrectly will do so because their mastery of the principle or construct in focus was inadequate or incomplete. Any characteristic of a test item that distracts the examinee from the major point or focus of the item reduces the effectiveness of that item. Any item answered correctly or incorrectly because of extraneous factors in the item results in misleading feedback to both examinee and examiner.

A poet or writer, especially of fiction, relies on rich mental imagery on the part of the reader to produce an impact. For item writers, however, the task is to focus the attention of a group of students, often with widely varying background experiences, on a single idea. Such communication requires extreme care in the choice of words, and it may be necessary to try the items out before problems can be identified.

Essential Characteristics of Item Writers
Given a task of precision communication, there are several attributes or mind sets that are characteristic of a proficient item writer.

Knowledge and Understanding of the Material Being Tested
At the university level, the depth and complexity of the material on which students are tested means that only faculty members fully trained in a particular discipline can write concise, unambiguous test items in that discipline. Further, the number of persons who can meaningfully critique test items, in terms of the principles or constructs involved, is limited. An agreement by colleagues to review each other's tests will likely improve the quality of items considerably prior to the first try-out with students.

Continuous Awareness of Objectives
A test must reflect the purposes of the instruction it is intended to assess. This quality of a test, referred to as content validity, is assured by specifying the nature and/or number of items prior to selecting and writing the items. Instructors sometimes develop a chart or test blueprint to help guide the selection of items. Such a chart may consider the modules or blocks of content as well as the nature of the skills a test is expected to assess. In the case of criterion-referenced instruction, content validity is obtained by selecting a sample of criteria to be assessed. For content-oriented instruction, a balance may be achieved by selecting items in proportion to the amount of instructional time allotted to various blocks of material. An example of a test blueprint for a test with thirty-eight items is shown below.

Test Blueprint

Types of Tests                 Reliability   Validity   Correlation   Total
Knowledge of terms                  3            1           1           5
Comprehension of principles         3            4           4          11
Application of principles           2            4           6          12
Analysis of situations              1            4           2           7
Evaluation of solutions             1            1           1           3
Total                              10           14          14          38
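The blueprint's marginal totals can be cross-checked with a few lines of Python. This is only a sketch: it assumes the three numeric columns of the blueprint are Reliability, Validity, and Correlation (the reading that matches the statement that two items apply the principles of reliability), and the variable names are illustrative.

```python
# Test blueprint cells: rows are skill levels, columns are content areas.
# Column order (Reliability, Validity, Correlation) is an assumption about
# how the flattened table should be read.
columns = ["Reliability", "Validity", "Correlation"]
blueprint = {
    "Knowledge of terms":          [3, 1, 1],
    "Comprehension of principles": [3, 4, 4],
    "Application of principles":   [2, 4, 6],
    "Analysis of situations":      [1, 4, 2],
    "Evaluation of solutions":     [1, 1, 1],
}

# Row totals: number of items per skill level.
row_totals = {skill: sum(cells) for skill, cells in blueprint.items()}

# Column totals: number of items per content area.
column_totals = dict(zip(columns, (sum(col) for col in zip(*blueprint.values()))))

# Grand total: length of the whole test.
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

Checking the margins this way before writing items helps catch a blueprint whose cells no longer add up to the planned test length.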

The blueprint specifies the number of items to be constructed for each cell of the two-way chart. For example, in the above test blueprint, two items are to involve the application of the principles of reliability.

Continuous Awareness of Instructional Model
Different instructional models require items of quite different characteristics for adequate assessment. For example, appropriate item difficulty in a mastery-model situation might be a maximum value of 20 (twenty percent of the students answering incorrectly). On the other hand, items written for a normative model might have an appropriate average difficulty of the order of 30 to 40.

Ideally, item discrimination (the degree to which an item differentiates between students with high test scores and students with low test scores) should be minimal in a mastery-model situation, because we would like all students to obtain high scores. In the normative model, item discrimination should be as high as possible so that the total test differentiates among students to the maximum degree.

Understanding of the Students for Whom the Items Are Intended
Item difficulty and discrimination are determined as much by the level of ability and range of ability of the examinees as they are by the characteristics of the items. Normative-model items must be written so that they provide the maximum intellectual challenge without posing a psychological barrier to student learning through excessive difficulty. In either the normative or mastery models, item difficulty must not be so low as to provide no challenge whatever to any examinee in a class.

It is generally easier to adjust the difficulty than to adjust the discrimination of an item. Item discrimination depends to a degree on the range of examinee ability as well as on the difficulty of the item. It can be difficult to write mastery-model items which do not discriminate when the range of abilities among examinees is wide. Likewise, homogeneous abilities make it more difficult to write normative-model items with acceptably high discriminations.

No matter what the instructional model or the range of abilities in a class, the only way to identify appropriate items is to select them on the basis of subjective judgment, administer them, and analyze the results. Then only items of appropriate difficulty and discrimination may be retained for future use.

Skill in Written Communication
An item writer's goal is to be clear and concise. The level of reading difficulty of the items must be appropriate for the examinees. Wording must not be more complicated than that used in instruction.
Skill in Techniques of Item Writing
There are many helpful hints and lists of pitfalls to avoid that may help the item writer. This is an area where measurement specialists may be particularly helpful. The remainder of this hand-out will be devoted to item-writing tips.

Guideline for Item Writing
• Define clearly what you want to measure
  – Use substantive theory as a guide
  – Be as specific as possible
• Generate an item pool
  – In theory: all items are randomly chosen from the universe of item content
  – In practice: select and develop items, and avoid redundant items
• Avoid exceptionally long items
  – They are confusing or misleading
• Keep the level of reading difficulty appropriate for those who will complete the scale
• Avoid 'double-barreled' items that convey two or more ideas at the same time
  – Double-barreled: 'I vote Democratic because I support social programs'
  – Separated: 'I vote Democratic' / 'I support social programs'
• Consider mixing positively and negatively worded items
  – Acquiescence response set: respondents tend to agree with most items
  – Avoid bias: include items that are worded in the opposite direction (for example 'I felt hopeful about the future' in a scale asking about depression, the CES-D)

When constructing a multiple-response item, consider the following:
• Plan to have two correct answers out of five choices, or three correct out of five or six choices. Always remove distracters that are not being selected by examinees.
• Do not use "Choose all that apply"; instead, identify the number of choices needed to supply a complete correct response. It is important to tell examinees the number of correct choices as a matter of fairness.
• Identify the number of correct options. Use the phrase "Choose XXX" in the item stem, and present it in parentheses preceded by spaces. For example: (Choose TWO) or (Choose THREE).
• Score test items so that selecting only the correct options counts as a correct response. Do not give partial credit or accept a response in which the examinee selected two correct options but also a third, incorrect option. Score the item as incorrect.
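The all-or-nothing scoring rule in the last guideline can be sketched in Python as follows. The function name and the option labels are hypothetical and chosen only for illustration; the rule itself (credit only when exactly the keyed options are selected) is the one stated above.

```python
def score_multiple_response(selected, keyed):
    """Return 1 if the examinee selected exactly the keyed options, else 0.

    No partial credit: a missing correct option or an extra incorrect
    option both score the item as incorrect.
    """
    return 1 if set(selected) == set(keyed) else 0


# Hypothetical "(Choose TWO)" item whose keyed answers are A and C.
key = {"A", "C"}
print(score_multiple_response({"A", "C"}, key))       # both correct options, nothing else
print(score_multiple_response({"A"}, key))            # one correct option missing
print(score_multiple_response({"A", "C", "D"}, key))  # two correct plus one incorrect
```

Comparing sets rather than counting matches is what rules out partial credit: any difference between the response and the key yields a score of zero.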

Item Format
Test item development is a critical step in building a test that properly meets certain standards. A good test is only as good as the quality of the test items. If the individual test items are not appropriate and do not perform well, how can the test scores be meaningful? Therefore, test items must be developed to precisely measure the objectives prescribed by the blueprint and meet quality standards.

Test Item Formats
There are many test item formats that can be used in a computer-based examination. Test items are the building blocks of an exam. All test items are composed of three parts: item stem (question), correct answer(s), and distracters (incorrect responses). The term "options" refers generally to all the choices that are available. The person responding to the question can either select or construct the appropriate response, depending on the item format presentation.

The most popular formats include, but are not limited to, multiple-choice, multiple-response, matching, essay, short answers, and simulations. These item formats can be scored either objectively by computers or subjectively through judgments by human evaluators. Constructed response questions require different forms of scoring rubrics but still follow similar guidelines in how questions are formulated. The following discussion relates solely to objectively scored test items, where examinees select a response (or responses) to a question and it is scored by a computer program. We will only discuss selected response test items in this document.

Readers should not be surprised by what the most popular question formats are, since they have probably seen these formats throughout their education. The multiple-choice and multiple-response formats are the most popular since they can be scored easily and reliably by machines, as compared to examinee constructed responses. Also, these item formats are relatively easy to write and place in published examinations for either paper exams or computer-administered exams. The multiple-choice format is sometimes criticized since some laypeople consider it a poor way to evaluate a person's knowledge and skills. We often hear this format referred to as "multiple guess." Yet research has shown that these items do perform well when they are well constructed.

TRUE-FALSE QUESTIONS
This test item format is presented first since it is the simplest selected response format. The true-false question is a statement of fact and gives the option to select True or False as an answer choice. This item format can work well when constructed properly. However, more often than not, this item format fails to perform well statistically. Donath has come to recognize the failure of this item format and always recommends not using it at all in developing test questions. This true-false item format, if used, should be used infrequently (< 10%). Most true-false questions can be rewritten into a multiple-choice question. Time is much better spent on developing multiple-choice or multiple-response format questions.

MULTIPLE-CHOICE QUESTIONS
The multiple-choice format is made up of an item stem, an answer, and 3 or more distracters when possible. Two-distracter questions can work if it is difficult to write another distracter. There is only one correct answer for this format, and it can be written so that it measures not only knowledge of facts but also higher-order thinking that requires problem-solving or critical thinking.

USE OF GRAPHICS/EXHIBITS
"Hot Area" graphics or exhibits can be used within this format as well as with multiple-response formats. The format requires an examinee to choose or identify where a specific location is on a picture (graphic) by clicking on it. The hot spot graphic has to have areas identified as incorrect choices and an area that is correct.

MULTIPLE-RESPONSE QUESTIONS

This item format has an item stem and more than one correct answer. Essentially, this is a combination of two or three multiple-choice items in one. It is generally more difficult to answer and also discriminates very well between those who are proficient and those who are not proficient with the subject area being tested. When constructing a multiple-response item, follow the guidelines listed earlier: plan two or three correct answers, state the number of options to choose instead of "Choose all that apply", and score the item all-or-nothing with no partial credit.

Item Analysis
Introduction: It is widely believed that "Assessment drives the curriculum". Hence it can be argued that if the quality of teaching, training, and learning is to be upgraded, assessment is the obvious starting point. However, upgrading assessment is a continuous process. The cycle of planning and constructing assessment tools, followed by testing, validating, and reviewing, has to be repeated continuously.

When tests are developed for instructional purposes, to assess the effects of educational programs, or for educational research purposes, it is very important to conduct item and test analyses. These analyses evaluate the quality of the items and of the test as a whole. Such analyses can also be employed to revise and improve both items and the test as a whole.

Quantitative item analysis is a technique that enables us to assess the quality or utility of an item. It does so by identifying distractors or response options that are underperforming. Item-analysis procedures are intended to maximize test reliability. Because maximization of test reliability is accomplished by determining the relationship between individual items and the test as a whole, it is important to ensure that the overall test is measuring what it is supposed to measure. If this is not the case, the total score will be a poor criterion for evaluating each item.

The use of a multiple-choice format for hour exams at many institutions leads to a deluge of statistical data, which are often neglected or completely ignored. This paper will introduce some of the terms encountered in the analysis of test results, so that these data may become more meaningful and therefore more useful.

Need for Item Analysis
1) Provision of information about how the quality of test items compares. The comparison is necessary if subsequent tests of the same material are going to be better.
2) Provision of diagnostic information about the types of items that students most often get incorrect. This information can be used as a basis for making instructional decisions.
3) Provision of a rational basis for discussing test results with students.
4) Communication to the test developer of which items need to be improved or eliminated, to be replaced with better items.

What is the Output of Item Analysis?
Item analysis could yield the following outputs:
• Distribution of responses for each distractor of each item, or frequencies of responses (histogram)
• Difficulty index for each item of the test
• Discrimination index for each item of the test

• Measure of exam internal consistency reliability

Total Score Frequencies
Issues to consider when interpreting the distribution of students' total scores:

Distribution:
• Is this the distribution you expected?
• Was the test easier or more difficult than you anticipated?
• How does the mean score of this year's class compare to scores from previous classes?
• Is there a ceiling effect, that is, are all scores close to the top?
• Is there a floor effect, that is, are all scores close to the lowest possible?

Spread of Scores:
• Is the spread of scores large?
• Are there students who are scoring low marks compared to the majority of the students?
• Can you determine why they are not doing as well as most other students?
• Can you provide any extra assistance?
• Is there a group of students who are well ahead of the other students?

Difficulty Index
It actually tells us how easy the item was for the students in that particular group. The higher the difficulty index, the easier the question; the lower the difficulty index, the more difficult the question. The difficulty index is, in fact, an "Easiness Index".

Issues to consider in relation to the Difficulty Index

• Are the easiest items in your test, i.e. those with the lowest difficulty ranking, the first items in the test?
• If the more difficult items occur at the start of the test, students can become upset because they feel, early on, that they cannot succeed.

The literature quotes the following (generalized) interpretation of the Difficulty Index.

Indexed Range     Inference to Question
0.85 – 1.00       Very Easy
0.70 – 0.84       Easy
0.30 – 0.69       Optimum
0.15 – 0.29       Hard
0.00 – 0.14       Very Hard

Item Discrimination Index (DI)
This is calculated by subtracting the proportion of students correct in the lower group from the proportion correct in the upper group. It is assumed that persons in the top third on total scores should have a greater proportion with the item correct than the lower third. The calculation of the index is an approximation of a correlation between the scores on an item and the total score. Therefore, the DI is a measure of how successfully an item discriminates between students of different abilities on the test as a whole. Any item which did not discriminate between the lower and upper groups of students would have a DI = 0. An item where the lower group performed better than the upper group would have a negative DI.

The discrimination index is affected by the difficulty of an item because, by definition, if an item is very easy everyone tends to get it right and it does not discriminate. Likewise, if it is very difficult everyone tends to get it wrong. Such items can be important to have in a test because they help define the range of difficulty of concepts assessed. Items should not be discarded just because they do not discriminate.
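A minimal Python sketch of the two indices may make the definitions concrete. It assumes items are scored 1 (correct) or 0 (incorrect), takes the upper and lower groups as the top and bottom thirds by total score, and treats the interpretation bands as half-open intervals so that values such as 0.845 are still classified; the function names and the nine-student data set are invented for illustration.

```python
def difficulty_index(item_scores):
    """Proportion of students answering the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_scores) / len(item_scores)


def classify_difficulty(p):
    """Map a difficulty index to the generalized interpretation bands.

    Bands are treated as half-open intervals (an assumption), e.g. Easy is
    0.70 <= p < 0.85.
    """
    if p >= 0.85:
        return "Very Easy"
    if p >= 0.70:
        return "Easy"
    if p >= 0.30:
        return "Optimum"
    if p >= 0.15:
        return "Hard"
    return "Very Hard"


def discrimination_index(item_scores, total_scores):
    """Proportion correct in the upper third minus proportion correct in the lower third."""
    n = len(total_scores)
    group_size = n // 3
    if group_size == 0:
        raise ValueError("need at least three students")
    # Rank students by total test score, lowest first.
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = ranked[:group_size], ranked[-group_size:]
    p_upper = sum(item_scores[i] for i in upper) / group_size
    p_lower = sum(item_scores[i] for i in lower) / group_size
    return p_upper - p_lower


# Invented data: total test scores and one item's scores for nine students.
total_scores = [95, 90, 85, 70, 65, 60, 40, 35, 30]
item_scores = [1, 1, 1, 1, 0, 1, 0, 0, 1]

p = difficulty_index(item_scores)
di = discrimination_index(item_scores, total_scores)
print(round(p, 2), classify_difficulty(p), round(di, 2))
```

In this example six of nine students answer correctly, all three students in the upper third are correct, and one of three in the lower third is correct, so the item is of moderate difficulty and discriminates in the expected direction.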

Questions to ask when reviewing item discrimination indices:
• Are there any items with a negative discrimination index (DI)? That is, items where students in the lower third of the group did better than students in the upper third of the group?
  – Was this a deceptively easy item?
  – Was the correct answer key used?
• Are there any items that do not discriminate between the students, i.e. where the DI is 0.00 or very close to 0.0?
  – Are these items which are either very hard or very easy, and therefore where you could have a DI of 0?

The literature quotes the following (generalized) interpretation of the Discrimination Index.

Indexed Range     Inference to Question
Below 0.19        Poor
0.20 – 0.29       Dubious
0.30 – 1.00       Okay

SELECTION AND DECISION ANALYSIS

The Test Manual
1. Proprietary
• Owned by the test developer or publishing company
• Protected by copyright law
• User must pay to use it
• E.g. Minnesota Multiphasic Personality Inventory and Stanford-Binet Intelligence Scale
2. Non-proprietary
• Not protected by copyright law
• Distributed by the test developer or published in journals
• User pays no royalty

• E.g. Center for Epidemiologic Studies Depression Scale (CES-D)

You need to consult the test manual to determine whether a given test is suited for your purpose.

Base Rate
In probability and statistics, base rate generally refers to the (base) class probabilities unconditioned on featural evidence, frequently also known as prior probabilities. In plainer words, if it were the case that 1% of the public were "medical professionals", and 99% of the public were not "medical professionals", then the base rate of medical professionals is simply 1%.

In science, particularly medicine, the base rate is critical for comparison. It may at first seem impressive that 1000 people beat their winter cold while using 'Treatment X', until we look at the entire 'Treatment X' population and find that the base rate of success is actually only 1/100 (i.e. 100 000 people tried the treatment, but the other 99 000 people never really beat their winter cold). The treatment's effectiveness is clearer when such base rate information (i.e. "1000 people... out of how many?") is available. Note that controls may likewise offer further information for comparison; maybe the control groups, who were using no treatment at all, had their own base rate of success of 5/100. Controls thus indicate that 'Treatment X' actually makes things worse, despite that initial proud claim about 1000 people.

Taylor Russell Table
The Taylor-Russell Tables help us see the connection between the validity of a test and the likelihood of the test resulting in a successful selection of a job candidate. Let's look at the table below. It is actually three different tables. Each table is created around a different base rate of success. The concept of base rate of success simply means the percentage of people in a population that could successfully perform a job. In other words, if you were to simply randomly select people for a job, the base rate of success is the percentage that would be successful at the job.
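The base-rate arithmetic in the 'Treatment X' example can be checked with a short Python sketch. The figures are the ones quoted in the text (1000 successes among 100 000 treated people, and a control success rate of 5/100); the function name is illustrative only.

```python
from fractions import Fraction


def base_rate(successes, population):
    """Base rate of success: successes divided by everyone in the population."""
    return Fraction(successes, population)


# Figures quoted in the 'Treatment X' example.
treatment_rate = base_rate(1_000, 100_000)  # people who beat their cold on Treatment X
control_rate = Fraction(5, 100)             # untreated control group

print(treatment_rate, float(treatment_rate))
print(control_rate, float(control_rate))
print("treatment beats control:", treatment_rate > control_rate)
```

The same division gives the base rate of success used by the Taylor-Russell tables: the share of randomly selected people who would succeed at the job.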

In the three tables below, the easiest job is the one with the highest base rate of success and the most difficult job is the one with the lowest base rate of success (although a job with a base rate of success of 0.30 is still a non-skilled job).

Utility Theories and Decision Analysis
Utility theory is used in decision analysis to determine the EU (expected utility) of some action based on the U (utility) of its possible result(s).

A utility function U(W) maps from states (W being a world state) to real numbers, e.g. U(W) = 5. A utility function applies when there is a lottery (uncertainty involved), probability, and chance; a value function applies when you are evaluating the worth of actual states and there is no uncertainty. A value or fitness function for chess would be a winning game. A value for backgammon, however, is a utility function, since the rolling of the dice brings in uncertainty.

An action can have more than one possible result. In the simple case an action's result is one deterministic state (e.g. lights are on or off). In this case the EU of an action is equal to the U of its result.
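As a hedged illustration, expected utility can be sketched in Python as a probability-weighted sum of the utilities of an action's possible results, which reduces to the utility of the single result in the deterministic case. The action names, probabilities, and utility values below are invented, and picking the action with the highest expected utility is shown as the usual decision rule rather than anything specific to this report.

```python
def expected_utility(outcomes):
    """Expected utility of one action.

    `outcomes` is a list of (probability, utility) pairs, one per possible
    resulting world state; the probabilities must sum to 1.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)


# Invented example: one deterministic action and one lottery.
actions = {
    "deterministic": [(1.0, 5.0)],           # single certain result with U(W) = 5
    "lottery": [(0.5, 12.0), (0.5, 0.0)],    # two equally likely results
}

utilities = {name: expected_utility(outcomes) for name, outcomes in actions.items()}
best_action = max(utilities, key=utilities.get)
print(utilities, best_action)
```

With a single result of probability 1 the expected utility is just that result's utility, which matches the simple case described above.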

speed. color. possible service costs of owning a car depends on the probability of the car needing service at each possible cost. It is presumed that the action with MEU (maximum expected utility) should be chosen. In some cases strict dominance of U(Wxj) over U(Wyj) can be asserted if the utility values of the j attributes are readily known and U(Wxj) > U(Wyj) In most cases stochastic dominance must suffice since we usually don't know the exact values of all attributes before the actions take place. Thus. the EU of an action Ak is the sum of the utilities of all its possible results times the probability of each result happening.NOR SAHIDAH BINTI MOHAMAD ALI MPF1243 JB MP121197 880522-01-5516  In reality there are usually a number of possible results for each possible action:  represents possible actions to take  represents possible resulting worlds Where U(Wi) is the utility of Wi and p(Wi|Ak) is the probability of Wi resulting given Ak (from taking action Ak). and . There is an issue of sequential actions. An action could also have one result with a continuous measure of degrees of intensity and a continuous probability distribution. possible service costs. This is covered in terms of attributes below. Wij might be owning a car with j attributes: gas mileage. To get the utility of such a result one would take the integral of its utility values as a function of its probability distribution. For example. Each Wi or action result can have any number of attributes. If we compare the actions of buying car 1 and buying car 2 where  X represents the possible service costs  p1(x) is the probability distribution of one car's possible service costs  p2(x) is the probability distribution of the other car's possible service costs  a is the least amount of money you can spend on service  b is the most you can spend on service. etc.

taken together. . ◦If an action A stochastically dominates all other actions on all attributes. is less than or equal to every given probability of every possible service cost for car 2. Average IQ score (6th & 9th grade) were higher for those who had received the test under enhanced rapport condition than for those with a neutral administrator  Examiner made approving comments (‗good‘ or ‗fine‘) and examiner used disapproving comments (‗ I thought you could better than that‘) Children who took the test under disapproving examiner received lower scores than children exposed to a neutral or an approving examiner   A familiar examiner may make a different to the younger children in score for test I. taken together. respondents may give the response that they perceive to be expected by the interviewer. Test Administration The Examiner And The Subject a) The relationship between examiner and test taker  Behavior and relationship of both can effect test score Half children were given test under enhanced rapport condition (examiner used friendly conversation and verbal reinforcement during test administration). The integral tells us that every given probability of every possible service cost for car 1.Q test score increase in children from lower socioeconomic class Familiarity with the test taker Preexisting notion about test taker‘s ability Can either positively or negatively bias test result  Interviewer effects In attitudinal surveys. The examiner rapport had little effect on the score of younger children (3rd grade). Other half children took test under a neutral rapport condition (examiner neither initiated conversation nor used reinforcement). then for any monotonically non decreasing utility function the expected utility of A is at least as high a the expected utility of all other actions.NOR SAHIDAH BINTI MOHAMAD ALI MPF1243 JB MP121197 880522-01-5516  then buying car one stochastically dominates buying car 2.

  – Interviewed by telephone, respondents might take their cues from the sex and age of the interviewer.
  – People tend to disclose more information in a self-report format than they do to an interviewer.
  – People report more symptoms and health problems in a mailed questionnaire than they do in a face-to-face interview.
  – Computer administration is at least as reliable as traditional test administration.

b) The Race of the Tester
• There is little evidence that the race of the examiner significantly affects intelligence test scores.
  – There were no differences between the children when they were given the Stanford-Binet by an African American examiner and by a white one.
  – No significant difference in intelligence test scores between African American and white children was produced by having a trained African American or white examiner.
• Why studies show no effects: the procedures for administering an IQ test are so specific (strict procedure) that well-trained African American and white administrators act almost identically.
• Examiner effects tend to increase when examiners are given more discretion about the use of the test.
  – In one case the examiners were paraprofessionals rather than psychologists. The white examiners obtained higher scores from white than from African American children. Scores for both groups of children were comparable when tested by African American examiners.
• African American children obtained higher test scores when the items were administered in the thematic mode. They may score lower on IQ tests because of poorer reading skills.

c) Language of Test Taker
• Some tests are inappropriate for people whose knowledge of the language is questionable.
• Translating tests is difficult; it cannot be assumed that the validity and reliability of the translation are comparable to the English version.
• For test takers who are proficient in two or more languages, the test should be given in the language that the test taker feels is their best.
• Test interpreters can introduce bias into the testing situation.

d) Training of the Administrators
• Different assessment procedures require different levels of training.
• Psychiatric diagnosis is obtained using the Structured Clinical Interview for DSM-IV (SCID). Users are licensed psychiatrists or psychologists with additional training on the test.
• There is no standardized protocol for training people to administer complicated tests such as the Wechsler Adult Intelligence Scale-Revised (WAIS-R), although these tests are usually administered by licensed psychologists.

e) Expectancy Effects
• Data can sometimes be affected by what an experimenter expects to find.
• Robert Rosenthal and his colleagues at Harvard University conducted experiments on expectancy effects, also called 'Rosenthal effects'. Subjects tended to give failure responses when the experimenter had been told that the average response would be failure.
• Some have challenged these findings, 1) claiming that they are based on unsound statistical procedures or faulty design, or 2) arguing that the expectancy effect exists in some but not all situations.
• Expectancies shape judgments in many important ways. Grant reviewers are supposed to judge the quality of a proposal independently, but reviewers' expectancies about the investigators do influence their judgment.
• Two aspects of expectancy effects relate to the use of standardized tests:
  1) The expectancy effects in Rosenthal's experiments were obtained even when all the experimenters followed a standardized script. They result from nonverbal communication between the experimenter and the subject, and the experimenter may not be aware of his or her role in the process.
  2) The expectancy effect has a small and subtle effect on scores and occurs in some situations but not in others.
• The expectancy effect can impact intelligence testing in many ways, such as scoring. Graduate students with some training in intelligence testing tended to give more credit to responses purportedly from bright test takers.
• A variety of interpersonal and cognitive process variables affect our judgment of others.

  – A physician's voice can affect the way patients rate their doctors.

f) Effects of Reinforcing Responses
• Reinforcement affects behavior.
• Inconsistent use of feedback can damage the reliability and validity of test scores.
• Incentives can help improve performance on IQ tests for subgroups of children.
  – Students 6 to 13 years old received tokens they could exchange for money. This improved the performance of lower-class white children but not middle-class children or lower-class African American children (Sweet, 1970).
• Children will work quite hard to obtain praise such as 'You are doing well' (Eisenberger & Cameron, 1998).
  – The effects of praise are as strong as the effects of money or candy (Merrell, 1999).
  – Girls increased their accuracy on the WISC block design subtest when given any type of reinforcement for correct responses.
  – Boys increased their accuracy only when given chips that could be exchanged for money.
  – Under verbal praise, boys increased speed and girls decreased speed.
• African American children do not respond as well to verbal reinforcement as they do when the reinforcement is candy or money (Schultz & Sherman, 1976).
  – Verbal reinforcement is not culturally relevant (Terrell, Taylor & Terrell, 1978).
  – They respond to culturally relevant reinforcement such as 'Nice job, Blood' or 'Nice job, little Brother'.
• The way an interviewer responds affects the content of responses in interview studies (Cannell & Henson, 1974).
  – People reported more symptoms if they had been reinforced.
• Random reinforcement destroys the accuracy of performance and decreases the motivation to respond (Eisenberger & Cameron, 1998).

– The effects of random feedback are rather severe, causing depression, low motivation for responding and inability to solve problems, known as learned helplessness (Abramson, Alloy & Metalsky, 1995).
• Test administrators should exert strict control over the use of feedback.

g) Computer-Assisted Test Administration
• Easy access
• Presentation of the test items and automatic recording of test responses
• Advantages of computers:
– Excellence of standardization
– Individually tailored sequential administration
– Precision of timing responses
– Release of human testers for other duties
– Patience (test taker not rushed)
– Control of bias
• Advantages in test administration, scoring and interpretation, including ease of application of complicated psychometric issues and the integration of testing and cognitive psychology
• Computers are objective and cost-effective.
– They allow more experimental control than other methods of administration.
– They set a precise limit on the amount of time any one item can be studied.
– They prevent test takers from looking ahead at other sections of the test or going back to sections already completed.
– They ensure standardization and control, and also reduce scoring errors.
• Obtaining sensitive information
– Students were less likely to disclose socially undesirable information during a personal interview than on a computer.

– People are more honest when tested by a computer than by a person.
• Computerized testing is at least as reliable as traditional assessment.
• Computer-generated test reports in the hands of an inexperienced psychologist cannot replace clinical judgment and can cause harm if misinterpreted.

h) Subject Variables
• Motivation and anxiety can greatly affect test scores.
– College students may suffer from a serious debilitating condition known as test anxiety.
– They often have difficulty focusing attention on test items and are distracted by thoughts such as 'I am not doing well' or 'I am running out of time'.
– Test anxiety has three components: worry, emotionality and lack of self-confidence.
• Illness affects test scores.
– You do not perform as well when ill as when you are feeling well.
– Health status affects performance in behavior and in thinking.
– The elderly may do better with individual testing sessions.
• Normal hormonal variations affect test performance.
– Healthy women experience variations in their perceptual and motor performance as a function of the menstrual cycle.
– Women may perform better on tests of speeded motor coordination at mid-cycle than they would during menstruation.
– The same women may perform more poorly on tests of perceptual and spatial ability during mid-cycle than during menses.
– Men also vary in test performance as a function of variations in sex hormones.

Behavioral Assessment Methodology
Measurement goes beyond the application of psychological tests. Many assessment procedures involve the observation of behavior. In behavioral observation studies, the observer plays a more active role in recording the data. Some problems include reactivity, drift and expectancies.

a) Reactivity
• 'Observe the observers'
• Reliability and accuracy are highest when someone is checking on the observers.
• This is called 'reactivity' because it is a reaction to being checked.
• Accuracy and interrater agreement decrease when observers believe their work is not being checked.
• The experimenter might randomly check on the performance of the observers without their knowledge.

b) Drift
• Observers have a tendency to drift away from the strict rules they followed in training and to adopt idiosyncratic definitions of behavior.
• 'Contrast effect': the tendency to rate the same behavior differently when observations are repeated in the same context.
• Frequent meetings to discuss methods can reduce these effects.

c) Expectancies
• Some studies show that administrator expectancies can affect scores on individual IQ tests, whereas other studies do not.
• Behavioral observers will notice the behavior they expect.
• Bias in the behavioral data is more likely when observers receive reinforcement for recording a particular behavior than when they do not.
• Expectancies can therefore affect behavioral data.
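The notes on reactivity and drift treat interrater agreement as the quantity that suffers when observers are unchecked. As a minimal sketch of how that agreement might be quantified, the Python below computes simple percent agreement and, as an addition not taken from the report, Cohen's kappa, which corrects agreement for chance. The function names and the sample codes are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of observations on which two observers recorded the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of observations")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement (Cohen's kappa); not a statistic named in the report."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected if each rater assigned codes independently at their own base rates.
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical interval codes from two observers watching the same session.
    observer_1 = ["on", "on", "off", "off", "on", "off", "on", "on", "off", "off"]
    observer_2 = ["on", "on", "off", "on", "on", "off", "on", "off", "off", "off"]
    print(percent_agreement(observer_1, observer_2))
    print(cohens_kappa(observer_1, observer_2))
```

Comparing these figures for checked and unchecked sessions is one way an experimenter could look for reactivity or drift.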

d) Statistical Control of Rating Errors
• The 'halo effect' is the tendency to ascribe positive attributes independently of the observed behavior.
• This effect can be controlled through 'partial correlation', in which the correlation between two variables is found while variability in a third variable is controlled.
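The partial correlation mentioned under statistical control of rating errors can be illustrated with the standard first-order formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below is an illustration only: the variable names and the ratings are hypothetical, with z standing in for a rater's overall impression (the source of halo) and x and y for ratings of two specific traits.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation between x and y with the linear influence of z removed."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Hypothetical data: both trait ratings track the overall impression closely.
    overall_impression = [1, 2, 3, 4, 5, 6, 7, 8]
    trait_a_rating = [2, 1, 4, 3, 6, 5, 8, 7]
    trait_b_rating = [2, 3, 2, 3, 6, 7, 6, 7]
    print(pearson(trait_a_rating, trait_b_rating))
    print(partial_correlation(trait_a_rating, trait_b_rating, overall_impression))
```

In this made-up example the two trait ratings correlate strongly, but most of that correlation disappears once the overall impression is held constant, which is the pattern one would expect if halo were driving the ratings.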

Presentation 2: Projective tests of personality

A projective test is a test designed to reveal hidden emotions and internal conflicts via a subject's responses to ambiguous stimuli. Instead of being scored to a universal standard as with an objective personality test, content from projective tests is analyzed for meaning. Projective personality tests are supposed to be able to measure areas of your unconscious mind such as personality characteristics, fears, doubts and attitudes. Some employers use these types of tests to try to see whether you are an appropriate fit for their work environment.

Francis Galton is the person who invented this method of testing. His first experiment was conducted in 1879 and consisted of choosing a selection of words and letting his mind free associate. He then took the words that he generated in reaction to the original list and put them into new classifications, which led him to think more about the possibilities of subconsciousness and thought.

Strengths and Weaknesses of Projective Tests
Projective tests are most frequently used in therapeutic settings. In many cases, therapists use these tests to learn qualitative information about a client. Some therapists may use projective tests as a sort of icebreaker to encourage the client to discuss issues or examine thoughts and emotions. While projective tests have some benefits, they also have a number of weaknesses and limitations. For example, the respondent's answers can be heavily influenced by the examiner's attitudes or the test setting. Scoring projective tests is also highly subjective, so interpretations of answers can vary dramatically from one examiner to the next.

Additionally, projective tests that do not have standard grading scales tend to lack both validity and reliability. Validity refers to whether or not a test is measuring what it purports to measure, while reliability refers to the consistency of the test results. However, these tests are still widely used by clinical psychologists and psychiatrists. Some experts suggest that the latest versions of many projective tests have both practical value and some validity.

Rorschach inkblot test
The Rorschach inkblot test is a psychological projective test of personality in which a subject's interpretations of ten standard abstract designs are analysed as a measure of emotional and intellectual functioning and integration. The test is named after Hermann Rorschach (1884-1922) who developed the inkblots, although he did not use them for personality analysis. The test is considered "projective" because the patient is supposed to project his or her real personality into the inkblot via the interpretation. The inkblots are purportedly ambiguous, structureless entities which are to be given a clear structure by the interpreter. Those who believe in the efficacy of such tests think that they are a way of getting into the deepest recesses of the patient's psyche or subconscious mind. Those who give such tests believe themselves to be experts at interpreting their patients' interpretations.

What evidence is there that an interpretation of an inkblot (or a picture drawing or sample of handwriting, other items used in projective testing) issues from a part of the self that reveals true feelings, rather than, say, creative expression? What justification is there for assuming that any given interpretation of an inkblot does not issue from a part of the self bent on deceiving others, or on deceiving oneself for that matter? Even if the interpretations issued from a part of the self which expresses desires, it is a long jump from having desires to having committed actions. For example, an interpretation may unambiguously express the desire to have sex with the therapist, but that does not imply either that the patient has had sex with the therapist or that the patient, if given the opportunity, would agree to have sex with the therapist.

Rorschach testing is inherently problematic. For one thing, to be truly projective the inkblots must be considered ambiguous and without structure by the therapist. Hence, the therapist must not make reference to the inkblot in interpreting the patient's responses or else the therapist's projection would have to be taken into account by an independent party. Then the third person would have to be interpreted by a fourth ad infinitum. Thus, the therapist must interpret the patient's interpretation without reference to what is being interpreted. Clearly, the inkblot becomes superfluous. You might as well have the patient interpret spots on the wall or stains on the floor. In other words, the interpretation must be examined as if it were a story or dream with no particular reference in reality. Even so, ultimately the therapist must make a judgment about the interpretation, i.e., interpret the interpretation. But again, who is to interpret the therapist's interpretation? Another therapist? Then, who will interpret his? etc.

To avoid this logical problem of having a standard for a standard for a standard, etc., the experts invented standardized interpretations of interpretations. Both form and content are standardized. For example, a patient who attends only to a small part of the blot is "indicative of obsessive personality," while one who sees figures which are half-human and half-animal indicates that he is alienated, perhaps on the brink of schizophrenic withdrawal from people (Dawes, 148). If there were no standardized interpretations of the interpretations, then the same interpretations by patients could be given equally valid but different interpretations by therapists. What empirical tests have been done to demonstrate that any given interpretation of an inkblot is indicative of any past behaviour or predictive of any future behaviour? In short, interpreting the inkblot test is about as scientific as interpreting dreams.

To have any hope of making the inkblot test appear to be scientifically valid, it was essential that it be turned into a non-projective test. The blots can't be considered completely formless, but must be given a standard response against which the interpretations of patients are to be compared as either good or bad responses. This is what John E. Exner did. The Exner System uses inkblots as a standardized test. On its face, the concept seems preposterous. Imagine admitting people into med school on the basis of such a standardized test! Or screening candidates for the police academy! ("I didn't get in because I failed the inkblot test.")

The Rorschach enthusiast should recognize that inkblots or dreams or drawings or handwriting may be no different in structure than spoken words or gestures. Each is capable of many interpretations, some true, some false, some meaningful, some meaningless. It is an unprovable assumption that dreams or inkblot interpretations issue from a source deep in the subconscious which wants to reveal the "real" self. The mind is a labyrinth and it is a pipe dream to think that the inkblot is Ariadne's thread which will lead the therapist to the centre of the patient.

Precautions/Limitations with Rorschach
• Results are poorly verifiable.
• Subject to bias (countertransference). Ex: If the individual comes from a lower socioeconomic background or a different race, will the clinician tend to see inkblot responses as more negative if they happen to be prejudiced? In other words, will the clinician project his or her own unconscious impulses about the test-taker onto the analysis?
• Declining adherence to the Freudian principle of repression on which the test is based.
• Over-pathologizes normal individuals. The Rorschach identifies half of all test-takers as possessing distorted thinking, a false positive rate unexplained by current research.
• Illusory correlation: the phenomenon of seeing the relationship one expects in a set of data even when no such relationship exists. Loren and Jean Chapman (1960s) found that responses thought to be indicative of homosexuality were just as likely to be given by heterosexual males. Students told that the male might be homosexual were more likely to read through inkblot responses and interpret them as such, even when the responses were arbitrarily developed.
• It is also thought that the test's reliability can depend substantially on details of the testing procedure:
– where the tester and subject are seated
– introductory words
– verbal and non-verbal responses to subjects' questions or comments
– how responses are recorded

• Procedures for coding responses, while fairly well specified, are extremely time-consuming for inexperienced examiners, who may cut corners as a result. Exner has published detailed instructions, but Wood et al. (2003) cite many court cases where these have not been followed.
• Exner's system was thought to possess normative scores for various populations. But, beginning in the mid-1990s, others began to try to replicate or update these norms and failed. In particular, discrepancies seemed to focus on indices measuring narcissism, disordered thinking, and discomfort in …

Reliability and Validity of Rorschach Inkblot Test
Reliability depends on the ability to achieve a given measurement consistently (Weiner & Greene, 2008). Viglione and Taylor (2003) specifically examined this issue using the Comprehensive System. They reported that in their own study, among 84 raters evaluating 70 Rorschach variables, there was strong inter-rater reliability, particularly for the base-rate variables. It was further noted that the most relied upon factors, ratios and percentages, were among the most reliable. They also reviewed 24 previously published papers, all reporting various inter-rater reliabilities. Most of these studies reported reliabilities in the range of 85% to 99%. Therefore, it can be concluded that the Comprehensive System can yield high reliability when used under the conditions applied in these studies.

Aside from inter-rater reliability, test-retest reliability is another important consideration. Exner (as cited in Groth-Marnat, 2009, pp. 389-90) reported reliabilities from .26 to .92 over a 1-year interval considering 41 variables: four of them were above .90, 25 between .81 and .89, and 10 below .75. However, the most unreliable variables were attributed to state changes.

Validity depends on the ability of a test to measure the constructs that it is purported to measure (Weiner & Greene, 2008). Validity in this case can be evaluated by comparing the Rorschach with clinical data or with other established tests of personality. Groth-Marnat (2009, p. 391) has pointed out that results of validity studies on the Rorschach have been mixed, but are confounded by various factors including the "type of scoring system, experience of the scorer, and type of population." Early studies produced validity scores of .40 to .50, but later studies found scores as low as 0.29. However, such studies were further confounded by variables such as age, number of responses, verbal aptitude, education, and other confounding factors that were not controlled.

More recent studies of validity have met with mixed results. Weiner (2001), for example, stated that the Rorschach has a validity effect size "almost identical" to the MMPI (Weiner, 2001, p. 423). Another result was reported by Lindgren, Carlsson, and Lundback (2007), who found no agreement between the Rorschach and a self-assessed personality using the MMPI-2. Smith et al. (2010) evaluated the validity of the Rorschach in assessing the effects of trauma using a different system, the "Logical Rorschach" (LR) developed by Wagner (2001, as cited in Smith et al., 2010). They found "equivocal" findings, but indicated that the LR "may have some validity in the assessment of trauma-related phenomena." Wood et al. (2010) evaluated the Rorschach using a meta-analysis of 22 studies including 780 forensic subjects, in an attempt to separate psychopaths from non-psychopaths. They reported a mean validity coefficient of 0.062 using all variables, and a validity of 0.232 using the Aggressive Potential index. They concluded that their findings "contradict the view that the Rorschach is a clinically sensitive instrument for discriminating psychopaths from non-psychopaths" (Wood et al., 2010, p. 336).

Advantages of Rorschach inkblot test
• Higher inter-rater reliability through the Exner scoring system (1960s), a standard method used for interpreting the test.
• Substantial evidence that scores are related to and can identify thought/psychotic disorders. Responses given by those with schizophrenia, bipolar disorder (manic phase), and schizotypal personality disorder have poor form quality (they do not fit the shape of the inkblots). The disorganized thoughts and peculiarities of language of schizophrenics can be seen in their interpretations.
• Several scores correlate well with general intelligence: the number of responses, and how detailed and creative the responses are.

• Elizur Anxiety and Hostility scales (based on the emotional content of patients' responses) have a well-demonstrated relationship to anxious and hostile behaviours.
• The Rorschach Oral Dependency scale (ROD), based on responses that involve eating, mouths, or other oral imagery, appears to be a valid measure of normal variations in dependency.

Thematic Apperception Test
The Thematic Apperception Test, or TAT, is a projective measure intended to evaluate a person's patterns of thought, attitudes, observational capacity, and emotional responses to ambiguous test materials. In the case of the TAT, the ambiguous materials consist of a set of cards that portray human figures in a variety of settings and situations. The subject is asked to tell the examiner a story about each card that includes the following elements: the event shown in the picture; what has led up to it; what the characters in the picture are feeling and thinking; and the outcome of the event.

Because the TAT is an example of a projective instrument (that is, it asks the subject to project his or her habitual patterns of thought and emotional responses onto the pictures on the cards), many psychologists prefer not to call it a "test," because it implies that there are "right" and "wrong" answers to the questions. They consider the term "technique" to be a more accurate description of the TAT and other projective assessments.

Purpose
Individual assessments
The TAT is often administered to individuals as part of a battery, or group, of tests intended to evaluate personality. It is considered to be effective in eliciting information about a person's view of the world and his or her attitudes toward the self and others. As people taking the TAT proceed through the various story cards and tell stories about the pictures, they reveal their expectations of relationships with peers, parents or other authority figures, subordinates, and possible romantic partners. In addition to assessing the content of the stories that the subject is telling, the examiner evaluates the subject's manner, vocal tone, posture, hesitations, and other signs of an emotional response to a particular story picture.

For example, a person who is made anxious by a certain picture may make comments about the artistic style of the picture, or remark that he or she does not like the picture; this is a way of avoiding telling a story about it.

The TAT is often used in individual assessments of candidates for employment in fields requiring a high degree of skill in dealing with other people and/or ability to cope with high levels of psychological stress, such as law enforcement, military leadership positions, religious ministry, education, diplomatic service, etc. Although the TAT should not be used in the differential diagnosis of mental disorders, it is often administered to individuals who have already received a diagnosis in order to match them with the type of psychotherapy best suited to their personalities. Lastly, the TAT is sometimes used for forensic purposes in evaluating the motivations and general attitudes of persons accused of violent crimes. For example, the TAT was recently administered to a 24-year-old man in prison for a series of sexual murders. The results indicated that his attitudes toward other people are not only outside normal limits but are similar to those of other persons found guilty of the same type of crime.

The TAT can be given repeatedly to an individual as a way of measuring progress in psychotherapy or, in some cases, to help the therapist understand why the treatment seems to be stalled or blocked.

Research
In addition to its application in individual assessments, the TAT is frequently used for research into specific aspects of human personality, most often needs for achievement, fears of failure, hostility and aggression, and interpersonal object relations. "Object relations" is a phrase used in psychiatry and psychology to refer to the ways people internalize their relationships with others and the emotional tone of their relationships. Research into object relations using the TAT investigates a variety of different topics, including the extent to which people are emotionally involved in relationships with others; their ability to understand the complexities of human relationships; their ability to distinguish between their viewpoint on a situation and the perspectives of others involved; their ability to control aggressive impulses; self-esteem issues; and issues of personal identity. For example, one recent study compared responses to the TAT from a group of psychiatric inpatients diagnosed with dissociative disorders with responses from a group of non-dissociative inpatients, in order to investigate some of the controversies about dissociative identity disorder (formerly called multiple personality disorder).

Precautions
Students in medicine, psychology, or other fields who are learning to administer and interpret the TAT receive detailed instructions about the number of factors that can influence a person's responses to the story cards. In general, they are advised to be conservative in their interpretations, and to err "on the side of health" rather than of psychopathology when evaluating a subject's responses. In addition, the 1992 Code of Ethics of the American Psychological Association requires examiners to be knowledgeable about cultural and social differences, and to be responsible in interpreting test results with regard to these differences.

Experts in the use of the TAT recommend obtaining a personal and medical history from the subject before giving the TAT, in order to have some context for evaluating what might otherwise appear to be abnormal or unusual responses. For example, frequent references to death or grief in the stories would not be particularly surprising from a subject who had recently been bereaved. In addition, the TAT should not be used as the sole examination in evaluating an individual; it should be combined with other interviews and tests.

Cultural, gender, and class issues
The large number of research studies that have used the TAT have indicated that cultural, gender, and class issues must be taken into account when determining whether a specific response to a story card is "abnormal" strictly speaking, or whether it may be a normal response from a person in a particular group. For example, the card labeled 6GF shows a younger woman who is seated turning toward a somewhat older man who is standing behind her and smoking a pipe. Most male subjects do not react to this picture as implying aggressiveness, but most female subjects regard it as a very aggressive picture, with unpleasant overtones of intrusiveness and danger. Many researchers consider the gender difference in responses to this card as a reflection of the general imbalance in power between men and women in the larger society.

Race is another issue related to the TAT story cards. The original story cards, which were created in 1935, all involved Caucasian figures. As early as 1949, researchers who were administering the TAT to African Americans asked whether the race of the figures in the cards would influence the subjects' responses. Newer sets of TAT story cards have introduced figures representing a wider variety of races and ethnic groups. As of 2002, however, it is not clear whether a subject's ability to identify with the race of the figures in the story cards improves the results of a TAT assessment.

Multiplicity of scoring systems
One precaution required in general assessment of the TAT is the absence of a normative scoring system for responses. The original scoring system devised in 1943 by Henry Murray, one of the authors of the TAT, attempted to account for every variable that it measures. Murray's scoring system is time-consuming and unwieldy, and as a result has been little used by later interpreters. Other scoring systems have since been introduced that focus on one or two specific variables, for example, hostility or depression. While these systems are more practical for clinical use, they lack comprehensiveness. No single system presently used for scoring the TAT has achieved widespread acceptance. The basic drawback of any scoring system in evaluating responses to the TAT story cards is that information that is not relevant to that particular system is simply lost.

Computer scoring
A recent subject of controversy in TAT interpretation concerns the use of computers to evaluate responses. While computers were used initially only to score tests with simple yes/no answers, they were soon applied to interpretation of projective measures. A computerized system for interpreting the Rorschach was devised as early as 1964. As of 2002, there are no computerized systems for evaluating responses to the TAT; however, users of the TAT should be aware of the controversies in this field. Computers have two basic limitations for use with the TAT. The first is that they cannot observe and record the subject's vocal tone, eye contact, and other aspects of behavior that a human examiner can note. Second, computers are not adequate for the interpretation of unusual subject profiles.

Description

The TAT is one of the oldest projective measures in continuous use. It has become the most popular projective technique among English-speaking psychiatrists and psychologists, and is better accepted among clinicians than the Rorschach.

History of the TAT
The TAT was first developed in 1935 by Henry Murray, Christiana Morgan, and their colleagues at the Harvard Psychological Clinic. The early versions of the TAT listed Morgan as the first author, but later versions dropped her name. One of the controversies surrounding the history of the TAT concerns the long and conflict-ridden extramarital relationship between Morgan and Murray, and its reinforcement of the prejudices that existed in the 1930s against women in academic psychology and psychiatry.

It is generally agreed, however, that the basic idea behind the TAT came from one of Murray's undergraduate students. The student mentioned that her son had spent his time recuperating from an illness by cutting pictures out of magazines and making up stories about them. The student wondered whether similar pictures could be used in therapy to tap into the nature of a patient's fantasies.

Administration
The TAT is usually administered to individuals in a quiet room free from interruptions or distractions. The subject sits at the edge of a table or desk next to the examiner. The examiner shows the subject a series of story cards taken from the full set of 31 TAT cards. The usual number of cards shown to the subject is between 10 and 14, although Murray recommended the use of 20 cards, administered in two separate one-hour sessions with the subject. The original 31 cards were divided into three categories: for use with men only, with women only, or with subjects of either sex. Recent practice has moved away from the use of separate sets of cards for men and women.

The subject is then instructed to tell a story about the picture on each card, with specific instructions to include a description of the event in the picture, the developments that led up to the event, the thoughts and feelings of the people in the picture, and the outcome of the story. The examiner keeps the cards in a pile face down in front of him or her, gives them to the subject one at a time, and asks the subject to place each card face down as its story is completed. Administration of the TAT usually takes about an hour.

Recording
Murray's original practice was to take notes by hand on the subject's responses, including his or her nonverbal behaviors. Research has indicated, however, that a great deal of significant material is lost when notes are recorded in this way. As a result, some examiners now use a tape recorder to record subjects' answers. Another option involves asking the subject to write down his or her answers.

Interpretation
There are two basic approaches to interpreting responses to the TAT, called nomothetic and idiographic respectively. Nomothetic interpretation refers to the practice of establishing norms for answers from subjects in specific age, gender, racial, or educational level groups and then measuring a given subject's responses against those norms. Idiographic interpretation refers to evaluating the unique features of the subject's view of the world and relationships. Most psychologists would classify the TAT as better suited to idiographic than nomothetic interpretation.

In interpreting responses to the TAT, examiners typically focus their attention on one of three areas: the content of the stories that the subject tells; the feeling or tone of the stories; or the subject's behaviours apart from responses. These behaviours may include verbal remarks (for example, comments about feeling stressed by the situation or not being a good storyteller) as well as nonverbal actions or signs (blushing, stammering, fidgeting in the chair, difficulties making eye contact with the examiner, etc.). The story content usually reveals the subject's attitudes, fantasies, wishes, inner conflicts, and view of the outside world. The story structure typically reflects the subject's feelings, assumptions about the world, and an underlying attitude of optimism or pessimism.

Results
The results of the TAT must be interpreted in the context of the subject's personal history, age, sex, level of education, occupation, racial or ethnic identification, first language, and other characteristics that may be important. "Normal" results are difficult to define in a complex multicultural society like the contemporary United States.

Advantages of Thematic Apperception Test (TAT)
• Murray stated that, without exception, every person who participated in the study injected aspects of their personalities into their stories.
• Especially useful for children, who can use pictures to tell a story about their emotions and internal conflicts, as they can have difficulty expressing themselves directly with words.
• Especially useful in psychotherapy, where discussions can be held about the theme of certain stories the client gives that might not have been within the client's current awareness.
• The subject cannot figure out how their response will be interpreted, so it is difficult to fake a response.

Rotter Incomplete Sentences Blank
Purpose of RISB
The Rotter Incomplete Sentences Blank is an attempt to standardize the sentence completion method for use at the college level. Forty stems are completed by the subject. These completions are then scored by comparing them against typical items in empirically derived scoring manuals for men and women and by assigning to each response a scale value from 0 to 6. The total score is an index of maladjustment.

The Sentence Completion Method
The sentence completion method of studying personality is a semi-structured projective technique in which the subject is asked to finish a sentence for which the first word or words are supplied. As in other projective devices, it is assumed that the subject reflects his own wishes, desires, fears and attitudes in the sentences he makes.

Historically, the incomplete sentence method is related most closely to the word association test. In some incomplete sentences tests only a single word or brief response is called for; the major difference appears to be in the length of the stimulus. In the sentence completion tests, tendencies to block and to twist the meaning of the stimulus words appear, and the responses may be categorized in a somewhat similar fashion to the word association method.

Development of ISB

The Incomplete Sentence Blank consists of forty items revised from a form used by Rotter and Willerman in the army. This form was, in turn, a revision of blanks used by Shor, Hutt, and Holzberg at the Mason General Hospital.

In the development of the ISB, two objectives were kept in mind. One aim was to provide a technique which could be used objectively for screening and experimental purposes. It was felt that this technique should have at least some of the advantages of projective methods, and also be economical from the point of view of administration and scoring. A second goal was to obtain information of rather specific diagnostic value for treatment purposes.

The Incomplete Sentence Blank can be used, of course, for general interpretation with a variety of subjects in much the same manner that a clinician trained in dynamic psychology uses any projective material. However, a feature of the ISB is that one can derive a single over-all adjustment score. This over-all adjustment score is of particular value for screening purposes with college students and in experimental studies. The ISB has also been used in a vocational guidance center to select students requiring broader counseling than was usually given, in experimental studies of the effect of psychotherapy, and in investigations of the relationship of adjustment to a variety of variables.

Psychometric Properties

1. RELIABILITY

Since the items on an incomplete sentence blank are not equivalent, the odd-even technique for determining reliability is not applicable and would tend to give a minimum estimate of internal consistency. Therefore items on the ISB were divided into two halves deemed as nearly equivalent as possible. This yielded a corrected split-half reliability of .84 when based on the records of 124 male college students, and .83 when based on 71 female students. Inter-scorer reliability for two scorers trained by the authors was .91 when based on male records and .96 for female records.

2. VALIDITY

The Incomplete Sentence Blank was validated on groups of subjects which did not include any of the cases used in developing the scoring principles and the scoring manuals. Scoring of the blanks was done “blindly”: the scorer never knew whether the test blank was supposed to be that of a maladjusted or an adjusted subject. Validity data were obtained for the two sexes separately since the scoring manuals differ. The subjects included 82 females and 124 males who were classified as either adjusted or maladjusted, i.e., as needing personal counselling or as not needing such counselling. A cutting score of 135 provided a very sufficient separation of adjusted and maladjusted students in the data collected above.

3. NORMS

A distribution of scores on the ISB for a representative college freshman population was obtained by giving the Incomplete Sentences Blank to 299 entering freshmen at Ohio State University. A comparison between the median percentile ranks on the Ohio State Psychological Examination of the sample and of the total freshman population showed a difference of approximately two percentile points. The agreement between corresponding first and third quartile points was very close. It was interesting to find that the correlation coefficient between the Ohio State Psychological Examination scores and ISB scores for the selected freshman sample was only .11. This is in accord with a general feeling that very little relationship would exist between intelligence and scores on a personality measure such as the Incomplete Sentence Blank.
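The idea of a "corrected" split-half reliability can be illustrated with a short sketch. The report does not name the correction used; the sketch assumes it is the standard Spearman-Brown step-up, r_full = 2r / (1 + r), applied to the Pearson correlation between the scores on the two halves. The function names and the sample half-scores are hypothetical and are not data from the ISB studies.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)


def corrected_split_half(half_a, half_b):
    """Correlate the two half scores, then apply the assumed correction."""
    return spearman_brown(pearson_r(half_a, half_b))


# A half-test correlation of .72 steps up to roughly .84.
print(round(spearman_brown(0.72), 2))  # 0.84

# Hypothetical half scores for five subjects (illustrative only).
half_a = [58, 64, 71, 60, 69]
half_b = [60, 63, 70, 57, 72]
print(corrected_split_half(half_a, half_b))
```

The correction matters because each half is only twenty items long, and a shorter test is less reliable than the full forty-item blank.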

Scoring

THE USE OF SCORING MANUAL

Sentence completions are scored from examples in the scoring manuals by assigning a numerical weight from 0 to 6 for each sentence and totaling the weights to obtain the overall score. The scoring examples in part II of this manual are given to facilitate the assignment of weights to responses. They are from ISB responses of 58 male and 53 female college students, ranging from extremely well adjusted persons to those judged to be in need of psychotherapy.

Since the scoring examples are illustrative and representative of common responses, with no intent to list all possible sentence completions, a set of scoring principles will be presented. These principles are intended to aid in determining the correct weight for a completion when a very similar statement cannot be found in the scoring examples. Sentence completions used for illustrative purposes in the following discussion are taken almost entirely from the manual.

In order to provide the potential user of the ISB with “supervised” experience before attempting to score clinical or experimental records, practice records are included. The correct scoring for these records is given at the end. These examples will enable the clinician to check his scoring against that of the authors. They may also be used by a clinic supervisor to check the scoring ability of any student or general scorer.

SCORING PRINCIPLES

OMISSION RESPONSE

Omission responses are designated as those for which no answer is given or for which the thought is incomplete. It is recognized that in a clinical situation omissions are occasionally provocative, since they may point to areas which the individual does not recognize or cannot bring himself to express. For all responses which are subsumed under the heading of incomplete thoughts or omissions, no scoring is made. Omissions and fragments are not scored. The total for the remainder of the responses is then prorated by the formula {40 / (40 - omissions)} times the total score; however, if there are more than 20 omissions, the paper is considered unscorable for all practical purposes.
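The proration rule for omissions can be sketched in a few lines. This is a minimal illustration of the formula {40 / (40 - omissions)} times the total score and of the 20-omission limit stated in the text; the function name, the use of None to mark an omission, and the sample blank are assumptions made for the example.

```python
TOTAL_ITEMS = 40     # number of stems on the blank
MAX_OMISSIONS = 20   # more omissions than this makes the paper unscorable


def prorated_score(weights):
    """Return the prorated total, or None when the blank is unscorable.

    `weights` holds one entry per item: an int from 0 to 6, or None for an
    omission or incomplete thought (which is not scored).
    """
    if len(weights) != TOTAL_ITEMS:
        raise ValueError(f"expected {TOTAL_ITEMS} items, got {len(weights)}")
    scored = [w for w in weights if w is not None]
    omissions = TOTAL_ITEMS - len(scored)
    if omissions > MAX_OMISSIONS:
        return None  # considered unscorable for all practical purposes
    return sum(scored) * TOTAL_ITEMS / (TOTAL_ITEMS - omissions)


# Hypothetical blank: 36 items weighted 3 and 4 omissions.
print(prorated_score([3] * 36 + [None] * 4))  # 120.0
```

Prorating keeps a blank with a few omissions on the same 40-item scale as a complete blank, so the totals remain comparable.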

For example, incomplete thoughts such as “Most girls . . . don’t appeal to me except sexually because” or “I hate . . . the thought of going home since” are not scored.

CONFLICT RESPONSES

“C” or conflict responses are those indicating an unhealthy or maladjusted frame of mind. These include hostility reactions, pessimism, symptom elicitation, hopelessness and suicidal wishes, statements of unhappy experiences, and indications of past maladjustment. Responses range from C1 to C3 according to the severity of the conflict or maladjustment expressed. The numerical weights for the conflict responses are C1 = 4, C2 = 5, C3 = 6.

Typical of the C1 category are responses in which concern is expressed regarding such things as the world state of affairs, financial problems, specific school difficulties, physical complaints, identification with minority groups, and so on. In general it might be said that subsumed under C1 are minor problems which are not deep-seated or incapacitating, and more or less specific difficulties. For example, “I want to know . . . about life”; this type of response will lie in the C1 category.

More serious indications of maladjustment are found in the C2 category. On the whole the responses refer to broader, more generalized difficulties than are found in C1. Included here are expressions of inferiority feelings, psychosomatic complaints, concern over possible failure, generalized school problems, lack of goals, feelings of inadequacy, concern over vocational choice, and difficulty in heterosexual relationships as well as generalized social difficulty. For example, “The happiest time . . . is over”; this type of response will lie in the C2 category.

Expressions of severe conflict or indications of maladjustment are rated C3. Among the difficulties found in this area are suicidal wishes, sexual conflicts, severe family problems, fear of insanity, strong negative attitudes toward people in general, feelings of confusion, expression of rather bizarre attitudes, and so forth. For example, “I like . . . to know if I am crazy”. This type of response will lie in the C3 category.

POSITIVE RESPONSES

“P” or positive responses are those indicating a healthy or hopeful frame of mind. These are evidenced by humorous or flippant remarks, optimistic responses, and acceptance reactions. Responses range from P1 to P3 depending on the degree of good adjustment expressed in the statement. The numerical weights for the positive responses are P1 = 2, P2 = 1, P3 = 0.

In the P1 class, common responses are those which deal with positive attitudes toward school, hobbies, sports, expression of warm feeling toward some individual, expression of interest in people, and so on. For example, “I like . . . to have good time”; this type of response will lie in the P1 category.

Generally found under the heading of P2 are those replies which indicate a generalized positive feeling toward people, good social adjustment, healthy family life, optimism and humour. For example, “Back home . . . are many friends”; this type of response will lie in the P2 category.

Clear-cut good-natured humour, real optimism, and warm acceptance are types of responses which are subsumed under the P3 group. The ISB deviates from the majority of tests in that it scores humorous responses. For example, “The happiest time . . . is yet to come”; this type of response will lie in the P3 category.

NEUTRAL RESPONSES

“N” or neutral responses are those not falling clearly into either of the above categories. They are generally on a simple descriptive level. Two general types of responses account for a large share of those that fall in the neutral category. One group includes those lacking emotional tone or personal reference. The other group is composed of many responses which are found as often among maladjusted as among adjusted individuals and which, through clinical judgment, could not legitimately be placed in either the C or P group. All the N responses are scored 3.
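The category weights can be gathered into a small lookup to show how a total score is built up. The weights (P3 = 0 through C3 = 6, with N = 3) and the cutting score of 135 are taken from the text; the function names, the sample record, and the choice to treat a score of exactly 135 as reaching the cutoff are assumptions made for illustration.

```python
# Weights as listed in the text: positive 0 to 2, neutral 3, conflict 4 to 6.
CATEGORY_WEIGHTS = {"P3": 0, "P2": 1, "P1": 2, "N": 3, "C1": 4, "C2": 5, "C3": 6}
CUTTING_SCORE = 135  # from the validity data; using ">=" is an assumption


def total_score(categories):
    """Sum the weights for a fully completed blank (one category per item)."""
    return sum(CATEGORY_WEIGHTS[c] for c in categories)


def reaches_cutoff(score, cutoff=CUTTING_SCORE):
    """Hypothetical screening flag: True when the score reaches the cutoff."""
    return score >= cutoff


# Hypothetical record: 10 items rated P2, 20 rated N, 10 rated C2.
record = ["P2"] * 10 + ["N"] * 20 + ["C2"] * 10
score = total_score(record)           # 10*1 + 20*3 + 10*5 = 120
print(score, reaches_cutoff(score))   # 120 False
```

Because higher weights mark conflict responses, a higher total indicates greater maladjustment, and a blank rated neutral on every item totals 120.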

Examples of neutral responses are “Most girls . . . are females” or “When I was child . . . I spoke as a child”. These types of responses fall in the neutral category.

Independent Scoring of Items

Each response is to be scored and evaluated independently of all others, except when it is a clear-cut reference to a previous statement. It is, therefore, important in the scoring of any papers to avoid the halo effect as much as possible so that the measurement can be reliable. This is equally necessary here for, if each response is not scored independently of all others, there is a tendency to rate all responses in light of the over-all picture.

In some cases a response refers directly to a previous item, and it would not be reasonable to score it independently of the first. For example, “I wish . . . he were dead” in one record had reference to the preceding sentence when the individual said, “The only trouble . . . is I wish I could forget I’ll be like my father”. Another instance is “I secretly . . . blame my mother”, which refers to a precedent, “My father . . . was a suicide”. In such an instance, of course, a previous response must be used in the evaluation of the later one.

Qualification

Responses which start like an example in the manual but are differently qualified are scored with a consideration of these qualifications. Such qualifications may change the weighting of the response by one or more points. For example, it may be seen that the following responses should be scored higher than if they had not been qualified: “Sports . . . I have always liked, yet they don’t hold my interest like they did”, or “This school . . . is o.k., but it’s too close to home”.

There are also responses which will be given lower ratings than they would get without the qualification. Common among these are responses given by individuals subsequent to therapy. For example, “The future . . . is uncertain, but I think I can lick it”, or “Back home . . . life was pretty miserable, but I think I can cope with the situation now”.

Extreme Weights

In cases when a response seems to be more extreme than the examples cited, then it is permissible to use an extreme weight. These weights may be assigned if clearly warranted. If the following responses were given they would be scored 6, although there are no examples listed for these items: “Sports . . . should not be allowed for mixed groups because they are too stimulating”, or “Reading . . . is one thing I hate”.

Unusually Long Responses

In cases where the response is unusually long, it should be given an additional point in the direction of “C” unless it has already been rated 6. The only exception to this rule concerns neutral completions. If the response is a common quotation, stereotype or song title, it is always scored as neutral, regardless of length.

It has been found that the maladjusted individual often writes long involved sentences as if compelled to express himself fully and not be misunderstood. On the other hand, the well-adjusted person frequently replies to the stimuli with short concise statements. For example, one poorly adjusted individual wrote, “I am best when . . . I am under no pressure of responsibility concerning the accomplishment of a given thing within a certain specified time”. A well-adjusted individual wrote simply, “I am best when . . . I’m having a party”. This does not seem to be a function of intelligence as might be hypothesized. The previous responses were from two individuals of superior intelligence; the following, however, are reactions of two individuals of lesser ability. The maladjusted student wrote, “I like . . . agriculture”. An adjusted person wrote, “I like . . . people”.

Advantages of RISB

The general advantages of the sentence completion method can be summarized as follows:

• There is freedom of response. That is, the subject is not forced to answer yes, no, or “?” to the examiner's question. He may instead respond in any way he desires.
• Some disguise in the purpose of the test is present. Although the subject may be made aware of the general intent, what constitutes a good or bad answer is not readily apparent to most subjects.

• Group administration is relatively efficient. Most incomplete sentences tests can be given to a group of any size without apparent loss of validity.
• No special training is ordinarily necessary for administration. Interpretation depends on the examiner's general clinical experience, although the examiner does not need specific training in the use of this method.
• The method is extremely flexible in that new sentence beginnings can be constructed or tailor-made for a variety of clinical, applied and experimental purposes.

Disadvantages of RISB

• Although susceptible to semi-objective scoring, it cannot be machine scored and requires general skill and knowledge of personality analysis for clinical appraisal and interpretation.
• There is not as much disguise of purpose as in other projective methods. Consequently, a sophisticated subject may be able to keep the examiner from knowing what he does not wish to reveal.
• Insufficient material is obtained in some cases, particularly from illiterate, disturbed or uncooperative subjects. Application of the method as a group test also requires writing and language skills and has not yet been adequately evaluated for potential clinical usefulness for younger children.

Draw A Person Test

Introduction

Developed originally by Florence Goodenough in 1926, this test was first known as the Goodenough Draw-A-Man test. It is detailed in her book titled Measurement of Intelligence by Drawings. Dr. Dale B. Harris later revised and extended the test and it is now known as the Goodenough-Harris Drawing Test. The revision and extension is detailed in his book Children's Drawings as Measures of Intellectual Maturity (1963).

Psychologist Julian Jaynes, in his 1976 book The Origin of Consciousness in the Breakdown of the Bicameral Mind, wrote that the test is "routinely administered as an indicator of schizophrenia," and that while not all schizophrenic patients have trouble drawing a person, when they do, it is very clear evidence of a disorder. Specific signs could include a patient's neglect to include "obvious anatomical parts like hands and eyes," with "blurred and unconnected lines," ambiguous sexuality and general distortion. There has been no validation of this test as indicative of schizophrenia. Chapman and Chapman (1969), in a classic study of illusory correlation, showed that the scoring manual, e.g., large eyes as indicative of paranoia, could be generated from the naive beliefs of undergraduates.

History

The official beginning of when figure drawing was first thought to be associated with personality is unknown. Whether it was the drawing on a cave wall, a painting by a great artist, or a doodle made by an average person, the curiosity somehow came about. However, the formal beginning of its use for psychological assessment is known to begin with Florence Goodenough, a child psychologist, in 1926 (Scott, 1981).

Goodenough first became interested in figure drawing when she wanted to find a way to supplement the Stanford-Binet intelligence test with a nonverbal measure. The test was developed to assess maturity in young people. She concluded that the amount of detail involved in a child's drawing could be used as an effective tool. This led to the development of the first official assessment using figure drawing with her development of the Draw-A-Man test. Over the years, the test has been revised many times with added measures for assessing intelligence (Weiner & Greene, 2008). Harris later revised the test to include drawings of a woman and of themselves. Now considered the Goodenough-Harris Test, it has guidelines for assessing children from ages 6 to 17 (Scott, 1981).

Soon after the development of the test, psychologists started considering the test for measures of differences in personality as well as intelligence. In 1949, Karen Machover developed the first measure of figure drawing as a personality assessment with the Draw A Person Test (Machover, 1949). Machover did a lot of work with disturbed adolescents and adults and used the test to assess people of all ages. She wrote a book on her measure expressing that the features of the figures drawn reflect underlying attitudes, concerns, and personality traits. In her test, she included a suggestion to ask about the person they have drawn. She advises asking them to tell the administrator a story about the figure as if they were in a novel or play. Machover used a qualitative approach in her interpretation, considering individual drawing characteristics (Machover, 1949).

Others have since suggested a more quantitative approach that can be more widely used, analyzing selected characteristics that are an index of deeper meanings (Murstein, 1965). The most popular quantitative approach was developed by Elizabeth Koppitz. Koppitz developed a measure of assessment that has a list of emotional indicators including size of figures, omission of body parts, and some “special features”. The total number of the indicators is simply added up to provide a number that represents the likeliness of disturbance (Murstein, 1965).

With the Draw a Person test as a base, a number of other tests have developed using figure drawing as a personality assessment tool. For example, the House-Tree-Person test similarly just asks the person to draw those three objects and then inquires about what they have drawn. The questions asked for inquiry include what kinds of activities go on in the house, what are the strongest parts of the tree, and what things make the person angry or sad. The KFD (Kinetic Family Drawing) tells the drawer to draw their family doing something (Murstein, 1965). All of these tests have the important element of not only the assessment of the pictures themselves, but also the thematic variables involved. Every figure drawing test asks the drawer to include some kind of description or interpretation of what is happening in the picture. These elements are also analyzed accordingly (Weiner & Greene, 2008).

Nature of the Test

Test administration involves the administrator requesting children to complete three individual drawings on separate pieces of paper. Children are asked to draw a man, a woman, and themselves. No further instructions are given and the child is free to make the drawing in whichever way he/she would like. There is no right or wrong type of drawing, although the child must make a drawing of a whole person each time, i.e. head to feet, not just the face. The test has no time limit; however, children rarely take longer than about 10 or 15 minutes to complete all three drawings. Harris's book (1963) provides scoring scales which are used to examine and score the child's drawings. The test is completely noninvasive and non-threatening to children, which is part of its appeal.

To evaluate intelligence, the test administrator uses the Draw-a-Person: QSS (Quantitative Scoring System). This system analyzes fourteen different aspects of the drawings (such as specific body parts and clothing) for various criteria, including presence or absence, detail, and proportion. In all, there are 64 scoring items for each drawing. A separate standard score is recorded for each drawing, and a total score for all three. The use of a nonverbal, nonthreatening task to evaluate intelligence is intended to eliminate possible sources of bias by reducing variables like primary language, verbal skills, communication disabilities, and sensitivity to working under pressure. However, test results can be influenced by previous drawing experience, a factor that may account for the tendency of middle-class children to score higher on this test than lower-class children, who often have fewer opportunities to draw.

To assess the test-taker for emotional problems, the administrator uses the Draw-a-Person: SPED (Screening Procedure for Emotional Disturbance) to score the drawings. This system is composed of two types of criteria. For the first type, eight dimensions of each drawing are evaluated against norms for the child's age group. For the second type, 47 different items are considered for each drawing.

The purpose of the test is to assist professionals in inferring children's cognitive developmental levels with little or no influence of other factors such as language barriers or special needs. Any other uses of the test are merely projective and are not endorsed by the first creator.

Advantages of Draw A Person Test:

• Easy to administer (only about 20-30 minutes plus 10 minutes of inquiry)
• Helps people who have anxieties taking tests (no strict format)
• Can assess people with communication problems
• Relatively culture free
• Allows for self-administration

Disadvantages of Draw A Person Test:

• Restricted amount of hypotheses can be developed
• Relatively non-verbal, but may have some problems during inquiry
• Little research backing

) "Projective personality testing: Psychological testing.1037/h0032332 Projective Tests. P.d. (1972). Illinois: Charles C Thomas Publishers. Westport. P. (2000).merriamwebster. ISBN 0-313-32457-3. The journal of nervous and mental disease.sju. Szondi (1960) Das zweite Buch: Lehrbuch der Experimentellen Triebdiagnostik. Psychological Bulletin. (n. From the Spanish translation. 2012. 77(3). Huber. from http://www. S. 126(1). 2nd edition. Huber.com/dictionary/word-association%20test Spiteri. from Staint Joseph's University: Department of Psychology Web site: http://schatz.) Retrieved November 21. (n. Poizner. K. Harcourt College Publishers.htm . E. 2012. Gamble.396 Shatz. 201–204.pdf Piotrowski. 172194.d.edu. S. p. "Word association testing and thesaurus construction.NOR SAHIDAH BINTI MOHAMAD ALI MPF1243 JB MP121197 880522-01-5516 REFERENCE Projective Methods for Personality Assessment. Leopold Szondi (1960) Das zweite Buch: Lehrbuch der Experimentellen Triebdiagnostik. Retrieved November 21.396.ualberta. Clinical Graphology: An Interpretive Manual for Mental Health Practitioners. Springfield.htm Schultz. Retrieved November 21. 2nd edition. The Tomkins-Horn Picture Arrangement Test.. Bern und Stuttgart.au/libres14n2/Spiteri_final.d. The holtzman inkblot technique. Annette (2012). (n. Phillip." Seventh edition.edu/intro/1001lowfi/personality/projectiveppt/sld001. Popular psychology: an encyclopedia. Ch.). Luis A.ca/~chrisw/L12ProjectiveTests/L12ProjectiveTests.htm. from http://www." Retrieved November 21. & Schultz. (n. doi:10.27. (2005).neiu.).1097/00005053-195801000-00016 Merriam-Webster. Ch. School of Library and Information Studies website: http://libres. From the Spanish translation. Bern und Stuttgart. Z." Retrieved November 21.27. 2012 from http://web. 2012. doi:10. pp. 106. (n. D. R.d.curtin.d. (1958-01-01).2012.edu/~mecondon/proj-lec. B)II Las condiciones estadisticas. B)II Las condiciones estadisticas. from Dalhousie University. 
"The history of modern psychology. Conn: Greenwood Press.psych. p. Cordón.).

87: 3. and Kurtz. 223 — 225 . Gregory J. Journal of Personality Assessment.(2006) 'Advancing Personality Assessment Terminology: Time to Retire "Objective" and "Projective" As Personality Test Descriptors'.NOR SAHIDAH BINTI MOHAMAD ALI MPF1243 JB MP121197 880522-01-5516 Meyer. John E.
