Course Description

Rasch Model for Research/Educational Assessment

This training is concerned with improving the measurement process in educational assessment using the Rasch Model. Assessments in education involve persons responding to a set of items. The training aims to determine whether the responses to a set of assessment items fit the model.

Course Outcomes
Upon completion of this course, participants will be able to:
• Understand the concepts of psychometrics and the Rasch measurement model
• Develop a reliable test/ questionnaire for measurement
• Analyze the data/ responses using Bond&Fox Steps® software

Target Participants

This course is designed for:
• Lecturers
• Researchers
• Post Graduate Students
• Training Authorities (Educational Admin Staff)

Course Content
Rasch
• Intro
• Key Concepts

Key Concept of Measurement

Developing Instruments
• Steps
• Hands-on

Rating Scale Model
• Dichotomous
• Polytomous

• Bond & Fox
• Hands-on



Key Concept of Measurement

Measurement? Right Measure? Right Scale? Right Instrument?

Physical Measure? Psychological Measure? Psychometric Measure?

Globally Accepted Measures
• Length :: meter
• Weight :: gram
• Time :: second/ minute/ hour
• Temperature :: celsius
• Electricity :: ampere
INSTRUMENT?

Think Wisely!

How to Measure?


4/23/2012

Logically…
The attribute of wisdom that is hardest for a person to possess is probably possessed only by the wisest people. Likewise, the attribute of beauty that is hardest to possess is possessed only by the most beautiful people!

Also logically…
A person who has the attributes that everyone has is considered neither wise/ beautiful nor stupid/ ugly.

And…
A person who lacks an attribute of wisdom that everyone else has may be considered stupid. Likewise, a person who lacks an attribute of beauty that everyone else has may be considered ugly.

Educational Measurement
What do you understand by:
• Testing (Pengujian)
• Measurement (Pengukuran)
• Evaluation (Penilaian)

What is the purpose of a test? What are examples of test formats? What aspects do we test?
• Teaching scheme
• Teaching objectives
• Teaching references
• Teaching assignments
• Teaching content

Why Rasch in OBE?
OUTCOME ASSESSMENT
Examples:
1. Placement Test
2. Attitude/ Insaniah Skills
3. Test/ Examination
4. Training Effectiveness
5. EE Survey

Test Instrument Construct
Questions must be:
• Simple
• Clear and directive
• Free of multi-defined questions
• Set at various levels
• Parallel with teaching contents
Should consider:
• Validity
• Reliability
• Objectivity
• Utility
• Readability

Right Test Right Measure
Example:
1 : STRONGLY DISAGREE
2 : DISAGREE
3 : NOT SURE
4 : AGREE
5 : STRONGLY AGREE
mean = 4.325: what does it mean?

Right Test Right Measure
5/10 is NOT THE SAME as 5:5
Probabilistic Ratio-based Ruler
1:99  10:90  20:80  30:70  40:60  50:50  60:40  70:30  80:20  90:10  99:1

Right Test Right Measure
Indices: 10^-2  10^-1  10^0  10^1  10^2
Log Odds Unit Ruler (LOGIT)
The logit is the unit of measurement calculated after the raw data (counts) have been converted to a ratio-based form, which is more accurate for measuring ability.

Right Test Right Measure
To identify the respondent with the highest score, the difficulty level of each item must be known. This is the basic idea of the Rasch model (RM). RM ensures that items are ordered from easy to difficult. Note that the values of 1, which denote correct answers, lie on the left. If a line is drawn across the table (dashed line), there are more scores of 1 above the line than below it. A score of 0 by student P above the line is due to the student's carelessness, whereas a score of 1 by student S below the line is a guessed answer. Student P is the most able and student S is the least able.
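
As a minimal sketch of the conversion described above, the following Python snippet turns each point on the ratio-based ruler into odds and then into log-odds. The function name `logit_from_percent` is illustrative and not part of any Rasch software. The slide's index row uses powers of ten, while the logit defined later in this course is the natural logarithm of the odds, so both values are printed.

```python
import math


def logit_from_percent(p_success: float) -> float:
    """Natural log of the odds for a success percentage strictly between 0 and 100."""
    if not 0.0 < p_success < 100.0:
        raise ValueError("percentage must lie strictly between 0 and 100")
    return math.log(p_success / (100.0 - p_success))


# Points on the ratio-based ruler, written as success:failure percentages.
ruler = [1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99]

for p in ruler:
    odds = p / (100 - p)
    print(f"{p:>2}:{100 - p:<2}  odds={odds:8.3f}  "
          f"log10={math.log10(odds):+.2f}  logit={logit_from_percent(p):+.2f}")
```

The 50:50 point maps to 0 logits, and points that mirror each other on the ruler (for example 10:90 and 90:10) map to logits of equal size and opposite sign, which is what makes the scale linear around its centre.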

Right Test Right Measure
CASE
Among Ali, Ah Chong and Ramasamy, who each scored 79%, who is the better student?
Between two respondents whose mean agreement is 3.45 (on a 5-point Likert scale), who truly agrees? Who is lying? Who agreed by mistake?

CRITERION-BASED versus NORM-BASED EVALUATION

Right Test Right Measure
Scalogram of Responses/ Answers
[Figure: scalogram of respondent ability against item difficulty, showing the probability of success and marking CARELESS responses and LUCKY GUESSES]

Right Test Right Measure
PRINCIPLE OF THE PROBABILISTIC MODEL
“a person having a greater ability than another person should have the greater probability of solving any item of the type in question, and similarly, one item being more difficult than another means that for any person the probability of solving the second item is the greater one”. (Rasch, 1960)

Right Test Right Measure
A scientific measurement model must…
• USES LINEAR MEASURE
• OVERCOMES MISSING DATA
• GIVES ESTIMATES OF PRECISION
• DETECTS MISFITS OR OUTLIERS
• PROVIDES RELIABILITY VALUE

Right Test Right Measure
After review analysis, ten main dimensions were identified, with 60 items to measure students' environmental attitude effects.
Dimensions:
• Attitude Towards Energy Conservation (EC)
• Attitude Towards Mobility and Transportation (MT)
• Attitude Towards Waste Avoidance (WA)
• Attitude Towards Recycling (R)
• Attitude Towards Consumerism (C)
• Attitude Towards Environmental Conservation (VB)
• Attitude Towards Flora and Fauna (EFF)
• Attitude Towards Water and Air (EWA)
• Attitude Towards Human Being (EHB)
• Attitude Towards Metaphysical Entities (EME)
[Table column "Number of Items": items 1 to 60 distributed across the ten dimensions]

Right Test Right Measure
Dimension        Item/ Attribute      Weightage
D1/ CO1/ CH1     A1, A2, A3           30%
D2/ CO2/ CH2     A1, A2, A3, A4       40%
D3/ CO3/ CH3     A1, A2, A3           30%

Tips for Instrument Development

Do keep the questionnaire brief and concise. Some questionnaires give the impression that their authors tried to think of every conceivable question that might be asked with respect to the general topic of concern. The result is a very long questionnaire causing annoyance and frustration on the part of the respondents, resulting in non-return of mailed questionnaires and incomplete or inaccurate responses on questionnaires administered directly. Experience tells us that items in a long questionnaire normally stop functioning. A clear-cut need for every question should be established. Make every effort to write as few questions as possible to obtain the information needed. Peripheral questions and ones to find out "something that might just be nice to know" must be avoided.

Do face validity and a pilot test. Get feedback on your initial list of questions. Feedback may be obtained from a small but representative sampling unit. Try to visualize yourself in their shoes; it certainly helps. This is where Rasch becomes very handy in handling small samples and missing data.

Avoid open-ended questions. In most cases open-ended questions should be avoided due to variation in willingness and ability to respond in writing.

Avoid the response option "Others." Careless responders will overlook the option they should have designated and conveniently mark the option "Other", or will be hairsplitters and will reject an option for some trivial reason.

Avoid category proliferation. A typical question is the following:
Marital status:
1) Single (never married)
2) Married
3) Widowed
4) Divorced
5) Separated

Avoid scale point proliferation. In contrast to category proliferation, which seems usually to arise somewhat naturally, scale point proliferation takes some thought and effort. An example is:
1) Never
2) Rarely
3) Occasionally
4) Fairly often
5) Often
6) Very often
7) Almost always
8) Always
Such stimuli run the risk of annoying or confusing the responder with hairsplitting differences between the response levels. Psychometric research has shown that most subjects cannot reliably distinguish more than six or seven levels of responses. Offering four to five scale points is usually quite sufficient to stimulate a reasonably reliable indication of response direction. Rasch analysis has shown that the sensitivity of a measurement is not lost when a smaller rating scale is used. Based on statistical analysis, Rasch may recommend that rating categories be collapsed or expanded.

Avoid responses at the scale mid-point and neutral responses. The use of neutral response positions had a basis in the past when crude computational methods were unable to cope with missing data. Rasch sees every attempt as beginning with a 50:50 scenario.

Do choose appropriate response category language and logic. The extent to which responders agree with a statement can be assessed adequately in many cases by the dichotomous options:
1) Disagree
2) Agree
These options have the advantage of allowing the expression of some uncertainty. In contrast, the following options would be undesirable in most cases:
1) Strongly disagree
2) Disagree
3) Agree
4) Strongly Agree
Some would say that "Strongly agree" is redundant or at best a colloquialism. In addition, there is no comfortable resting place for those with some uncertainty.

Do order categories. When response categories represent a progression between a lower level of response and a higher one, it is usually better to list a polytomous order from the lower level to the higher in left-to-right order, for example:
1) Never
2) Seldom
3) Occasionally
4) Frequently

Do ask responders to rate both positive and negative stimuli. There is sometimes a difficulty when responders are asked to rate items for which the general level of approval is high. There is a tendency for responders to mark every item at the same end of the scale. By offering positive and negative responses, the respondent is required to evaluate each response rather than uniformly agreeing or disagreeing with all of the responses.

Avoid asking responders to rank responses. Responders cannot be reasonably expected to rank more than about six things at a time, and many of them misinterpret directions or make mistakes in responding. To help alleviate this latter problem, ranking questions may be framed as follows:
Following are three colors for office walls:
1) Beige
2) Ivory
3) Light green
Which color do you like best? _____
Which color do you like second best? _____
Which color do you like least? _____

By carefully evaluating the need for every question used in an instrument and carefully wording the responses, you will collect information which will yield more satisfactory and meaningful results.

Why Rasch!

Rasch Key Concept

Rasch
• Intro
• Key Concepts

Measurement
• Measurement is the process of constructing lines and locating individuals on lines (Wright & Stone, 1979)
• Measurement is the location of objects along a single dimension on the basis of observations which add together (Bond & Fox, 2007)

Types of Scales
• Nominal
• Ordinal
• Interval
• Ratio

Rasch Measurement Model
• The Rasch model can be applied to measure latent traits (e.g. ability or attitude) in various disciplines
• Rasch model estimates of ability / attitude / difficulty become data for statistical analysis

Rasch Measurement Model
• Latent traits are usually assessed through the responses of a sample of persons to a set of items
  – two response categories (dichotomous)
  – more than two response categories (polytomous)

Rasch Measurement Model
• The Rasch Model belongs to the item response theory (IRT) models
IRT Models
• model the probability of an individual's response to an item
• the probability of a correct/keyed response to an item is a mathematical function of the person parameter (ability) and the item parameter (difficulty)

Rasch Measurement Model
• Rasch gives the maximum likelihood estimate (MLE) of an event outcome
• Rasch reads the pattern of an event and is thus predictive in nature, an ability which resolves the problem of missing data

Rasch Measurement Model
• The relationship between the probability of success on an item and the latent trait is described by a function called the item characteristic curve (ICC), which takes an S-shape
• The ICC shows pictorially the fit of the data to the model
[Figure: item characteristic curve showing the relationship between the location on the latent trait and the probability of answering the item correctly]

Rasch Measurement Model
• The psychometric Rasch model conceptualizes the measurement scale like a ruler
• Items are located along the measurement scale according to their difficulty
[Figure: ability ruler running from less difficult items (−) to more difficult items (+)]

Rasch Measurement Model
• Persons can also be located on the same measurement scale
• They are located according to their ability
• The less difficult items can be successfully achieved by the more able subjects
[Figure: persons A, B and C located on the scale between less able and more able]

Rasch Measurement Model
• A person having a greater ability than another person should have the greater probability of solving any item of the type in question, and similarly, one item being more difficult than another means that for any person the probability of solving the second item is the greater one (Rasch, 1960, p. 117)

Rasch Measurement Model
• A turn of events is seen as a chance, a likelihood of happening, hence ratio data (Stevens, 1946)
[Figure: odds ruler marked 1:99, 10:90, 30:70, 50:50, 60:40, 99:1, with exponents 10^-2, 10^0, 10^2 and a logit scale from −2 to 2]

Rasch Measurement Model
• Interval scales have known and equal intervals between two graduations
  – numbers tell us how much more of the attribute of interest is present
  – scales are linear and quantitative
• SCALE with a unit termed 'logit'

Rasch Model Key Question
• The probability of endorsing any response category of an item depends solely on the person ability and the item difficulty
  – This requirement is called unidimensionality
• When a person with this ability (number of test items correct) encounters an item of this difficulty (number of persons who succeeded on the item), what is the likelihood that this person gets this item correct?

Rasch Model Key Answer

Rasch Measurement Model Theorem
• Theorem 1
  – The probability of success depends on the difference between the ability of the person and the difficulty of the item
  – Persons who are more able have a greater likelihood of correctly answering all the items (dichotomous response)
  – Persons who are more developed (higher agreeability level) have a greater likelihood of endorsing all the items (polytomous response)

Rasch Measurement Model Theorem
• Theorem 2
  – Easier items are more likely to be answered correctly by all persons (dichotomous response)
  – Easier tasks are more likely to be endorsed by all persons (polytomous response)

Rasch Measurement Model
• In estimating the probability of success, the prediction is expressed in terms of chances / odds / probabilities.
Matrix of Abilities vs Difficulties
Ability:     0/100  10/90  30/70  50/50  70/30  90/10  100/0
Difficulty:  0/100  10/90  30/70  50/50  70/30  90/10  100/0

Rasch Measurement Model
• The data matrix (e.g. the result of theory-driven qualitative observation) can be arranged so that the items are ordered from least to most difficult and the persons are ordered from least to most able.
• This organized data table is termed a Scalogram (Guttman, 1944).
• The higher up the table one goes, the more able the person
• The further right across the table one goes, the more difficult the item
• Calculate item difficulties and person abilities (n/N: item or person raw score divided by total possible score)
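
The arrangement of a data matrix into a scalogram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the response data and the names `responses` and `build_scalogram` are hypothetical, and ordering is done on raw scores (n/N), not on calibrated Rasch measures.

```python
# Hypothetical dichotomous responses: rows are persons, columns are items (1 = correct).
responses = {
    "P1": {"Q1": 1, "Q2": 0, "Q3": 1, "Q4": 0},
    "P2": {"Q1": 1, "Q2": 1, "Q3": 1, "Q4": 1},
    "P3": {"Q1": 0, "Q2": 0, "Q3": 1, "Q4": 0},
    "P4": {"Q1": 1, "Q2": 0, "Q3": 1, "Q4": 1},
}


def build_scalogram(data):
    """Order persons from most to least able and items from easiest to hardest."""
    items = list(next(iter(data.values())))
    person_totals = {p: sum(row.values()) for p, row in data.items()}
    item_totals = {q: sum(row[q] for row in data.values()) for q in items}
    person_order = sorted(data, key=lambda p: -person_totals[p])  # top row = most able
    item_order = sorted(items, key=lambda q: -item_totals[q])     # left column = easiest
    return person_order, item_order, person_totals, item_totals


person_order, item_order, person_totals, item_totals = build_scalogram(responses)

print("      " + " ".join(f"{q:>3}" for q in item_order) + "   n/N")
for p in person_order:
    cells = " ".join(f"{responses[p][q]:>3}" for q in item_order)
    print(f"{p:<6}{cells}   {person_totals[p] / len(item_order):.2f}")
print("n/N   " + " ".join(f"{item_totals[q] / len(responses):>3.1f}" for q in item_order))
```

Once the rows and columns are sorted this way, unexpected 0s in the upper left and unexpected 1s in the lower right stand out visually.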

Scalogram
• Patterns of success or failure can be seen in the data matrix
• A person who scores in the old-fashioned way (successful on easy questions and unsuccessful on difficult questions – response pattern 111111000000) is said to be too good to be true
[Figure: scalogram of 0/1 responses with raw scores, persons ordered from MORE ABLE to LESS ABLE and items from EASY ITEMS/TO ENDORSE to DIFFICULT ITEMS/TO ENDORSE]

Scalogram
• A person who scores well on difficult items despite a low overall score might have guessed or cheated on the item
  – Investigate the item further.
• A person who scores poorly on easy items despite a high overall score might indicate a lack of concentration or guessing on the item

Scalogram
• If the person's response pattern is unpredictable or erratic (110101100), it is difficult to interpret the success or failure (person ability).
• It is impossible to predict the likelihood of a student's performance (well / poorly) on the whole test just by looking at items that are so erratic.

Key Rasch Measurement Concepts
• Quantity
  – Estimates of item and person location
• Precision
  – Standard Error (SE) of Measurement
• Quality
  – Fit Statistics

The Basic Rasch Questions
• What is the distance between the locations?
  – Estimates of item and person locations
• How precise are those locations?
  – SE of measurement
• Are those locations all equally valuable?
  – Fit Statistics

Estimate: Person / Item Location
[Figure: vertical scale running from More Able / More Difficult at the top to Less Able / Less Difficult (Easy) at the bottom]
Location of a person
• Rasch analysis was performed by setting the mean of persons as the starting point (0 logits) for the calibration

Estimate: Person / Item Location
• In applying the Rasch model, item locations are often scaled first
• The location of an item on a scale corresponds with the person location at which there is a 0.5 probability of a correct response to the question

Estimate: Person / Item Location
• The probability of a person responding correctly to / endorsing a question with difficulty lower than that person's location is greater than 0.5
• The probability of responding correctly to / endorsing a question with difficulty greater than the person's location is less than 0.5

Separation Statistics
• Items are located by the number of persons getting a specific item correct / endorsing a specific item
• Persons are located by the number of items they are able to answer correctly / endorse
• It is necessary to locate persons and items along the variable line with sufficient precision to "see" between them

Separation Statistics
• The item and person separation statistics in Rasch measurement provide an analytical tool by which to evaluate the successful development of a variable and with which to monitor its continuing utility

Separation Statistics
• Person separation indicates how efficiently a set of items is able to separate those persons measured:
  – Separation that is too wide usually signifies gaps among person abilities
    • This leads to imprecise measurement
  – Separation that is too narrow signifies that there is not enough differentiation among person abilities to distinguish between them

Separation Statistics
• Item separation indicates how well a sample of people is able to separate those items used in the test:
  – Separation that is too wide usually signifies gaps among item difficulties
    • This leads to imprecise measurement
  – Separation that is too narrow signifies redundancy among test items

Separation Statistics
• Separation reliability statistics
  – range from 0.0 to 1.0
• The higher the value, the better the separation that exists and the more precise the measurement

SE of Measurement
• Standard Error
  – The standard error of a method of measurement or estimation is the standard deviation of the sampling distribution associated with the estimation method
  – The term may also refer to an estimate of that standard deviation, derived from the particular sample used to compute the estimate

SE of Measurement
• The sample mean is the usual estimator of a population mean
• However, different samples drawn from that same population would in general have different values of the sample mean

SE of Measurement
• The standard error of the mean
  – Is the standard deviation of the sample mean's estimate of a population mean
  – Is estimated by the sample estimate of the population standard deviation (sample standard deviation) divided by the square root of the sample size
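
The estimate of the standard error of the mean described above (sample standard deviation divided by the square root of the sample size) can be sketched with Python's standard library. The scores and the function name `standard_error_of_mean` are hypothetical, chosen only for illustration.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Sample standard deviation (n - 1 denominator) divided by sqrt(n)."""
    if len(sample) < 2:
        raise ValueError("at least two observations are needed")
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical raw scores from eight respondents.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(f"mean = {statistics.mean(scores):.2f}")
print(f"SE of the mean = {standard_error_of_mean(scores):.3f}")
```

A different sample of eight respondents would give a different mean; the standard error indicates how much such means are expected to vary.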

SE of Measurement
• In practical applications, the true value of the standard deviation (of the error) is usually unknown
• As a result, the term standard error is often used to refer to an estimate of this unknown quantity
  – the standard error is only an estimate
• The standard error of the mean can refer to an estimate of that standard deviation, computed from the sample of data being analyzed at the time

SE of Measurement
• In other cases, the standard error may usefully be used to provide an indication of the size of the uncertainty
• But using the standard error to provide confidence intervals or tests should be avoided unless the sample size is at least moderately large
  – Here "large enough" would depend on the particular quantities being analyzed

SE of Measurement
• The t-distribution is used to provide a confidence interval for an estimated mean or difference of means

Standard Deviation & Confidence Intervals
• For a value that is sampled with an unbiased normally distributed error, a known proportion of samples falls within 0, 1, 2 and 3 standard deviations above and below the actual value

SE of Measurement
• The standard error of a measure captures its precision in a particular context
• The accuracy of a measure is captured by fit statistics
• A measure may be accurate, but imprecise

SE of Measurement
• Raw scores are almost always reported without their standard errors
• The highest possible precision for any measure is that obtained when every other measure is known, and the data fit the Rasch model

SE of Measurement
• This standard error is called the "model" standard error and is reported by most production-oriented Rasch software
• For well-constructed tests with clean data (as confirmed by the fit statistics), the model standard error is usefully close to, but slightly smaller than, the actual standard error

SE of Measurement
• The stability of an item calibration is its modelled standard error
• A two-tailed 99% confidence interval is ±2.6 S.E. wide
• A two-tailed 95% confidence interval is ±1.96 S.E. wide
• A two-tailed 68% confidence interval is ±1.00 S.E. wide

SE of Measurement
• Suppose we want to be 99% confident that the "true" item difficulty is within 1 logit of its reported estimate
  – What sample size is needed to have 99% confidence that no item calibration is more than 1 logit away from its stable value?

SE of Measurement
• Then the estimate needs to have a standard error of 1.0 logits divided by 2.6 or less = 1/2.6 = 0.385 logits or less
• Stability to within ±0.3 logits is the best that can be expected for most variables
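
The arithmetic above can be sketched in Python. The first function simply divides the desired stability by the confidence multiplier. The second function rests on an assumption that is not stated in these slides: a rule of thumb that the modelled standard error of an item calibration lies between 2/√N (best targeting) and 3/√N (poor targeting) for a sample of N persons. Function names are illustrative.

```python
def required_se(stability_logits: float, z: float) -> float:
    """Largest standard error that keeps the estimate within +/- stability_logits."""
    return stability_logits / z


def sample_size_range(stability_logits: float, z: float) -> tuple:
    """Assumed rule of thumb: 2/sqrt(N) <= SE <= 3/sqrt(N), so 4/SE^2 <= N <= 9/SE^2."""
    se = required_se(stability_logits, z)
    return round(4 / se ** 2), round(9 / se ** 2)


print(f"99%, +/-1 logit: SE <= {required_se(1.0, 2.6):.3f} logits")

# 95% uses a multiplier rounded to 2.0 here; 99% uses 2.6 as in the text.
for label, stability, z in [("95%, +/-1 logit", 1.0, 2.0),
                            ("99%, +/-1 logit", 1.0, 2.6),
                            ("95%, +/-1/2 logit", 0.5, 2.0),
                            ("99%, +/-1/2 logit", 0.5, 2.6)]:
    low, high = sample_size_range(stability, z)
    print(f"{label}: {low} -- {high} persons")
```

Under these assumptions the computed ranges agree with the minimum sample size ranges quoted in the Sample Size table of this course.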

Sample Size
Item calibrations stable within | Confidence    | Minimum sample size range (best to poor targeting) | Size for most purposes
± 1 logit                       | 95%           | 16 -- 36                                            | 30
± 1 logit                       | 99%           | 27 -- 61                                            | 50
± ½ logit                       | 95%           | 64 -- 144                                           | 100
± ½ logit                       | 99%           | 108 -- 243                                          | 150
Definitive or High Stakes       | 99%+ (Items)  | 250 -- 20*test length                               | 250

Sample Size
• A sample of 50 well-targeted respondents is conservative for obtaining useful, stable estimates
• 30 respondents are enough for well-designed pilot studies
John Michael Linacre

Fit Statistics
• To aid in measurement quality control
• To identify those parts of the data which meet Rasch model specifications and those parts which don't
[Figure: an item that fits the model and an item that does not fit the model]

Fit Statistics
• Parts that don't are not automatically rejected, but are examined to identify in what way, and why, they fall short, and whether, on balance, they contribute to or corrupt measurement
• Then the decision is made to accept, reject or modify the data

Fit Statistics
• Modification includes simple actions such as correcting obvious data entry errors and respondent mistakes, and more sophisticated actions such as collapsing rating scale categories.

Developing Instruments
• Steps
• Hands-on

Developing Instruments
• To find out about the characteristics (latent traits) of people
• Latent traits: a characteristic or attribute of a person that can be inferred from the observation of the person's behaviours
• Measurement here is to measure the latent traits of people

Developing a Questionnaire
1. Identify research that studies the same construct
2. Define the construct
3. Define the target population
4. Review related measures
5. Develop a draft
6. Evaluate the draft
7. Revise the test
8. Collect data on reliability and validity
(Gall, Gall & Borg, 2003)

DIMENSION = CONSTRUCT = CHAPTER
ITEM = VARIABLES = SUBCHAPTER

Example 1
• RESEARCH: Student's Satisfaction in Distance Learning Course
• What needs to be measured?
  – Student Satisfaction
• What are the constructs and their variables?

Student's Satisfaction in DL Course
• Construct 1: Instructor Performance
  – Variables: availability, knowledge of subject, present information clearly
• Construct 2: Student-Instructor Interaction
  – Variables: encourage students to be actively involved, encourage questions, respects students, fair treatment, provide feedback on work, provide progress periodically
• Construct 3: Course Evaluation
  – Variables: course material relevant, assignment relevant, workload appropriate to the credit hours

Example 2
• RESEARCH: Student's learning for the subject of Introduction to Interactive Multimedia
• What needs to be measured?
  – Level of Students' Learning Ability
• What are the constructs and their variables?

Students Learning
• Construct: Learning Outcome
  – Variables: Knowledge, Understanding, Application, Analysis, Synthesis, Evaluation

Dicho vs Poly
Rating Scale Model
• Dichotomous
• Polytomous

Dichotomous Data
• Rasch Principles:
  – Person Ability (N correct)
  – Item Facility (difficulty – N correct)
  – Proportion: n correct/N possible
  – Odds of success/failure
• Natural logarithm of odds – logits (log odds units)

Dichotomous Data
Data Matrix Showing the Odds
Person    Q11  Q1   Q8   Q2   Q7   Q10  Q3   Q5   Q9   Q4   Q6   Ability  n/1-n
R10        1    1    0    1    1    1    1    1    0    1    1      9     82/18
R03        1    1    1    1    1    1    0    0    1    1    0      8     73/27
R05        1    0    1    1    1    1    0    1    1    0    0      7     64/36
R12        1    0    1    1    1    1    0    1    1    0    0      7     64/36
R09        1    1    1    1    1    0    1    0    0    0    0      6     55/45
R06        1    1    1    1    1    0    1    0    0    0    0      6     55/45
R11        1    1    1    0    0    1    0    1    0    0    0      5     45/55
R01        1    1    1    1    1    0    0    0    0    0    0      5     45/55
R07        1    1    1    0    0    1    0    1    0    0    0      5     45/55
R04        1    1    1    0    0    0    1    0    0    0    0      4     36/64
R02        1    1    0    0    0    0    1    0    0    0    0      3     27/73
R08        0    1    1    0    0    0    0    0    0    0    0      2     18/82
Facility  11   10   10    7    7    6    5    5    3    2    1
n/1-n     92/8 83/17 83/17 58/42 58/42 50/50 42/58 42/58 25/75 17/83 8/92

Procedure 1: Estimate Location
• Iterative process between item and person values:
  1. How difficult are these items?
  2. How able are these persons?
• Item difficulties and person abilities are entered into a matrix
• The Rasch modeled table of expected probabilities for each cell is calculated
• Iterated until acceptable variation in location is reached
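
The steps from raw counts to proportions, odds and logits can be sketched in Python using the data matrix above. This covers only the first, raw-score step: the values it prints are starting points for the iterative estimation, not final Rasch measures. The names `RESPONSES` and `log_odds` are illustrative, and taking item difficulty as the log odds of failure is a convention assumed here.

```python
import math

ITEMS = ["Q11", "Q1", "Q8", "Q2", "Q7", "Q10", "Q3", "Q5", "Q9", "Q4", "Q6"]

# One string of 0/1 responses per person, in the item order given by ITEMS.
RESPONSES = {
    "R10": "11011111011", "R03": "11111100110", "R05": "10111101100",
    "R12": "10111101100", "R09": "11111010000", "R06": "11111010000",
    "R11": "11100101000", "R01": "11111000000", "R07": "11100101000",
    "R04": "11100010000", "R02": "11000010000", "R08": "01100000000",
}


def log_odds(successes: int, total: int) -> float:
    """ln(successes / failures); undefined for zero or perfect scores."""
    return math.log(successes / (total - successes))


n_items, n_persons = len(ITEMS), len(RESPONSES)
person_scores = {p: sum(int(c) for c in row) for p, row in RESPONSES.items()}
item_scores = {q: sum(int(row[j]) for row in RESPONSES.values())
               for j, q in enumerate(ITEMS)}

# Person ability: log odds of success. Item difficulty: log odds of failure.
person_logits = {p: log_odds(r, n_items) for p, r in person_scores.items()}
item_logits = {q: -log_odds(s, n_persons) for q, s in item_scores.items()}

for p, r in person_scores.items():
    print(f"{p}: {r}/{n_items} = {r / n_items:.0%}  logit = {person_logits[p]:+.2f}")
for q, s in item_scores.items():
    print(f"{q}: {s}/{n_persons} = {s / n_persons:.0%}  difficulty = {item_logits[q]:+.2f}")
```

A person with a zero or perfect raw score has no finite logit, which is one reason Rasch software handles such extreme scores separately.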

Procedure II: Fit Statistics
• Measurement is possible with only one variable at a time
• Construct validity is the key concept
  – The theoretical argument that the items in an instrument measure what it claims to measure

Procedure II: Fit Statistics
• Fit Statistics (misfit statistics) help in control of the measurement construct
• Fit Statistics are based on residuals
  – The difference between actual and expected outcome
• Fit Statistics are used to control the quality of measures

Procedure II: Fit Statistics
• The aim is to detect the differences between:
  – EXPECTED: The strict measurement requirements of the Rasch Model (Theory)
  – ACTUAL: The data collected when the real items are used with real people (Practice)

Procedure II: Fit Statistics
1. Collect Observed Scores (Xni)
  – Must always be whole numbers (0, 1)
2. Calculate the Rasch Expected Response Probabilities (Eni) Based on Item and Person Estimates (e.g. 0.80)
  – E = the expected response probability when a person with ability n responds to an item with difficulty i

Procedure II: Fit Statistics
3. Calculate Response Residual Yni = Xni − Eni
  – Y = the response residual that remains in the cell for person n x item i when the expected response probability Eni is subtracted from the actual response Xni

Procedure II: Fit Statistics
• Residuals are squared (to remove negative values) and summed to yield:
  – Mean Squares of residuals for every item and person
• Low mean squares are too predictable to believe
• High mean squares are too unpredictable to yield measures
• Residuals are often standardized (t or z statistic)
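
The three steps above (observed score, expected probability, residual) and the resulting mean squares can be sketched as follows. The person abilities, item difficulties and responses are hypothetical. The two summaries computed are the conventional unweighted (outfit) and information-weighted (infit) mean squares for items; the exact formulas used by a given software package should be checked in its documentation.

```python
import math


def expected(theta: float, delta: float) -> float:
    """Rasch expected probability E for ability theta and difficulty delta (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def item_fit(observed, abilities, difficulties):
    """Return one (outfit, infit) mean-square pair per item."""
    results = []
    for i, delta in enumerate(difficulties):
        z_sq_sum = resid_sq_sum = var_sum = 0.0
        for n, theta in enumerate(abilities):
            e = expected(theta, delta)
            var = e * (1.0 - e)          # model variance of a 0/1 response
            y = observed[n][i] - e       # residual Y = X - E
            z_sq_sum += y * y / var      # squared standardized residual
            resid_sq_sum += y * y
            var_sum += var
        results.append((z_sq_sum / len(abilities), resid_sq_sum / var_sum))
    return results


# Hypothetical estimates (logits) and responses: 4 persons x 3 items.
abilities = [1.5, 0.5, -0.5, -1.5]
difficulties = [-1.0, 0.0, 1.0]
observed = [[1, 1, 1],
            [1, 1, 0],
            [1, 0, 0],
            [0, 0, 1]]   # the last person succeeds on the hardest item only

for i, (outfit, infit) in enumerate(item_fit(observed, abilities, difficulties)):
    print(f"item {i + 1}: outfit MNSQ = {outfit:.2f}, infit MNSQ = {infit:.2f}")
```

In this made-up data the unexpected success of the least able person on the hardest item inflates that item's outfit mean square more than its infit, because outfit is more sensitive to responses far from a person's level.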

Procedure II: Fit Statistics
• In order to verify fit and misfit of items or persons, the following criteria must be satisfied:
  – Point Measure Correlation: 0.32 < x < 0.8
  – Infit / Outfit Mean Square: 0.5 < y < 1.5 OR 0.6 < y < 1.4 (for survey)
  – Infit / Outfit Z standard: −2.0 < Z < +2.0

Dichotomous Data
• Example of the cognitive developmental test (BLOT) used to outline Rasch measurement: Tutorial 4
  – Analysis of BLOT focuses on the performance of the items rather than the persons

Dichotomous Data
• The basic Rasch questions:
  1. Quantity?
  2. Precision?
  3. Fit?

Polytomous Data
• Nature of Likert scales:
  – Ordered categories – lowest to highest
  – Response opportunities – good – neutral – bad
  – Odd / even number of categories
  – With / without mid-point category (neutral)
• Our focus is on:
  – Construct validity with a clear direction
  – Estimate of ability/development
  – Precision of the estimate
  – Confidence of estimate
  – Probability of success on similar item

Polytomous Data
• For rating-scale data:
  – Each item has a difficulty estimate
  – The scale has a series of thresholds
• Item thresholds estimate the location where a person with the estimated ability has a 50% probability of success/ failure on an item at the same location
• 5 categories have 4 thresholds
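
How thresholds turn one item difficulty into probabilities for each rating category can be sketched as below. The sketch assumes the Andrich rating scale formulation, in which the probability of category x is proportional to the exponential of the summed terms (ability − difficulty − threshold) up to x. The threshold values and the function name are hypothetical.

```python
import math


def rating_scale_probabilities(theta, delta, thresholds):
    """Category probabilities for one item under an assumed rating scale model.

    theta: person ability, delta: item difficulty, thresholds: one value per
    step between adjacent categories (5 categories -> 4 thresholds), in logits.
    """
    cumulative = 0.0
    exponents = [0.0]                      # lowest category has an empty sum
    for tau in thresholds:
        cumulative += theta - delta - tau
        exponents.append(cumulative)
    largest = max(exponents)               # subtract the maximum for numerical stability
    weights = [math.exp(v - largest) for v in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical thresholds for a 5-category Likert item with difficulty 0 logits.
thresholds = [-1.5, -0.5, 0.5, 1.5]

for theta in (-2.0, -0.5, 0.0, 2.0):
    probs = rating_scale_probabilities(theta, 0.0, thresholds)
    print(f"theta = {theta:+.1f}: " + "  ".join(f"{p:.2f}" for p in probs))
```

When a person's ability equals the item difficulty plus a threshold (theta = −0.5 here for the second threshold), the two categories on either side of that threshold are equally likely, which is the 50:50 point the thresholds describe.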

Polytomous Data
• Run Tutorial 6 and interpret output
  – Quantity?
  – Precision?
  – Fit?
Maps / Tables
  – Variable Map
  – Fit Map
  – Item Characteristic Curve (ICC)
  – Person / Item Tables

For More, Please Visit: http://www.rasch.org

THANK YOU (TERIMA KASIH)
Reach Us:
• Mohd Nor @ 019 281 9003
• Prasanna @
• Nurul Hidayah @

Educational Use
• To accurately measure student's ability (knowledge, skill and attitude)
• To verify the reliability of the test question set
• To verify the reliability of students' answers
• To confirm the cognitive levels of questions according to educational taxonomy
• To quantify student's ability in percentile

SHARING TIME…
How to Apply Simple Rasch in Educational Assessment

DISTRIBUTE QUESTION, OFFLINE OR ONLINE
:: Students answer via question form, offline or online ::
(responses of at least 15 students are recorded for a confidence level of at least 95%)

TEMPLATE 2
COPY PERSON AND ITEM MEASURE INTO TEMPLATE 2 (EA Score)
[Table: item measures in increasing order, person measures in decreasing order]

Formula: exp(*) / (1 + exp(*))
1. (*) = Logit score − Item difficulty score, i.e. EA measure score is 1.3 and item difficulty score is 0.3, so (*) = 1.3 − 0.3 = 1.0
2. Using the above formula, the answer will be 0.7311
3. Therefore, the final mark (%) for the student's EA is 73% (Probability Score %); 73% is equivalent to B+ or 3.0 GPA
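
The worked example can be checked with a few lines of Python. This is a minimal sketch of the formula exp(*) / (1 + exp(*)) as given above; the function name `success_probability` is illustrative, and the mapping from a percentage to a grade or GPA is not computed here.

```python
import math


def success_probability(person_measure: float, item_measure: float) -> float:
    """exp(*) / (1 + exp(*)), where (*) = person measure - item measure in logits."""
    difference = person_measure - item_measure
    return math.exp(difference) / (1.0 + math.exp(difference))


# Values from the worked example: EA measure 1.3 logits, item difficulty 0.3 logits.
p = success_probability(1.3, 0.3)
print(f"probability = {p:.4f}, mark = {p:.0%}")
```

When the person measure equals the item measure the function returns 0.5, and each additional logit of difference in the person's favour raises the probability along the S-shaped curve.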