RASCH MODEL
C. GARCIA, J. VENTANAS, T. ANTEQUERA, J. RUIZ, R. CAVA
and P. ALVAREZ
ABSTRACT
INTRODUCTION
Latent trait models, in test theory, focus on the interaction between a person and
an item rather than on test scores. The mathematical formula models the response
to an item. The most representative model in Item Response Theory is the Rasch
model, an instrument for measuring latent traits (Andrich 1988). The method
has been used in other fields, such as the study of the monofloral honeys of
Extremadura (Lozano 1993) and the sensorial characteristics of wine (Alvarez 1992),
and it has been compared with Principal Component Analysis (Horimoto et al. 1995).
METHOD
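[Figure: a) the quality line, running from low quality to high quality, with the
item locations $\delta_1$, $\delta_2$, $\delta_3$, $\delta_4$ marked along it;
b) two hams placed on the same line: a ham $\beta_n$ located below all the items
would not be expected to account for any quality criterion (item), and a ham
located above all the items would be expected to account for all quality criteria.]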
Let $X_{ni}$ be the dichotomous quality variable that records whether ham "n"
scores on item "i". If the score is 1, that is $X_{ni} = 1$, then ham "n" is said
to have some quality; otherwise $X_{ni} = 0$.
Therefore: if $\beta_n - \delta_i > 0$ then $P\{X_{ni} = 1\} > 0.5$
if $\beta_n - \delta_i < 0$ then $P\{X_{ni} = 1\} < 0.5$
if $\beta_n - \delta_i = 0$ then $P\{X_{ni} = 1\} = 0.5$
This analysis allows us to relate the probability of having quality to the difference
($\beta_n - \delta_i$). This difference can range from $-\infty$ to $+\infty$, whereas
a probability ranges from 0 to 1, that is
$$0 \le P\{X_{ni} = 1\} \le 1, \qquad -\infty < \beta_n - \delta_i < +\infty$$
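With a further adjustment the difference can be mapped onto the range of a
probability: $\exp(\beta_n - \delta_i)$ ranges from 0 to $+\infty$, and dividing it by
$1 + \exp(\beta_n - \delta_i)$ gives an expression whose limits are 0 and 1:
$$P\{X_{ni} = 1\} = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)}$$
This is the formula that Georg Rasch (Rasch 1980) chose in his development of
latent trait theory.
The probability when $X_{ni} = 0$ is
$$P\{X_{ni} = 0\} = 1 - P\{X_{ni} = 1\} = \frac{1}{1 + \exp(\beta_n - \delta_i)}$$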
It is clear from the formula that it is unimportant that the value for $\beta_n$ was 6
and for $\delta_i$ was 4. The important thing is that they were two units apart. Any pair
of values differing by 2 will produce the same probability. Our scale is interval, not
ratio: the numbers chosen on the scale are arbitrary so long as the difference is 2.
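The dependence on the difference alone can be checked numerically. The following is a minimal sketch, assuming the usual logistic form of the Rasch model; the function name `rasch_probability` is illustrative, and the second pair of locations (2 and 0) is an assumption chosen only because it is also two units apart.

```python
import math

def rasch_probability(beta: float, delta: float) -> float:
    """P(X = 1) for a ham located at beta on an item located at delta (logits)."""
    return math.exp(beta - delta) / (1.0 + math.exp(beta - delta))

# Values quoted in the text: ham at 6, item at 4 (two units apart).
p_a = rasch_probability(6.0, 4.0)
# A different, assumed pair that is also two units apart.
p_b = rasch_probability(2.0, 0.0)

print(round(p_a, 3), round(p_b, 3))  # both print 0.881
```

Shifting both locations by the same amount leaves the probability unchanged, which is what an interval scale with an arbitrary origin implies.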
Suppose ham "n" is scored on a set of L items, $X_{n1}, X_{n2}, X_{n3}, \ldots, X_{nL}$,
where each of the elements will be 1 or 0. One way of accounting for "quality" is the
number of items on which ham "n" scores, ignoring the score pattern. We are aware
that not all items are scored in the same way for each ham; then the pattern of the
scores is important. The total score which a ham obtains can be represented by
$r_n = X_{n1} + X_{n2} + \ldots + X_{nL}$. The conditional probability that, if a certain
total score is obtained, it is obtained with a particular pattern of scores is
MEASURING QUALITY OF HAM BY RASCH MODEL 40 1
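$$P\{(\tilde{X}_{ni}) \mid r_n\} = \frac{P\{(\tilde{X}_{ni}),\, r_n \mid \beta_n, (\tilde{\delta}_i)\}}{P\{r_n \mid \beta_n, (\tilde{\delta}_i)\}}$$
where the tilde (~) indicates that all values are referred to.
Thus $(\tilde{X}_{ni})$ denotes the scores of ham "n" on all items "i", with i going from
1 to L; that is, it refers to the pattern of scores of ham "n".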
The numerator is the probability of scoring with a particular pattern and
obtaining the total score which that pattern yields. The denominator is the
probability of obtaining that total score by any pattern. It can be shown that
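$$P\{(\tilde{X}_{ni}) \mid r_n\} = \frac{\exp\left(-\sum_{i=1}^{L} x_{ni}\,\delta_i\right)}{\sum_{(x_{ni}) \mid r_n} \exp\left(-\sum_{i=1}^{L} x_{ni}\,\delta_i\right)}$$
The sum in the denominator is taken over all patterns of scores on the items, given
that the total score for the pattern of scores involved is $r_n$. That restriction on
the summation is represented by $(x_{ni}) \mid r_n$. The ham parameter $\beta_n$ has
cancelled out of the expression.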
Then the probability of obtaining a particular total score by one pattern rather than
another depends only on the parameters of the items; the pattern of scores provides
no further information about the hams. That information is provided by the total
score. The Rasch model is the only latent trait model which justifies the use of the
total score.
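That the pattern carries no information about the ham can be illustrated by brute force. The sketch below assumes the logistic Rasch form and enumerates every 0/1 pattern with a given total; the function names, the item locations (1 and 4) and the three ham locations are illustrative assumptions, not values estimated from the ham data.

```python
import math
from itertools import product

def pattern_probability(pattern, beta, deltas):
    """Probability of one 0/1 score pattern under the Rasch model."""
    prob = 1.0
    for x, delta in zip(pattern, deltas):
        p = math.exp(beta - delta) / (1.0 + math.exp(beta - delta))
        prob *= p if x == 1 else 1.0 - p
    return prob

def conditional_pattern_probability(pattern, beta, deltas):
    """P(pattern | total score), enumerating all patterns with that total."""
    total = sum(pattern)
    same_total = [p for p in product((0, 1), repeat=len(deltas)) if sum(p) == total]
    denominator = sum(pattern_probability(p, beta, deltas) for p in same_total)
    return pattern_probability(pattern, beta, deltas) / denominator

deltas = [1.0, 4.0]  # assumed item locations
results = [conditional_pattern_probability((1, 0), beta, deltas)
           for beta in (-2.0, 0.0, 3.0)]  # three very different hams
print([round(r, 4) for r in results])  # the same value for every ham
```

The conditional probability is identical for all three ham locations, so only the item parameters matter once the total score is fixed.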
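Suppose that $\delta_1 = 1$ and $\delta_2 = 4$; then the probability of ham "n" scoring
on item 1 and not on item 2, given that it scores on exactly one of the two items, is
$$\frac{\exp(-\delta_1)}{\exp(-\delta_1) + \exp(-\delta_2)} = \frac{\exp(-1)}{\exp(-1) + \exp(-4)} \approx 0.95$$
In a practical situation we do not know the locations of the item parameters, but
we can observe the pattern of the scores of the hams on the items.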
For example, suppose we have 100 hams assessed on two items in a dichotomous way.
If 18 hams score on both items and 9 hams score on neither, then a total of 27 hams
provide no information; they tell us nothing about the items. If, of the remaining 73
hams which score on only one of the two items, we found that 69 (95%) scored on one
item (item i = 1) and 4 (5%) scored on the other (item i = 2), as Eq. 6 and 7 show, we
would scale the items 3 logits apart. We could say that $\delta_1$ was 1 and $\delta_2$
was 4, or $\delta_1 = 0$ and $\delta_2 = 3$, or $\delta_1 = -1.5$ and $\delta_2 = 1.5$.
Our scale is interval, not ratio, so where the numbers are placed on the scale is
arbitrary so long as the difference is 3.
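The arithmetic behind the 3-logit spacing can be sketched as follows, using the counts quoted above (69 and 4). The function name `logit_gap` is illustrative; the log of the ratio of the two counts is used here as the natural estimate of the distance between the items.

```python
import math

def logit_gap(n_first_only: int, n_second_only: int) -> float:
    """Distance in logits between two items, estimated from the hams
    that score on exactly one of them."""
    return math.log(n_first_only / n_second_only)

gap = logit_gap(69, 4)  # counts quoted in the text
print(round(gap, 2))    # 2.85, which the text rounds to 3 logits

# Any pair of locations 3 logits apart implies the same expected split.
shares = []
for delta_1, delta_2 in [(1.0, 4.0), (0.0, 3.0), (-1.5, 1.5)]:
    share = math.exp(-delta_1) / (math.exp(-delta_1) + math.exp(-delta_2))
    shares.append(round(share, 4))
print(shares)  # [0.9526, 0.9526, 0.9526]
```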
We used $r_n$ as the total score (the number of items scored by ham "n"),
defined by
$$r_n = \sum_{i=1}^{L} x_{ni}$$
If we consider the scores on an item, rather than the scores of a ham, we can count the
number of hams scoring on that item over all N hams. Then
$$S_i = \sum_{n=1}^{N} x_{ni} \qquad (9)$$
We could show that, if a certain number of hams score on an item, which
ones they are will not depend on the item. So the Rasch model justifies the use of
a total score not only for hams but also for items. This does not mean that
the numbers $r_n$ and $S_i$ should be used as measures, but it does mean that they
contain all the information which is needed to estimate the parameters $\beta_n$ and
$\delta_i$. They are sufficient statistics for estimation.
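As a small illustration of the two kinds of totals, the sketch below computes $r_n$ (row sums) and $S_i$ (column sums) for a made-up 0/1 score matrix of five hams on four items. The matrix is hypothetical and is not taken from the ham data.

```python
# Hypothetical score matrix: one row per ham, one column per item.
scores = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]

ham_totals = [sum(row) for row in scores]          # r_n, one per ham
item_totals = [sum(col) for col in zip(*scores)]   # S_i, one per item

print(ham_totals)   # [2, 2, 3, 1, 3]
print(item_totals)  # [4, 4, 2, 1]
```

Both sets of totals add up to the same grand total, since each counts every scored cell of the matrix once.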
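$$\log \Lambda = \sum_{n}^{N} r_n \beta_n - \sum_{i}^{L} S_i \delta_i - \sum_{n}^{N} \sum_{i}^{L} \log\left[1 + \exp(\beta_n - \delta_i)\right] \qquad (11)$$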
This equation expresses the log likelihood of the observed pattern of scores in terms of
the parameters $\beta_n$ and $\delta_i$ and the totals $S_i$ and $r_n$. The patterns of
the individual scores of hams on items do not appear, only the totals. This, together
with the separation of $r_n \beta_n$ and $S_i \delta_i$, establishes the sufficiency of
$r_n$ for estimating $\beta_n$ and of $S_i$ for estimating $\delta_i$, and provides the
means of obtaining estimates of the $\beta_n$ which are independent of the $\delta_i$
and vice versa. This equation would allow us to calculate the probability of
occurrence of a complete matrix of scores if we knew the parameters $\beta_n$ and $\delta_i$.
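The sufficiency argument can be checked numerically: the log likelihood computed cell by cell from the full score matrix equals the one computed from the totals alone, i.e. the sum of $r_n \beta_n$, minus the sum of $S_i \delta_i$, minus the sum over all cells of $\log[1 + \exp(\beta_n - \delta_i)]$. This is a sketch under the dichotomous Rasch model; the score matrix and parameter values are hypothetical.

```python
import math

def log_likelihood_from_patterns(scores, betas, deltas):
    """Sum of log P(x_ni) over every cell of the score matrix."""
    total = 0.0
    for row, beta in zip(scores, betas):
        for x, delta in zip(row, deltas):
            p = math.exp(beta - delta) / (1.0 + math.exp(beta - delta))
            total += math.log(p if x == 1 else 1.0 - p)
    return total

def log_likelihood_from_totals(ham_totals, item_totals, betas, deltas):
    """The same quantity, written only in terms of r_n and S_i."""
    total = sum(r * b for r, b in zip(ham_totals, betas))
    total -= sum(s * d for s, d in zip(item_totals, deltas))
    total -= sum(math.log(1.0 + math.exp(b - d)) for b in betas for d in deltas)
    return total

scores = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]   # hypothetical data
betas = [0.4, -0.7, 0.9]                      # assumed ham parameters
deltas = [-0.5, 0.0, 0.5]                     # assumed item parameters
ham_totals = [sum(row) for row in scores]
item_totals = [sum(col) for col in zip(*scores)]

a = log_likelihood_from_patterns(scores, betas, deltas)
b = log_likelihood_from_totals(ham_totals, item_totals, betas, deltas)
print(abs(a - b) < 1e-12)  # True
```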
The best estimates of the parameters $\beta_n$ and $\delta_i$ are found by maximizing the
likelihood function given by Eq. 11. An initial set of estimates is taken and the
log likelihood of occurrence is calculated using Eq. 11. The estimates are then
altered in a direction which will increase the likelihood of occurrence of the observed
data. This process is continued until we reach the estimates of the parameters
($\beta_n$ and $\delta_i$) which best account for the actual score pattern we have obtained.
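Using calculus it can be shown that the likelihood function is maximized when,
for each ham "n",
$$r_n = \sum_{i=1}^{L} P\{X_{ni} = 1\} \qquad (12)$$
and, for each item "i",
$$S_i = \sum_{n=1}^{N} P\{X_{ni} = 1\} \qquad (13)$$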
Equation 12 actually represents N equations, since there is one for each ham.
Equation 13 actually represents L equations, since there is one for each of the L
items. There are N hams involved, but they cannot all obtain different total scores
unless there are more items than hams. If any ham does not score on any item, and
thus has $r_n = 0$, its $\beta_n$ cannot be estimated, since it could be anywhere along
the quality line below the items. Similarly, the hams which score on all items
($r_n = L$) cannot be estimated. Furthermore, all hams obtaining the same score in the
range r = 1 to (L-1) will be estimated to have the same parameter $\beta_r$. We do not
need to write $\beta_n$, since $\beta_r$ applies to all hams with the score r. Instead of
one equation for every ham, only one equation for each of the (L-1) acceptable scores
on L items is needed, that is, for scores from 1 to (L-1). Eq. 12 and 13 then become
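$$r = \sum_{i=1}^{L} P\{X_{ri} = 1\} \qquad (14)$$
$$S_i = \sum_{r=1}^{L-1} n_r\, P\{X_{ri} = 1\} \qquad (15)$$
where $n_r$ is the number of hams with total score r.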
In the computer programs which derive estimates for the parameters, starting values
for the (L-1) estimates associated with each acceptable score are taken as
$$\beta_r = \log\left(\frac{r}{L - r}\right)$$
and the starting values for the L item estimates are taken as
$$\delta_i = \log\left(\frac{N - S_i}{S_i}\right) - \frac{1}{L} \sum_{j=1}^{L} \log\left(\frac{N - S_j}{S_j}\right)$$
The term subtracted serves only to fix the mean of the starting values for $\delta_i$ at
zero. The scale is only interval, so the origin is arbitrary. Fixing the mean at zero
simply fixes the scale on which the relative positions of both the items and the hams
are located. Successively better estimates for the $\beta_r$ and $\delta_i$ are obtained
until successive estimates of the right hand sides of Eq. 14 and Eq. 15 move closer to
the observed totals by less than a very small amount. These are then the best estimates
of the $\beta_r$ and $\delta_i$, in the sense that with no other values would the actual
scores obtained be more likely to have occurred. The procedure described is a method for
unconditional estimation (Wright and Masters 1982).
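The iteration described above can be sketched for the dichotomous case as follows. This is a minimal illustration, not the PROX or UCON programs themselves: it assumes one Newton-Raphson step per parameter per cycle, assumes starting values of $\log(r/(L-r))$ for the score groups and mean-centred $\log((N-S_i)/S_i)$ for the items, omits any bias correction, and assumes that no item has a zero or perfect total once hams with extreme scores are set aside. The function names and the data are hypothetical.

```python
import math

def rasch_p(beta, delta):
    """P(X = 1) for a location beta on an item located at delta."""
    return 1.0 / (1.0 + math.exp(delta - beta))

def estimate_unconditional(scores, max_iter=1000, tol=1e-8):
    """Return ({score r: beta_r}, [delta_i]) for a 0/1 score matrix."""
    n_items = len(scores[0])
    # Hams with a zero or a perfect score cannot be located and are set aside.
    kept = [row for row in scores if 0 < sum(row) < n_items]
    n_hams = len(kept)
    item_totals = [sum(col) for col in zip(*kept)]   # S_i
    counts = {}                                      # n_r
    for row in kept:
        counts[sum(row)] = counts.get(sum(row), 0) + 1

    # Assumed starting values, with the item values centred on zero.
    betas = {r: math.log(r / (n_items - r)) for r in counts}
    raw = [math.log((n_hams - s) / s) for s in item_totals]
    deltas = [d - sum(raw) / n_items for d in raw]

    for _ in range(max_iter):
        # One Newton step per item towards S_i = sum_r n_r P(X_ri = 1).
        stepped = []
        for s, delta in zip(item_totals, deltas):
            probs = [(n, rasch_p(betas[r], delta)) for r, n in counts.items()]
            expected = sum(n * p for n, p in probs)
            information = sum(n * p * (1.0 - p) for n, p in probs)
            stepped.append(delta - (s - expected) / information)
        centre = sum(stepped) / n_items              # keep mean(delta) at zero
        stepped = [d - centre for d in stepped]
        change = max(abs(a - b) for a, b in zip(stepped, deltas))
        deltas = stepped
        # One Newton step per score group towards r = sum_i P(X_ri = 1).
        for r in betas:
            probs = [rasch_p(betas[r], d) for d in deltas]
            step = (r - sum(probs)) / sum(p * (1.0 - p) for p in probs)
            betas[r] += step
            change = max(change, abs(step))
        if change < tol:
            break
    return betas, deltas

scores = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
betas, deltas = estimate_unconditional(scores)
print({r: round(b, 2) for r, b in sorted(betas.items())})
print([round(d, 2) for d in deltas])
```

Items with lower totals receive higher locations, score groups with higher totals receive higher locations, and the item locations average zero by construction.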
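The standard errors of the estimates of the item parameters are given by
$$SE(\delta_i) = \frac{1}{\sqrt{\sum_{r=1}^{L-1} n_r\, P_{ri}\,(1 - P_{ri})}}$$
The standard errors of the estimates of the ham parameters are given by
$$SE(\beta_r) = \frac{1}{\sqrt{\sum_{i=1}^{L} P_{ri}\,(1 - P_{ri})}}$$
where $P_{ri} = P\{X_{ri} = 1\}$.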
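A numerical illustration of these standard errors, assuming the usual information-based form for the dichotomous Rasch model (one over the square root of the summed $P(1-P)$ terms). The function names and the parameter values are hypothetical; the number of items (26) matches the study, but the item locations used here do not.

```python
import math

def rasch_p(beta, delta):
    """P(X = 1) for a location beta on an item located at delta."""
    return 1.0 / (1.0 + math.exp(delta - beta))

def ham_standard_error(beta, deltas):
    """SE of a ham location: information summed over the items."""
    information = sum(rasch_p(beta, d) * (1.0 - rasch_p(beta, d)) for d in deltas)
    return 1.0 / math.sqrt(information)

def item_standard_error(delta, betas_by_score, counts_by_score):
    """SE of an item location: information summed over hams, grouped by score."""
    information = sum(
        counts_by_score[r] * rasch_p(b, delta) * (1.0 - rasch_p(b, delta))
        for r, b in betas_by_score.items()
    )
    return 1.0 / math.sqrt(information)

# Assumed case: 26 items all located at 0 and a ham located at 0.
se_ham = ham_standard_error(0.0, [0.0] * 26)
# Assumed case: 100 hams in one score group located at 0, item located at 0.
se_item = item_standard_error(0.0, {13: 0.0}, {13: 100})
print(round(se_ham, 3), round(se_item, 3))  # 0.392 0.2
```

More items sharpen the ham estimates and more hams sharpen the item estimates, because each adds information to the corresponding sum.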
These 26 items correspond to the $\delta_i$ (i = 1, 2, 3, ..., 26) parameters for quality.
The 156 assessments from 8 hams (5 from acorn-fed pigs and 3 from pigs given different
kinds of feed) tasted by 15 ham judges correspond to the $\beta_n$ (n = 1, 2, 3, ..., 156)
parameters. The score of a ham on an item is a computation of the level of quality
criterion "i" in ham assessment "n".
The amount computed for each quality criterion level for each ham, assessed
by all judges, is expressed on a 1 to 10 scale (Alvarez et al. 1993).
Parameters $\beta_n$ and $\delta_i$ are estimated by the maximum likelihood method for 10
categories using the PROX (Wright and Douglas 1977) and UCON (Wright and
Mead 1976) algorithms.
We have constructed a quality variable and located items and hams along it
from our observations. The map of the variable is a picture of the extent to which
we have accomplished the task of variable construction.
Ham separation indicates how efficiently a set of items is able to separate the
hams measured. Item separation indicates how well a sample of hams is able to
separate the items used.
It is desirable to locate ham assessments and items (quality criteria) along the
variable line with sufficient precision to be able to distinguish between them. The
more widely items and hams are separated along this line, the more usable their
measurements are.
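One common way to quantify separation is the ratio of the error-adjusted standard deviation of the measures to their root mean square error, with a companion reliability coefficient. The sketch below assumes that definition. It is fed the rounded summary values printed at the foot of Table 2 (S.D. of measures 1.7, mean error .6), treating the mean error as the root mean square error, so the output is only a rough illustration and not a figure reported by the authors.

```python
import math

def separation_and_reliability(observed_sd: float, rmse: float):
    """Separation = adjusted SD / RMSE; reliability = adjusted variance / observed variance."""
    adjusted_variance = max(observed_sd ** 2 - rmse ** 2, 0.0)
    separation = math.sqrt(adjusted_variance) / rmse
    reliability = adjusted_variance / observed_sd ** 2
    return separation, reliability

# Rounded summary values from Table 2, used here as an assumption.
separation, reliability = separation_and_reliability(1.7, 0.6)
print(round(separation, 2), round(reliability, 2))  # 2.65 0.88
```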
The distances among the items also identify the direction and meaning of the
variable (Table 1). The item locations are the operational definition of the variable
of interest, while the ham assessment locations are the application of the variable
to measurement (Table 2).
The items with the lowest raw scores are those that imply higher measures (Fig. 1c);
they turn out to be items 25, 24, 23, 22 and 17. Items with higher raw scores imply
lower measures; they are items 9, 12 and 11. The calibration of item 9 is smaller
than that of item 25 (Table 1). The same can be said for the ham assessments. The
highest quality ham is assessment no. 41, which is identified by "judge8HamAcorn4",
meaning ham no. 4 from acorn-fed pigs tasted by judge no. 8. The lowest quality ham
is assessment no. 147 (Table 2).
TABLE 1.
COMPUTATIONS OF MEASURE FOR ITEMS
        SCORE   COUNT   MEASURE   ERROR   INFIT MNSQ   INFIT STD   OUTFIT MNSQ
MEAN    557.    156.    50.0      .2      1.05         .3          1.04
S.D.    243.    0.      3.0       .01     .31          2.71        .30
INFIT is a standardized information-weighted mean square statistic, which is more sensitive to
unexpected responses to items near the ham's quality level.
MNSQ is the mean-square infit statistic, with expectation 1. Values substantially less than 1
indicate dependency in the data; values substantially greater than 1 indicate noise.
OUTFIT is a standardized outlier-sensitive mean square fit statistic, more sensitive to unexpected
ham scores on items far from the ham's quality level.
MNSQ is the mean-square outfit statistic, with expectation 1. Values substantially less than 1
indicate dependency in the data; values substantially greater than 1 indicate the presence of
unexpected outliers.
PTBIS is the point-biserial correlation between the individual item score and the total ham score
for the scored observations used in the analysis. Negative values for items often indicate missing
scores.
An item fit statistic is calculated for each item. This summarizes the extent to
which the pattern of the data of the sample on that item is consistent with the way
these hams have data on the other items. This gives a consistency fit statistic for
each item and for each ham assessment, and also for any subsets of items and hams
which might interest us.
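The two mean-square statistics described in the table notes can be sketched for the dichotomous case as follows. The study itself uses 10 rating categories, so this is only an illustration of the idea: outfit is the plain average of squared standardized residuals, and infit weights each squared residual by its model variance. The function name, the expected probabilities and the score patterns are hypothetical.

```python
def fit_mean_squares(observed, expected_p):
    """Return (infit MNSQ, outfit MNSQ) for 0/1 scores and model probabilities."""
    variances = [p * (1.0 - p) for p in expected_p]
    squared_residuals = [(x - p) ** 2 for x, p in zip(observed, expected_p)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(s / v for s, v in zip(squared_residuals, variances)) / len(observed)
    # Infit: information-weighted, i.e. total squared residual over total variance.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit

expected_p = [0.9, 0.7, 0.5, 0.3, 0.1]   # assumed model probabilities
consistent = [1, 1, 1, 0, 0]             # scores in line with the model
surprising = [0, 1, 1, 0, 1]             # two unexpected scores on far-off items

infit_c, outfit_c = fit_mean_squares(consistent, expected_p)
infit_s, outfit_s = fit_mean_squares(surprising, expected_p)
print(round(infit_c, 2), round(outfit_c, 2))
print(round(infit_s, 2), round(outfit_s, 2))
```

For the surprising pattern both statistics rise above 1, and outfit rises further than infit because the unexpected scores fall on items far from the ham's location.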
TABLE 2.
COMPUTATION OF MEASURE FOR HAMS AND JUDGES
I%
I
152
71
149
59
::
51
SO
26
26
26
26
26
45.1
44.8
44.6
44.5
44.4
.6 .60 -1.4 .55 -1.4
.6 1.86
.6 1.16
3.1 1.47
.6 .97 -.l 1.00
.6 .93
.6 .94 -.2 .89
1 .4
.o
.2
.3
-
~
.77 judgeVHamFeed-5
.66 judgel2HamFeed6
.60 judge6HamFeed6
.73 judge4HamFeed6
.61 judge6HamFeedh
153   49   26   44.3   .6   1.00    .0   1.01    .0   .64   judge10HamFeed6
 59   47   26   44.1   .6    .49  -1.8    .45  -1.6   .80   judge4HamFeed6
 66   45   26   44.0   .6   1.18    .6    .95   -.1   .65   judge12HamFeed6
150   43   26   43.8   .6    .70  -1.0    .71   -.8   .66   judge7HamFeed6
148   42   26   43.7   .6   1.34   1.2   1.00    .0   .67   judge4HamFeed6
147   34   26   42.9   .7    .26  -2.4    .25  -1.9   .82   judge3HamFeed6
MEAN  93.  26.  47.6   .6   1.01    .0   1.04    .2
S.D.  22.   0.   1.7   .0    .43   1.6    .50   1.7
TABLE 3.
POORLY FITTING HAMS (ITEMS IN ENTRY ORDER)
This table shows the items assessed by judge 11 for ham 2 for which the standardized outfit (or
infit, if OUTFIT=N) statistic is greater than the misfit criterion (FITP= or FITI=). The assessment
codes are listed in their sequence order in the data file. The residuals are standardized assessment
score residuals, which have a modelled expectation of 0 and a variance of 1. Negative residuals
indicate that the level of the observed assessment was less than expected. Positive residuals indicate
that the level of the observed assessment was more than expected.
For example, item entry no. 11, which corresponds to "Dryness of lean", was assessed at level 2; the
corresponding residual is -2, which means that acorn ham no. 2 tasted by judge no. 11 was assessed
lower than expected.
Item entry no. 17, which corresponds to "Bitter taste", was assessed at level 8; the corresponding
residual is 5, which means that acorn ham no. 2 tasted by judge no. 11 was assessed higher than
expected.
For assessment 110, which corresponds to an acorn-fed ham assessed
by judge 11, item 17 misfits the model. It has a high standardized residual of
5 when applied to this ham (order of entry number 17), which means that it has a score
significantly higher than expected; that is, given all items and hams, the score on
item 17 for that ham and judge is significantly too large. The same can be said for
item 25, with a standardized residual of 4 (Table 3).
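A standardized residual of this kind is the observed level minus its model expectation, divided by the model standard deviation. The sketch below shows the computation for a rated item, given the category probabilities the model assigns to one ham on one item. The function name, the five levels and their probabilities are hypothetical and are not taken from the ham data.

```python
import math

def standardized_residual(observed, levels, probabilities):
    """(observed - expected) / sqrt(variance) under the model's category probabilities."""
    expected = sum(k * p for k, p in zip(levels, probabilities))
    variance = sum((k - expected) ** 2 * p for k, p in zip(levels, probabilities))
    return (observed - expected) / math.sqrt(variance)

levels = [1, 2, 3, 4, 5]                     # assumed rating levels
probabilities = [0.1, 0.2, 0.4, 0.2, 0.1]    # assumed model probabilities

z_high = standardized_residual(5, levels, probabilities)
z_low = standardized_residual(1, levels, probabilities)
z_mid = standardized_residual(3, levels, probabilities)
print(round(z_high, 2), round(z_low, 2), round(z_mid, 2))  # 1.83 -1.83 0.0
```

A large positive value flags an assessment well above what the ham and item locations predict, and a large negative value flags one well below it.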
Ham observation validity is determined by an analysis of the validity of the
observations of that ham. This identifies items for review, which may not have
behaved in the way expected (Table 4). Item 26 misfits the model: although
there are no high residuals, there are many positive and negative ones, which
means that the judges have no unanimous criterion for this cellar flavor.
CONCLUSIONS
The Rasch model is a good method for determining ham quality objectively, and it
shows (Fig. 2) the higher quality of ham from acorn-fed pigs.
[Figure: map of the ham assessments along the quality variable, with the LOW QUALITY end labeled.]
FIG. 2. THIS MAP SHOWS THAT ACORN HAMS FROM TABLE 2 HAVE BETTER QUALITY
THAN FEED HAMS.
These data fit the model well. Indices for suitability are good, although there
are some hams and items that misfit the model for quality according to the criteria
of 15 experienced ham tasters; it is necessary to look at the data closely in those
items and hams with high residuals in order to find an explanation for these
anomalies.
TABLE 4.
POORLY FITTING FASES (HAMS IN ENTRY ORDER)
76 I 1
ASSESSMENT: 0 5 0 1 5 0 7 4 0 2 4 2 6 0 0 2 2 2 0 6 0 0 4 1 0
RESIDUAL: -1 2-1 2-1 2 1-1 1 2-1-1 -1 1-1 1 -1
126
ASSESSMENT: 4 0 6 8 0 0 0 3 0 0 0 1 2 3 1 8 0 1 2 6 0 0 0 0 0
RESIDUAL: 1-1 2 2-1-1 -1 -1 2 2
This table shows the assessments of item 26, from every judge and every ham, for which the
standardized outfit (or infit, if OUTFIT=N) statistic is greater than the misfit criterion (FITP= or
FITI=). The assessment codes are listed in their sequence order in the data file. The residuals are
standardized assessment score residuals, which have a modelled expectation of 0 and a variance of
1. Negative residuals indicate that the observed assessment was less correct than expected.
Fit analysis is a good way to find out how the items and hams work, and to
identify the items and hams whose unexpected scores bring about high residuals,
so that the causes can be investigated.
This methodology can be applied to any kind of ham and different sets of items
and different judges.
REFERENCES