
Computers & Industrial Engineering 51 (2006) 652–670

www.elsevier.com/locate/dsw

Shelf life determination using sensory evaluation scores:


A general Weibull modeling approach ☆

Marta A. Freitas a,*, Josenete C. Costa b

a Departamento de Engenharia de Produção, E.E., Universidade Federal de Minas Gerais, R. Espírito Santo 35, CEP 30160-030, Belo Horizonte, MG, Brazil
b Departamento de Estatística, ICEX, Universidade Federal de Minas Gerais, CEP 31270-901, Belo Horizonte, MG, Brazil

Received 28 September 2005; received in revised form 15 May 2006; accepted 18 May 2006
Available online 1 September 2006

Abstract

Sensory evaluations to determine the shelf life of food products are routinely conducted in food experimentation as a
part of each product development program. In such experiments, trained panelists are asked to judge food attributes by
reference to a scale. The "failure time" associated with a product unit under test is usually defined as the time required to reach a cut-off point previously defined by the food company. Due to the destructive nature of these evaluations, one never knows the exact "failure time" of a given unit. Consequently, data arising from these studies are either left or right censored. This article deals with the problem of modeling this kind of data for shelf life determination and develops a general Weibull model (GWM). Simulations indicate good performance of the parameter estimates obtained through the GWM. The modeling approach is applied to a real data set.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Maximum likelihood; Sensory evaluations; Shelf life; Weibull distribution

1. Introduction

Some food products require extended aging under controlled conditions of storage to develop characteristic
flavors. The unique, piquant taste of blue cheese is attributable to mold growth and enzymatic activity pro-
moting fermentation of protein and carbohydrate, hydrolysis of fat, and secondary chemical reactions. The
taste and odor of freshly distilled spirits, particularly whiskey, are rather raw and unpleasant; desired flavor
components develop during years of aging in wood. Unfortunately, not all food and beverage products
increase in sensory quality with storage or holding time.
Problems related to storage stability are common to the food industry and that is why storage studies are
an essential part of every product development, improvement, or maintenance program. Some studies center

☆ This manuscript was processed by Area Editor Gray L. Hogg.
* Corresponding author.
E-mail addresses: marta.freitas@pesquisador.cnpq.br (M.A. Freitas), josenete@est.ufmg.br (J.C. Costa).

0360-8352/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cie.2006.04.005

on the rate of degradation, and others on estimating the shelf life: the length of time required for the product to become unfit for human consumption. By unfit for human consumption it is meant that the product exhibits physical, chemical, microbiological, or sensory characteristics that are unacceptable for regular consumption.
The manufacturer attempts to develop a product with the longest shelf life practical, consistent with costs
and the pattern of handling and use by distributors, retailers, and consumers. Inadequate shelf life determination will lead to consumer dissatisfaction or complaints. At best, such dissatisfaction will eventually affect the acceptance and sales of brand name products. At worst, it can lead to malnutrition and even illness. For
these reasons, food processors pay great attention to adequate storage stability or shelf life.
When one talks about determining the shelf life, chemical, physical, microbiological and nutritional analyses are fundamental, but equally important are the sensory characteristics of the product. It is a fact that many consumers purchase a product on the basis of the sensory experience it delivers: for instance, sweetness, softness, chocolateness, aspect, odor, flavor, aftertaste, etc. For this reason, the role of sensory evaluations for
shelf life determination is becoming more important as the value of these tests becomes recognized and as the
consumer industry increases its use of these methods.
In such experiments, a sample of product units is stored under certain conditions and, periodically, at pre-specified evaluation times, a sample of units is collected from the ones stored in a given condition and subjected to sensory evaluations by trained panelists. A frequent test procedure is to ask each panelist to judge each attribute separately by reference to a rating scale, for instance, a seven-point rating scale varying from 6 ("no difference") to 0 ("total difference"). Each "test unit" is compared to a "control unit" and the scores are the result of this comparison. Because of the destructive nature of the evaluations, units that have already been evaluated at a given time cannot be restored to be evaluated later on.
The usual approach to the modeling and analysis of this kind of data is to fit a regression line (obtained by
the method of least squares) relating the scores (z) and the pre-specified evaluation times (x). An estimate of
the shelf life is then obtained by solving the fitted regression equation for x and replacing the score (z) by the
cut-off point c (which indicates ‘‘product failure’’) previously chosen by the food company. Gacula and Singh
(1984) presented examples in which regression models were implemented to estimate the shelf life. However,
this approach presents difficulties since, in general, the assumptions of Normality and homoscedasticity (constant variance), basic requirements of Regression Analysis (Draper & Smith, 1998), are not valid for sensory scores. To overcome this difficulty, Gacula and Singh (1984) suggested some transformations of the experimental data to new scales where the Normality assumption may be approximately satisfied. Alternatives to overcome the violation of the homoscedasticity assumption include the use of variance-stabilizing transformations and weighted least squares. However, these procedures, extensively used in other practical problems, seem to be of no use for this kind of data (see Gacula & Singh, 1984).
An alternative approach is to fit a parametric lifetime model such as Weibull, lognormal (just to name a
few), to the ‘‘time to failure’’ data, in other words, to the time at which a specified deterioration level has been
achieved. But it should be pointed out that in these experiments, one does not observe the actual ‘‘time to fail-
ure" of a given unit. For a unit evaluated at a pre-established time, one of two situations might happen: the score is either less than or equal to the cut-off point c (indicating that the "failure" has already occurred) or greater than c. In the former case the "failure time" is somewhere between the start of the experiment and
the present time of evaluation. In the latter case, the product is still appropriate for consumption and its ‘‘fail-
ure’’ will take place sometime after evaluation but it will not be observed due to the destructive nature of the
experimental trials. Thus, data coming from these sensory evaluations are either left or right censored, respec-
tively (Meeker & Escobar, 1998). Gacula and Kubala (1975) presented examples of data analysis for shelf life
determination. The authors used the evaluation times as approximations to the real failure time (eliminating
the left censoring). Shelf life was then estimated using the Weibull distribution fitted to the approximated fail-
ure times and the right censored observations. Discussing specific designs for shelf life determination, Gacula
(1975) suggested the use of a ‘‘staggered design’’ to implement weekly sensory evaluations. In this design the
intervals between evaluations are larger at the beginning of the experiment and smaller towards the end. The
number of units sampled increases towards the end of the experiment. The idea is to get closer to the ‘‘real’’
failure time. In the examples discussed in that paper, the author also uses the evaluation times as ‘‘failure
times’’ when the score is less than the cut-off point.

Two problems associated with the above mentioned approach should be pointed out: (1) the failure time
model (in those specific cases, Weibull) is not used appropriately since the evaluation times are fixed and
(2) as a consequence of the approximation (i.e., the use of the evaluation times as if they were the actual failure
times) there is a great risk of overestimating the shelf life.
In an attempt to incorporate both the information of the left and right censored data, Freitas, Borges, and Ho
(2003) proposed an alternative statistical model. The basic idea was a dichotomization of the original score data,
by defining at each evaluation time s_i a Bernoulli random variable Y_ij with parameter p_ij (probability of a score ≤ c). Assuming a Weibull (scale; shape) = (a_j; d) as the underlying distribution for the shelf life time, the probability p_ij was then defined as p_ij = P(0 < T_ij ≤ s_i), where T_ij is the "time to failure" of the jth unit evaluated at time s_i. The authors allowed the scale parameter of the Weibull distribution to be linearly dependent on explanatory variables (covariates or factors of an experimental design). The parameters were estimated by the maximum likelihood method. The model was applied to data coming from a real situation where product units were stored in three different conditions, and percentiles of the shelf life distribution and fractions of "defectives" were estimated for each one of them. A small simulation study was implemented considering only the basic sample plan used in the real experiment. Later, Freitas, Borges, and Ho (2004) expanded the simulation study with focus on the effect of several aspects, such as the total sample size and the proportion of allocation to each experimental condition, among others, on the bias and precision of the estimates obtained with the model.
In this paper, we generalize the modeling approach proposed by Freitas et al. (2003). The idea is to continue working with the basic dichotomized data as described above, but to use as the underlying distribution a Weibull in which both parameters, scale (a_j) and shape (d_j), depend on explanatory variables.
The article is organized as follows. In Section 2, we describe the real motivating situation. In Section 3, we
briefly review the modeling approach as in Freitas et al. (2003) and introduce the general Weibull model
(GWM). In Section 4, we present our simulation results on the bias and precision of the estimators. In Section
5, we apply the GWM to the same real data used in the work by Freitas et al. (2003) and compare the
results to the ones reported by them. We provide concluding remarks in Section 6 and some derivations
and technical details of expressions used in Section 3 in Appendix A.

2. Motivating situation revisited

We briefly present here the situation described by Freitas et al. (2003).


Sensory evaluations were conducted by a food company at the laboratory level in order to determine the
shelf life of a manufactured dehydrated product, stored at different environmental conditions. Three attributes
were evaluated by trained panelists: odor, flavor, appearance. The main characteristics of this study are pre-
sented next.

2.1. Experimental design

A lot of product units was sampled from the production line and units were randomly assigned to one of
the following storage conditions:

• Refrigeration: Units were kept under refrigeration at 4 °C (approximately). Temperature and humidity levels were not controlled, but they were recorded daily and average weekly values were computed and used in this study. These units were used as reference (control) during the trials;
• Room temperature and humidity: Temperature and humidity levels were monitored and registered continuously by an instrument, and average weekly values were calculated and used in this study;
• Environmental Chamber 1: Temperature and humidity levels controlled at 30 °C and 80%, respectively;
• Environmental Chamber 2: Temperature controlled at 37 °C. Humidity levels were not controlled, but they were recorded daily and average weekly values were computed and used.

The last two conditions were used in order to simulate an aggressive storage environment. Researchers
expected to register a shorter shelf life under those conditions when compared with storage under room
temperature.

2.2. Laboratory panel

Forty-five (45) subjects were trained for the sensory characteristics of the product before the main trial
started. Sensory evaluations were made initially and then every week thereafter. Each week, eight trained sub-
jects were randomly selected to form the test panel.

2.3. Test procedure

Evaluations were performed weekly. At a given week, units were sampled from each storage condition
group. Each panelist was offered in random order three sets of units to be evaluated, namely [RE, BRE,
R]; [RE, BRE, CH1] and [RE, BRE, CH2], where RE, BRE and R stand respectively for ‘‘reference’’,
‘‘blind reference’’ and ‘‘room temperature and humidity’’; CHi stands for ‘‘Chamber i’’ i = 1, 2. Within
a given group, the reference unit (RE) was always evaluated first. For the other two (BRE, R, CH1 or
CH2), the order was randomized. All units were discarded after evaluation. The unit labeled as ‘‘refer-
ence’’ (RE) and the ‘‘blind reference’’ (BRE) were both sampled from the ‘‘refrigerated condition’’ group.
Those experimental units were used only to check the consistency of panelists' judgements. By consistency we mean that a panelist is expected to give similar scores to the blind reference (BRE) and the reference (RE) experimental units. If a panelist's scores for these two reference units disagree greatly, then that panelist's scores should not be included in the study. Fortunately, no inconsistencies were found in this data set.

2.4. Measurement scale

Panelists were asked to compare each test unit (including the ‘‘blind’’ reference) with the reference and
assign a score on a seven-point scale (0–6) individually to each attribute: 6 = "no difference"; 5 = "very slight difference"; 4 = "slight difference"; 3 = "different"; 2 = "large difference"; 1 = "very large difference"; 0 = "total difference".

2.5. Criterion of failure

The manufacturer adopted the following failure criterion: for each attribute, product units scored 0, 1, 2 or
3 were considered unfit for human consumption.

2.6. Follow up time

Units stored at room temperature and in Chambers 1 and 2 were followed for 51, 36 and 18 weeks, respectively. Attributes were scored separately; therefore it is possible to have a product unit classified as unfit regarding one particular attribute and fit regarding another one. According to the failure criterion adopted by the manufacturer, at a given evaluation week one of the following situations (for each attribute) might happen: if the attribute's score is less than or equal to 3 (three), then one knows that the particular unit became unfit for human consumption at some moment between the beginning of the trial and the evaluation week. On the other hand, if the attribute's score is greater than 3, then that unit is still fit for consumption (regarding
that attribute). Unfortunately, because of the trial’s destructive characteristic, the follow up of that unit is
interrupted.

3. The general Weibull model

We first give a brief presentation of the model by Freitas et al. (2003) since it is the basis of the formulation suggested in this article.
Suppose a sample of $N = \sum_{i=1}^{k} n_i$ food product units is taken from the production line and stored under a given environmental condition. These units will be evaluated by trained panelists at pre-established evaluation times in order to determine their shelf life.

Let si (i = 1, 2, . . . , k) be the evaluation times (fixed). Then, at the evaluation time s1, n1 units are sampled
from the total N and subjected to a sensory evaluation by n1 panelists who score each attribute separately
(odor, flavor, and appearance) using a seven-point rating scale (for instance, 0 to 6).
Note that those n_1 units can no longer be followed in time due to the destructive nature of the evaluations. Next, at s_2, n_2 units are sampled from the N − n_1 units left and evaluated by n_2 panelists. This process is repeated through the last evaluation time s_k, when the remaining n_k units are finally evaluated.
Let Zij be the score assigned to a particular attribute of the jth product unit (j = 1, 2, . . . , ni) evaluated at
the time si (i = 1, 2, . . . , k) and c the cut-off point in the scale, chosen by the food company.
The authors defined a new random variable Y_ij through a dichotomization of the results, where Y_ij takes the value "1" if Z_ij ≤ c and the value "0" otherwise.
Therefore at each fixed time si what is observed is a random sample of size ni from a random variable Yij,
Bernoulli distributed with probability pij (probability of being ‘‘inadequate’’) given by:

$$p_{ij} = P(Y_{ij} = 1) = P(Z_{ij} \leq c) = P(0 < T_{ij} \leq s_i) = 1 - R_j(s_i),$$

where Tij is the ‘‘failure time’’ or ‘‘shelf life time’’ of the jth unit evaluated at si and Rj (t) = P (Tij > t) is the
reliability function (Nelson, 1990).
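As an illustration only (the scores below are hypothetical, not data from the study), the dichotomization that produces the binary responses y_ij can be written in a few lines of Python:

```python
# Illustrative sketch: dichotomizing panel scores with the cut-off c,
# producing the binary responses y_ij used in the likelihood that follows.
import numpy as np

scores = np.array([6, 5, 3, 4, 2, 6, 1])   # hypothetical scores Z_ij at one week
c = 3                                      # cut-off point chosen by the company
y = (scores <= c).astype(int)              # y_ij = 1 if Z_ij <= c ("failed"), else 0
# y -> array([0, 0, 1, 0, 1, 0, 1])
```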
The likelihood function then takes the general form:

$$L(p) = \prod_{i=1}^{k}\prod_{j=1}^{n_i} p_{ij}^{y_{ij}}\,(1-p_{ij})^{1-y_{ij}} = \prod_{i=1}^{k}\prod_{j=1}^{n_i} \left[1-R_j(s_i)\right]^{y_{ij}}\left[R_j(s_i)\right]^{1-y_{ij}}, \qquad (1)$$

where $p = (p_{11}, \ldots, p_{k,n_k})^t$ is a vector of dimension $N \times 1$ ($N = \sum_{i=1}^{k} n_i$).
Freitas et al. (2003) assumed that the failure time Tij of the jth unit evaluated at time si (fixed) followed a
Weibull distribution, with density and reliability functions given respectively by
$$f_j(t) = a_j^{d}\, d\, t^{d-1}\exp\{-(a_j t)^{d}\} \qquad (2)$$

and

$$R_j(t) = \exp\{-(a_j t)^{d}\}, \qquad (3)$$

with parameters

• $a_j = \exp\{X_j b\} = \exp\{b_0 + X_{j1} b_1 + X_{j2} b_2 + \cdots + X_{jq} b_q\}$,
• $d = \exp(\gamma)$, $\gamma \geq 0$,

and $X_j = (1, X_{j1}, \ldots, X_{jq})$ a $1 \times (q+1)$ vector of explanatory variables (covariates or experimental factors) associated with the jth unit evaluated at $s_i$ (it was assumed that $X_j$ is measured with no error), and $b = (b_0\, b_1 \ldots b_q)^t$ a $(q+1) \times 1$ vector of parameters associated with the variables $X_j$.
Therefore, using expression (3) in (1), the likelihood function takes the form:
$$L(k) = \prod_{i=1}^{k}\prod_{j=1}^{n_i}\left[\exp\left\{-\left(s_i\, e^{X_j b}\right)^{e^{\gamma}}\right\}\right]^{1-y_{ij}}\left[1-\exp\left\{-\left(s_i\, e^{X_j b}\right)^{e^{\gamma}}\right\}\right]^{y_{ij}}, \qquad (4)$$

where $k = (b^t; \gamma)^t$ is a $(q+2) \times 1$ vector of parameters.
The parameters $k^t = (b^t; \gamma)$ were estimated by the maximum likelihood method (McCullagh & Nelder, 1989), through direct maximization of the logarithm of expression (4).
The parameter estimates were then used to calculate maximum likelihood estimates of percentiles of the
failure time distribution for each attribute and fraction of ‘‘defectives’’ at several points in time. Practical con-
siderations regarding the suitability of the Weibull distribution to describe the shelf life time of food products
can be found in Freitas et al. (2003).
In this work we generalize this model allowing also the form parameter d of the Weibull distribution to
depend on explanatory variables. In other words, here we will be working with the assumption that the failure

time of the jth unit evaluated at time (fixed) si follows a Weibull distribution whose probability density func-
tion at time t is given by the expression:
$$f_j(t) = a_j^{d_j}\, d_j\, t^{d_j-1}\exp\{-(a_j t)^{d_j}\}, \quad t > 0, \qquad (5)$$

with parameters $a_j$ (scale) and $d_j$ (form) given by:

• $a_j = \exp\{X_j b\} = \exp\{b_0 + X_{j1} b_1 + \cdots + X_{jq} b_q\}$;
$X_j = (1, X_{j1}, \ldots, X_{jq})$ is a $1 \times (q+1)$ vector of explanatory variables (covariates or experimental factors, measured with no error) related to the jth unit evaluated at $s_i$ ($j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, k$); $b = (b_0\, b_1 \ldots b_q)^t$ is a $(q+1) \times 1$ vector of parameters associated with the explanatory variables;
• $d_j = \exp\{W_j h\} = \exp\{h_0 + W_{j1} h_1 + \cdots + W_{jr} h_r\}$;
$W_j = (1, W_{j1}, \ldots, W_{jr})$ is a $1 \times (r+1)$ vector of explanatory variables also related to the jth unit evaluated at $s_i$ ($j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, k$), measured with no error, and $h = (h_0\, h_1 \ldots h_r)^t$ is an $(r+1) \times 1$ vector of parameters associated with the variables $W_j$.

Moreover, the reliability function evaluated at time t is now given by:

$$R_j(t) = \exp\{-(a_j t)^{d_j}\}. \qquad (6)$$

Therefore, the likelihood function, Eq. (1), now takes the general form:

$$L(k) = \prod_{i=1}^{k}\prod_{j=1}^{n_i}\left[\exp\left\{-\left(s_i\, e^{X_j b}\right)^{e^{W_j h}}\right\}\right]^{1-y_{ij}}\left[1-\exp\left\{-\left(s_i\, e^{X_j b}\right)^{e^{W_j h}}\right\}\right]^{y_{ij}}, \qquad (7)$$

where $k^t = (b^t; h^t)$.


The maximum likelihood estimate of k, denoted k̂, is obtained by direct maximization of the logarithm of expression (7). The calculations require the implementation of numerical optimization methods such as the Newton–Raphson algorithm.
In this work, a minor adjustment to the Newton–Raphson procedure sometimes used in statistical problems, called Fisher scoring, was used (Seber & Wild, 1989). This adjustment is done by replacing the negative second-derivative matrix (−H) by its expected value I(k) = E{−H(k)}, called the (expected) Fisher Information Matrix. This modification is implemented since the negative second-derivative matrix may not be positive definite at every practical solution k̂^(j), and this can cause the Newton method to fail. The Fisher scoring method therefore uses an approximation of −H(k) which is always positive definite, so that the step taken at each iteration leads off in an uphill direction (Seber & Wild, 1989). The expressions of the quantities needed for this calculation are shown in Appendix A.
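As an illustration of how the maximization could be carried out in practice, the sketch below codes the negative logarithm of expression (7) and hands it to a generic quasi-Newton optimizer (BFGS) instead of the Fisher scoring algorithm used by the authors; all names and the optimizer choice are assumptions of this example, not the authors' implementation.

```python
# Illustrative sketch: fitting the GWM by maximizing the log of expression (7).
import numpy as np
from scipy.optimize import minimize

def gwm_neg_loglik(k, s, y, X, W):
    """Negative log-likelihood of the GWM.

    k    : parameter vector (b_0..b_q, h_0..h_r)
    s    : evaluation time of each observation
    y    : dichotomized score (1 if score <= cut-off, 0 otherwise)
    X, W : design matrices for the scale and form parameters (first column = 1)
    """
    q1 = X.shape[1]
    b, h = k[:q1], k[q1:]
    a = np.exp(X @ b)                    # scale a_j = exp(X_j b)
    d = np.exp(W @ h)                    # form  d_j = exp(W_j h)
    R = np.exp(-(a * s) ** d)            # reliability at the evaluation time, Eq. (6)
    R = np.clip(R, 1e-12, 1 - 1e-12)     # guard against log(0)
    return -np.sum(y * np.log(1.0 - R) + (1.0 - y) * np.log(R))

def fit_gwm(s, y, X, W):
    """Maximize the likelihood; BFGS stands in here for Fisher scoring."""
    k0 = np.zeros(X.shape[1] + W.shape[1])
    res = minimize(gwm_neg_loglik, k0, args=(s, y, X, W), method="BFGS")
    return res.x, res.hess_inv           # estimates and an approximate covariance
```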
Now, if $\hat{k} = (\hat{b}_0, \hat{b}_1, \ldots, \hat{b}_q; \hat{h}_0, \hat{h}_1, \ldots, \hat{h}_r)^t$ is the maximum likelihood estimator of $k = (b_0, b_1, \ldots, b_q; h_0, h_1, \ldots, h_r)^t$, then for a given set of explanatory variables $X_j = (1, X_{j1}, \ldots, X_{jq})$ and $W_j = (1, W_{j1}, \ldots, W_{jr})$, we get:

• the maximum likelihood estimator of the $(100 \times p)\%$ percentile (i.e., the p quantile) of the failure time (or shelf life time) distribution, $\hat{t}_{p(j)}$, is given by

$$\hat{t}_{p(j)} = \frac{1}{\hat{a}_j}\left[-\ln(1-p)\right]^{1/\hat{d}_j}, \qquad (8)$$

where $\hat{a}_j = \exp\{X_j \hat{b}\}$ and $\hat{d}_j = \exp\{W_j \hat{h}\}$;
• the maximum likelihood estimator of $F_j(t_0)$, the fraction of "defectives" (fraction of units considered "inadequate for consumption") at $t_0$, is given by:

$$\hat{F}_j(t_0) = 1 - \hat{R}_j(t_0) = 1 - \exp\left\{-\left(t_0\, \hat{a}_j\right)^{\hat{d}_j}\right\} = 1 - \exp\left\{-\left(t_0\, e^{X_j \hat{b}}\right)^{e^{W_j \hat{h}}}\right\}. \qquad (9)$$
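For illustration, the two estimators above can be evaluated directly once parameter estimates are available; the short sketch below is an example with hypothetical function names and inputs, not part of the original analysis.

```python
# Illustrative sketch: evaluating Eqs. (8) and (9) for a fitted vector (b_hat, h_hat).
import numpy as np

def percentile_estimate(p, Xj, Wj, b_hat, h_hat):
    """t_hat_p(j) = (1 / a_hat_j) * [-ln(1 - p)]**(1 / d_hat_j), Eq. (8)."""
    a_j = np.exp(Xj @ b_hat)
    d_j = np.exp(Wj @ h_hat)
    return (-np.log(1.0 - p)) ** (1.0 / d_j) / a_j

def fraction_defective(t0, Xj, Wj, b_hat, h_hat):
    """F_hat_j(t0) = 1 - exp{-(t0 * a_hat_j)**d_hat_j}, Eq. (9)."""
    a_j = np.exp(Xj @ b_hat)
    d_j = np.exp(Wj @ h_hat)
    return 1.0 - np.exp(-(t0 * a_j) ** d_j)

# Hypothetical call: 1% percentile for a unit with X_j = W_j = (1, 1)
# t01 = percentile_estimate(0.01, np.array([1.0, 1.0]), np.array([1.0, 1.0]), b_hat, h_hat)
```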

Making use of the asymptotic property of the maximum likelihood estimator it is possible to get the expres-
sions of the (asymptotic) standard deviations and construct confidence intervals for the quantities mentioned
above. The expressions and details of the calculations are in the Appendix A.

4. Simulation study

Our purpose was to study the "quality" of the estimates of percentiles and fractions of "defectives" obtained with the general Weibull model presented in Section 3, applied to situations which imitated the behavior of the real data available. The basic idea was to generate observations under known conditions and use
the model proposed to estimate the quantities of interest. These estimates were then compared to the real val-
ues. The details of the study are given next, including the definition of the metrics used to evaluate the ‘‘qual-
ity’’ of the estimates obtained.

4.1. Description of the simulation study

4.1.1. The choice of the (vectors of) explanatory variables Xj and Wj


To implement the simulations we envisioned a scenario where two storage conditions were under investi-
gation. This choice was motivated by a number of real situations associated to shelf life determination of food
products, in particular the one described in Section 2. In that case two environmental chambers were used in
an attempt to investigate two (different) aggressive storage conditions.
By using the model proposed, we assumed that the failure time (shelf life time) Tij of the jth unit evaluated
at time si (i = 1, 2, . . . , k; j = 1, 2, . . . , ni) was distributed as a Weibull (aj; dj) (notation: Tij ~ Weibull (aj; dj)),
with parameters:

• aj = exp(Xjb) = exp (b0 + b1Xj1), that is, Xj = (1,Xj1);


• dj = exp(Wjh) = exp (h0 + h1Wj1), that is, Wj = (1,Wj1),

where

$$X_{j1} = W_{j1} = \begin{cases} 0 & \text{if the jth unit evaluated at } s_i \text{ was stored under condition 1,} \\ 1 & \text{if the jth unit evaluated at } s_i \text{ was stored under condition 2.} \end{cases}$$

Consequently,

$$T_{ij} \sim \text{Weibull}(a_j; d_j) = \begin{cases} \text{Weibull}(e^{b_0};\, e^{h_0}) & \text{for units stored under condition 1,} \\ \text{Weibull}(e^{b_0+b_1};\, e^{h_0+h_1}) & \text{for units stored under condition 2.} \end{cases}$$

4.1.2. Performance or ‘‘Quality’’ measures


The expressions listed below were used as measures of "quality" or "performance". In all of them, M denotes the number of samples generated; u is the real value of the quantity being estimated (for instance, a given percentile or fraction of "defectives") and û_i denotes the estimated value of u calculated for the ith sample.

• Estimated value of u (based on the M samples):

$$\bar{u} = \frac{\sum_{i=1}^{M}\hat{u}_i}{M};$$

• Standard Deviation (SD):

$$SD = \sqrt{\frac{\sum_{i=1}^{M}(\hat{u}_i - \bar{u})^2}{M-1}};$$

• Bias (B):

$$B = \bar{u} - u;$$

• Relative Bias (RB):

$$RB = \frac{|\bar{u} - u|}{u} \times 100\% = \frac{|B|}{u} \times 100\%;$$

• Mean Square Error (MSE):

$$MSE = (SD)^2 + B^2.$$
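For concreteness, the five measures above can be computed with a few lines of Python; this is only an illustrative sketch, and the function name and interface are assumptions.

```python
# Illustrative sketch of the performance measures of Section 4.1.2.
import numpy as np

def quality_measures(u_hat, u_true):
    """u_hat: array of M estimates of u; u_true: the real value of u."""
    u_bar = u_hat.mean()                 # estimated value of u
    sd = u_hat.std(ddof=1)               # standard deviation (SD)
    bias = u_bar - u_true                # bias (B)
    rb = 100.0 * abs(bias) / u_true      # relative bias (RB), in %
    mse = sd ** 2 + bias ** 2            # mean square error (MSE)
    return {"mean": u_bar, "SD": sd, "B": bias, "RB": rb, "MSE": mse}
```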

4.1.3. The choice of the parameter values b0, b1, h0 and h1 (and consequently, aj and dj) used in the simulation
Table 1 presents the five cases considered. The values of b0, b1, h0 and h1 chosen were such that the respec-
tive aj and dj values used by Freitas et al. (2003) could be reproduced here. In that case, only the scale param-
eter aj was modeled as a function of explanatory variables; the form parameter remained constant for different
storage conditions. The range of values used by them in the simulations was based on the estimated values
obtained for the real data set.
It is important to emphasize that, for each case listed in Table 1, the parameter values were chosen in an attempt to make the degradation mechanism represented by condition 2 (Xj1 = Wj1 = 1) more "aggressive" than the one for condition 1. That is why (except for case 5) d_cond.2 > d_cond.1.

4.1.4. Sample plans implemented in the simulation study


By "sample plan" we mean: the number of weeks of follow-up (nw); the number of panelists allocated to each week (np); and the total number of product units under test (N = nw × np). Table 2 summarizes the sample plans used in this simulation study. Note that the sample plans listed in Table 2 include the plans implemented in the real situation described in Section 2 (Plans III and V).

4.1.5. Steps followed in the simulation study


In the proposed model, the underlying distribution of the failure time (or shelf life time) is a Weibull
but in a real situation, one does not observe the actual failure time. In fact the information available is

Table 1
Parameter values used in the simulation study

Case | Parameters: b0, b1, h0, h1 | Condition 1 (Xj1 = Wj1 = 0): aj, dj | Condition 2 (Xj1 = Wj1 = 1): aj, dj
(1) | −3.35, 0.62, 0.00, 0.70 | 0.035, 1.0 | 0.065, 2.0
(2) | −3.35, 0.62, 0.18, 0.70 | 0.035, 1.2 | 0.065, 2.4
(3) | −3.00, 0.62, 0.00, 0.70 | 0.050, 1.0 | 0.093, 2.0
(4) | −3.00, 0.62, 0.18, 0.70 | 0.050, 1.2 | 0.093, 2.4
(5) | −3.35, 0.62, 0.00, 0.00 | 0.035, 1.0 | 0.065, 1.0

Table 2
Sample plans considered for each condition

Plan | Number of weeks (nw) | Number of panelists (np) | N = nw × np
I | 12 | 7 | 84
II | 12 | 14 | 168
III | 18 | 7 | 126
IV | 18 | 14 | 252
V | 36 | 7 | 252
VI | 36 | 14 | 504

the score assigned by a panelist to a given product unit. In the model proposed in Section 3, the results are dichotomized according to the cut-off point established by the company. In other words, the result is either zero or one, depending on whether the failure time (Tij) is located before or after the evaluation time (si).
In the simulation we assumed that the evaluations were implemented weekly. In addition, the total follow-
up time was nw weeks and np panelists were requested weekly to compose the laboratory panel. The main steps
followed are given below:

• Step 1: Choose a set of parameter values b0, b1 (aj) and h0, h1 (dj) listed in Table 1 and a sample plan from Table 2.
• Step 2: Generate a vector of "evaluation times" (weeks) taking into account the sample plan chosen in Step 1.
• Step 3: Generate a random sample of size N = nw × np of failure times from a Weibull (aj; dj), where Xj1 = Wj1 = 0, that is, aj = e^{b0} and dj = e^{h0}. Use the parameter values b0 and h0 chosen in Step 1.
• Step 4: Generate a random sample of size N = nw × np of failure times from a Weibull (aj; dj), where Xj1 = Wj1 = 1, in other words, aj = e^{b0+b1} and dj = e^{h0+h1}. Use the parameter values b0, b1, h0 and h1 chosen in Step 1.
• Step 5: Dichotomize the 2N results obtained in Steps 3 and 4 and store them in a vector Y of dimension 2N × 1. The dichotomization is done by comparing each of the 2N failure times t with the evaluation times generated in Step 2 (s). If t > s then Y = 0; otherwise Y = 1 (meaning that the failure has already occurred).
• Step 6: Calculate the maximum likelihood estimates of b0, b1, h0, h1 (and consequently aj and dj), replacing "yij" in expression (7) by the dichotomized data.
• Step 7: Find the estimates of percentiles and fraction of defectives, using the parameter estimates calculated in Step 6.
• Step 8: Store the values calculated in Steps 6 and 7.
• Step 9: Generate another pair of samples as in Steps 3 and 4 and repeat Steps 5–8.
• Step 10: Steps 3–8 should be repeated until M = 1000 samples of each condition (Xj1 = Wj1 = 0 and Xj1 = Wj1 = 1) have been generated. Then, based on the M = 1000 samples, calculate for each condition and for each quantity of interest u (fraction of defectives and percentiles):
  – the estimated value of u, that is, ū, based on the M = 1000 estimated values û_i (i = 1, 2, . . . , M);
  – the standard deviation (SD) based on the 1000 values û_i;
  – the bias (B), the relative bias (RB) and the mean square error (MSE).

These steps were implemented for each one of the sample plans listed in Table 2.
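A minimal Python sketch of Steps 2–5 for a single simulated sample is given below; it assumes Case 1 of Table 1 and Plan I of Table 2, and it is an illustration rather than the code actually used in the study.

```python
# Illustrative sketch of Steps 2-5 for one simulated sample.
import numpy as np

rng = np.random.default_rng(2006)
nw, npan = 12, 7                            # Plan I: 12 weeks, 7 panelists per week
b0, b1, h0, h1 = -3.35, 0.62, 0.0, 0.7      # Case 1 of Table 1

def simulate_condition(x, rng):
    """Generate and dichotomize one sample for the condition coded x (0 or 1)."""
    a, d = np.exp(b0 + b1 * x), np.exp(h0 + h1 * x)
    s = np.repeat(np.arange(1, nw + 1), npan)   # evaluation weeks (Step 2)
    # R(t) = exp(-(a t)^d), so T = rng.weibull(d) / a, since numpy's weibull(d)
    # draws E**(1/d) with E ~ Exponential(1)            (Steps 3-4)
    t = rng.weibull(d, size=s.size) / a
    y = (t <= s).astype(int)                    # dichotomization (Step 5)
    return s, y

s0, y0 = simulate_condition(0, rng)   # condition 1
s1, y1 = simulate_condition(1, rng)   # condition 2
```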

4.2. Simulation results

Although several "quality" measures were calculated, we decided to present the results only for the mean square error, since this measure summarizes both the bias and the precision of each estimate. We will discuss the results related to the percentile estimates only, since these are the quantities we are most interested in.

4.2.1. Results related to the percentile estimates (t̂_p)

Figs. 1 and 2 show the plots of the mean square error (MSE) of t̂_p vs. ln(p), for six values of p (10⁻⁶; 10⁻⁵; 10⁻⁴; 10⁻³; 10⁻²; 10⁻¹). These plots were constructed for each case (see Table 1) and sample plan (see Table 2) considered in the simulations.


Some important aspects can be pointed out:

1. The MSE values increase along with the probability values p. This general pattern is found not only in all the sample plans considered but also in all the cases within a given sample plan. In other words, the overall quality of the estimates gets worse as the percentile value increases. This is an attractive characteristic, especially regarding shelf life estimation, since in this kind of problem one is always interested in small percentiles;

[Figure 1: MSE (vertical axis, 0.0–3.0) vs. ln p (horizontal axis, −6 to −1) for Cases 1–5. Panels: (a) Plan I, Condition 1 (nw = 12; np = 7; N = 84); (b) Plan I, Condition 2; (c) Plan II, Condition 1 (nw = 12; np = 14; N = 168); (d) Plan II, Condition 2; (e) Plan III, Condition 1 (nw = 18; np = 7; N = 126); (f) Plan III, Condition 2.]

Fig. 1. MSE values vs. ln(p) – Plans I–III (all cases).



[Figure 2: MSE (vertical axis, 0.0–3.0) vs. ln p (horizontal axis) for Cases 1–5. Panels: (a) Plan IV, Condition 1 (nw = 18; np = 14; N = 252); (b) Plan IV, Condition 2; (c) Plan V, Condition 1 (nw = 36; np = 7; N = 252); (d) Plan V, Condition 2; (e) Plan VI, Condition 1 (nw = 36; np = 14; N = 504); (f) Plan VI, Condition 2.]

Fig. 2. MSE values vs. ln(p) – Plans IV–VI (all cases).



2. For both conditions, the MSE values decrease when one goes from plan I to plan VI. This pattern was
already expected since the sample size (N) increases in this direction;
3. In the real situation described in Section 2, it was mentioned that for the environmental chamber 2, the
follow up time was 18 weeks and that about seven panelists were allocated to each evaluation date.
Fig. 1f shows the simulation results obtained for condition 2 (assumed to be more aggressive than condition
1 and ‘‘similar’’ to the environmental chamber 2), under the same sample plan used in the real situation. It
was already mentioned that an increase in sample size leads to a reduction in the MSE. Comparing Figs. 1d
and f, we note that with an increase of about 33% in the sample size N (from N = 126 to N = 168) it is pos-
sible to get much better estimates for the percentiles. In fact, Fig. 1d shows that these better results can be
obtained by increasing the number of panelists allocated to each week and reducing the follow up time (in this case, from 18 to 12 weeks). On the other hand, if one decides to double the sample size, from N = 126 (Fig. 1f)
to N = 252, Figs. 2b and d show that it does not make any difference if this increase in sample size is
obtained by increasing the number of panelists (np) per week and decreasing the follow up time (Fig. 2b)
or by increasing the follow up time and keeping the seven panelists per week (Fig. 2d);
4. Finally, if we look at Figs. 2a and c, we see that for condition 1, if one is interested in percentiles associated with p ≤ 10⁻³, then Plan V (N = 252; nw = 36; used in the real experiment) and Plan IV (N = 252 but nw = 18) provide estimates with similar MSE.

5. Application

The analysis of the example introduced in Section 2 consisted of two parts:

1. We have concentrated on the data of the two storage conditions "environmental chamber 1" (30 °C; 80% relative humidity) and "environmental chamber 2" (37 °C). The experimental data were analyzed by fitting
the model proposed in Section 3 to each attribute separately. Percentiles and fraction of defectives were
then estimated and the shelf life for each attribute and storage condition was characterized by chosen
percentiles;
2. The results obtained were then compared with the ones obtained by Freitas et al. (2003). The comparison
was done in terms of the percentiles estimates obtained by each one of the two models.

The first step of the analysis was the dichotomization of the data (scores). As it was already mentioned, the
food company established the score ‘‘3’’ as the cut-off point.
It was assumed that Tij, the failure time (shelf life time) of the jth unit evaluated at si (fixed), followed a Weibull (aj; dj) distribution (dj ≥ 1) such that:

• aj = exp (b0 + b1Xj1) and
• dj = exp (h0 + h1Xj1), for all j = 1, 2, . . . , ni and i = 1, 2, . . . , k, with k = (18 + 36) weeks (Chambers 1 and 2), and

$$X_{j1} = \begin{cases} 0 & \text{if the jth unit evaluated at } s_i \text{ was stored in Chamber 1,} \\ 1 & \text{if the jth unit evaluated at } s_i \text{ was stored in Chamber 2.} \end{cases}$$

Here, k^t = (b0; b1; h0; h1).

Table 3 presents the results of the model fitting. The values in parentheses below the parameter estimates are the (asymptotic) standard deviations. The same table presents the Wald statistic (Cox & Hinkley, 1974), one for each attribute, along with the respective p-values related to the hypothesis testing:

• H0: the failure time (shelf life time) of the units stored under the two conditions can be modeled by the same
Weibull distribution (same form and scale parameters).
• H1: the failure times can be modeled by different Weibull distributions.
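For reference, a Wald statistic for the joint hypothesis H0: b1 = 0 and h1 = 0 is a quadratic form in the estimates of (b1, h1) and their estimated covariance matrix. The sketch below shows one way such a statistic could be computed; the function, its interface and the assumed parameter ordering are illustrative assumptions, not the authors' code.

```python
# Hedged illustration: Wald test of H0: b1 = 0 and h1 = 0 from the MLE and its
# estimated covariance (the inverse of the Fisher information).
import numpy as np
from scipy.stats import chi2

def wald_test(k_hat, cov_hat, idx=(1, 3)):
    """k_hat ordered as (b0, b1, h0, h1); idx gives the positions of b1 and h1."""
    sub = np.array(idx)
    est = k_hat[sub]
    V = cov_hat[np.ix_(sub, sub)]
    W = float(est @ np.linalg.solve(V, est))   # quadratic form
    p_value = chi2.sf(W, df=len(sub))          # chi-square with 2 degrees of freedom
    return W, p_value
```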

Table 3
Parameter estimates and p-values of the hypothesis testing H0: b1 = 0 and h1 = 0 vs. H1: b1 ≠ 0 or h1 ≠ 0 or (b1 ≠ 0 and h1 ≠ 0)

Attribute | b̂0 | b̂1 | ĥ0 | ĥ1 | Wald statistic
Odor | −3.5794 (0.0079)ᵃ | 0.8802 (0.0086) | 0.2704 (0.0109) | 0.7147 (0.0156) | 72090.64 (p = 0.0)
Flavor | −3.3386 (0.0193) | 0.6424 (0.0304) | 0.2773 (0.0340) | 0.0466 (0.0564) | 918.82 (p = 0.0)
Appearance | −3.8312 (0.0203) | 1.0619 (0.0227) | 0.5620 (0.0247) | 0.1402 (0.0346) | 14115.45 (p = 0.0)

ᵃ Standard deviation (in parentheses).

In other words:

• H0: b1 = 0 and h1 = 0 vs.
• H1: b1 ≠ 0 or h1 ≠ 0 or (b1 ≠ 0 and h1 ≠ 0).

There is strong evidence against the null hypothesis. In other words, it seems that the degradation mechanisms induced by the two storage conditions (Chambers 1 and 2) can be modeled by different Weibull distributions. The next step is then to investigate the nature of this difference since:

• if b1 = 0, then for Chamber 1 (Xj1 = 0), a = exp(b0) and d = exp(h0), while for Chamber 2 (Xj1 = 1), a = exp(b0) and d = exp(h0 + h1);
• if h1 = 0, then for Chamber 1 (Xj1 = 0), a = exp(b0) and d = exp(h0), while for Chamber 2 (Xj1 = 1), a = exp(b0 + b1) and d = exp(h0).

It is important to recall that Freitas et al. (2003) analyzed the same data set. The model implemented was
more restrictive though. The form parameter d was not only considered fixed (i.e. was not dependent on
explanatory variables) but also assumed to be the same for both storage conditions. This is exactly the second
scenario listed above. Therefore, the model proposed here includes the situation analyzed by the above men-
tioned authors as a special case.
Table 4 presents the p-values related to the individual hypothesis tests, H0: b1 = 0 vs. H1: b1 ≠ 0 and H0: h1 = 0 vs. H1: h1 ≠ 0. Those tests were constructed applying the asymptotic normality of the maximum likelihood estimators. The right-hand side of Table 4 shows the Weibull parameter estimates for each storage condition. Those values were calculated taking into account the results of the hypothesis testing. For example, the null hypothesis H0: h1 = 0 was not rejected for the attribute "flavor" (p = 0.4065). Therefore, we can

Table 4
Hypothesis testing results and parameter estimates (a and d) for Chamber 1 (Xj1 = 0) and Chamber 2 (Xj1 = 1)

Attribute | b̂0 | b̂1 | ĥ0 | ĥ1 | Chamber 1: âj, d̂j | Chamber 2: âj, d̂j
Odor | −3.5794 | 0.8802 (p = 0)ᵃ | 0.2704 | 0.7147 (p = 0) | 0.0279, 1.3105 | 0.0672, 2.6781
Flavor | −3.3386 | 0.6424 (p = 0) | 0.2773 | 0.0466 (p = 0.4065) | 0.0355, 1.3196 | 0.0675, 1.3196
Appearance | −3.8312 | 1.0619 (p = 0) | 0.5620 | 0.1402 (p = 0.00005) | 0.0217, 1.7542 | 0.0627, 2.0182

ᵃ Probability of significance (p-value).

conclude that, for this attribute, the failure time of units stored in the two conditions can be modeled by Weibull distributions with the same form parameter (a_chamber 1 = 0.0355; a_chamber 2 = 0.0675; d_chamber 1 = d_chamber 2 = 1.3196 ≈ 1.32). The other attributes can be modeled by Weibull distributions with distinct form and scale parameters for each condition.
Fig. 3 presents the estimated hazard functions for each condition/attribute.
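The hazard curves in Fig. 3 follow directly from the fitted parameters, since for this Weibull parameterization h(t) = d_j a_j^{d_j} t^{d_j − 1}. A minimal sketch (using the "odor" estimates from Table 4; not the authors' plotting code, and the variable names are assumptions) is:

```python
# Illustrative computation of the estimated Weibull hazard curves of Fig. 3,
# h(t) = d * a**d * t**(d-1), using the "odor" estimates from Table 4.
import numpy as np

t = np.linspace(0.1, 35, 200)                        # weeks
estimates = {"Chamber 1": (0.0279, 1.3105),          # (a_hat_j, d_hat_j) for odor
             "Chamber 2": (0.0672, 2.6781)}
hazard_curves = {label: d * a ** d * t ** (d - 1)    # Weibull hazard function
                 for label, (a, d) in estimates.items()}
```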
Table 5 presents the percentile estimates for each condition along with the asymptotic 95% confidence intervals. The results seem to indicate that, except for "odor", the attributes of the units stored in Chamber 2 (37 °C) deteriorated faster than those of products stored in Chamber 1. So, if the food company decides to use as shelf life the 1% percentile, that is, t0.01, then (see Table 5):

[Figure 3: estimated hazard functions h(t) vs. week (0–35) for the attributes odor, flavor and appearance; one panel per storage condition (Chamber 1 and Chamber 2).]

Fig. 3. Estimated hazard function for each storage condition and attribute.

Table 5
Percentile estimates (t̂_p, weeks) – storage conditions: Chambers 1 and 2

Condition | p | Odor | Flavor | Appearance
Chamber 1 | 10⁻³ | 0.18433 (0.12478; 1.71415)ᵃ | 0.15023 (0.03512; 0.74136) | 0.89920 (0.37381; 3.67951)
Chamber 1 | 10⁻² | 1.07184 (0.57190; 6.17721) | 0.86304 (0.21146; 3.42215) | 3.34985 (1.82611; 8.71465)
Chamber 1 | 0.05 | 3.71760 (2.43153; 8.73865) | 2.96792 (1.34931; 5.97352) | 8.48302 (5.76131; 13.37145)
Chamber 1 | 0.50 | 27.10620 (20.35681; 34.12653) | 21.34663 (17.60826; 25.01653) | 37.42309 (26.83250; 52.58634)
Chamber 2 | 10⁻³ | 1.12788 (0.97314; 2.37921) | 0.07901 (0.01342; 0.42137) | 0.52050 (0.12153; 1.97134)
Chamber 2 | 10⁻² | 2.66898 (1.21324; 3.37324) | 0.45394 (0.13627; 1.83675) | 1.63247 (0.54178; 4.71579)
Chamber 2 | 0.05 | 4.90491 (2.17637; 5.23547) | 1.56109 (0.77641; 3.14671) | 3.66069 (1.97572; 6.61734)
Chamber 2 | 0.50 | 12.96582 (10.63524; 15.01381) | 11.22809 (8.36551; 15.25307) | 13.29828 (10.58224; 18.34671)

ᵃ 95% (asymptotic) confidence interval in parentheses.

• for units stored in Chamber 1 (30 °C and 80%):
  – Odor: 1.07 weeks (7 days) ([0.57 weeks; 6.18 weeks]);
  – Flavor: 0.86 weeks (6 days) ([0.21 weeks; 3.42 weeks]);
  – Appearance: 3.35 weeks (24 days) ([1.83 weeks; 8.71 weeks]);
• for units stored in Chamber 2 (37 °C):
  – Odor: 2.67 weeks (18.6 days) ([1.21 weeks; 3.37 weeks]);
  – Flavor: 0.45 weeks (3.1 days) ([0.13 weeks; 1.84 weeks]);
  – Appearance: 1.63 weeks (11 days) ([0.54 weeks; 4.71 weeks]).

We still have to deal with the problem that the food company needs to fix a unique shelf life for the product (associated with the sensory attributes). Once a specific parameter of the "failure time" (shelf life time) distribution has been chosen to be reported as the shelf life, a usual practical approach is to adopt the smallest point estimate value. In our example, the 1% percentile was chosen to represent the shelf life for each attribute. Then, using the proposed approach, the value to be used for products stored under the conditions simulated by Chamber 1 is 0.86 weeks (6 days). The construction of a confidence interval for this quantity is not an easy task, since one needs to obtain first the probability density function of $W = \min\{t_{0.01}^{\text{odor}},\, t_{0.01}^{\text{flavor}},\, t_{0.01}^{\text{appearance}}\}$. An alternative to the analytical calculations is the construction of this distribution empirically by Monte Carlo simulation. Here a conservative approach is suggested and the interval is given by [0.21 weeks; 8.71 weeks], where 0.21 = min{0.57; 0.21; 1.83} (the minimum of the lower bound values) and 8.71 = max{6.18; 3.42; 8.71} (the maximum of the upper bound values).
Similarly, the shelf life value to be reported for units stored under the conditions simulated by Chamber 2 is 0.45 weeks, with a conservative confidence interval [0.13 weeks; 4.71 weeks].
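For completeness, the Monte Carlo alternative mentioned above could be sketched as follows. This is only an illustration under the simplifying assumption that the three attributes' fitted models are independent (which, as discussed in Section 6, may not hold); all names and inputs are hypothetical.

```python
# Hedged sketch: empirical distribution of W = min of the three estimated 1%
# percentiles, drawing parameter vectors from the asymptotic normal of the MLE.
import numpy as np

def mc_min_percentile(fits, Xj, Wj, p=0.01, n_draws=10000, seed=0):
    """fits: list of (k_hat, cov_hat) pairs, one per attribute."""
    rng = np.random.default_rng(seed)
    q1 = Xj.size
    draws = []
    for k_hat, cov_hat in fits:
        ks = rng.multivariate_normal(k_hat, cov_hat, size=n_draws)
        a = np.exp(ks[:, :q1] @ Xj)
        d = np.exp(ks[:, q1:] @ Wj)
        draws.append((-np.log(1.0 - p)) ** (1.0 / d) / a)   # Eq. (8) per draw
    w = np.min(np.column_stack(draws), axis=1)              # W = min over attributes
    return np.percentile(w, [2.5, 97.5])                    # empirical 95% interval
```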
It is important to emphasize that any other quantity could be used to represent the shelf life. For example,
the median failure time (t0.50) or the mean failure time. Finally, it is interesting to note that for both storage
conditions, the attribute ‘‘flavor’’ was the one with the smallest estimated value, indicating that it deteriorated
faster than the other two attributes.

5.1. Results comparison

Some of the results reported by Freitas et al. (2003) are reproduced in Tables 6 and 7.
Table 6 reports the point estimates of the Weibull parameters, for each storage condition and attribute.
Comparing those values with the ones obtained in this work (Table 4) we note that although our model
allowed both Weibull parameters to vary freely, the point estimates of the scale parameters (a) obtained
through both models are very similar.
In addition, according to the results of our analysis, the failure time of units stored in both conditions
(Chambers 1 and 2) could be modeled by Weibull distributions with the same form parameter (d) only when
"flavor" was the attribute under consideration. Therefore, for that particular case, it is fair to say that the parameter estimates obtained with the model developed by Freitas et al. (2003) should be very close to the ones obtained with the model developed in this work, since the former is a particular case of the latter. In fact, a close look at the parameter values shown in Table 6 confirms this expectation.

Table 6
Parameter estimates obtained by Freitas et al. (2003)

Condition | Odor: â, d̂ | Flavor: â, d̂ | Appearance: â, d̂
Chamber 1 | 0.0302, 1.6 | 0.0358, 1.4 | 0.0233, 2
Chamber 2 | 0.0596, 1.6 | 0.0659, 1.4 | 0.0602, 2

Table 7
Shelf life estimates (in weeks) based on the two models

Attribute | Chamber 1: Freitas et al. (2003) | Chamber 1: GWM | Chamber 2: Freitas et al. (2003) | Chamber 2: GWM
Odor | 1.9 [0.9; 3.9]ᵃ | 1.1 [0.6; 6.2] | 1.0 [0.5; 2.1] | 2.7 [1.2; 3.4]
Flavor | 0.9 [0.4; 2.2] | 0.9 [0.2; 3.4] | 0.5 [0.2; 1.2] | 0.5 [0.1; 1.8]
Appearance | 4.1 [2.7; 7.5] | 3.3 [1.8; 8.7] | 1.6 [0.8; 3.2] | 1.6 [0.5; 4.7]

ᵃ 95% (asymptotic) confidence interval.

Table 7 presents the shelf life estimates (based on the 1% percentile) obtained by the two models, along with the respective 95% (asymptotic) confidence intervals (all values rounded to one decimal place). Two major comments should be made here:

• In addition to the case already mentioned (i.e., the attribute "flavor"), the point estimates for the attribute "appearance" (Chamber 2) obtained by both models are also very close.
• The estimates obtained by the "general" model have a lower precision than the ones reported by Freitas et al. (2003), no matter which case we analyze. This result can be confirmed by comparing the widths of the respective confidence intervals. In fact, this result was somewhat expected since here we are using a model with more parameters.

6. Concluding remarks

We have proposed a general version of the modeling approach presented by Freitas et al. (2003), the GWM
(General Weibull Model). Our simulation results suggest that the estimates of percentiles and fraction of
defectives obtained by this model have a good overall ‘‘quality’’ (measured by their MSE). The results also
suggest that, as in many other statistical approaches, the sample size has a major impact on the precision
of the estimates. In fact, MSE values decrease as the sample size increases. In addition, the simulation results
also indicate that, once a sample size is chosen, MSE values are not affected if one modifies the original sample
plan, by fixing a shorter follow up period while increasing the number of panelists allocated to each evaluation
time. The practical implication is: if, during the design stage of sensory evaluations, a shorter follow up period is a key issue for cost reduction, then the overall quality of the estimates can still be maintained by keeping the same planned sample size through an increase in the number of observations (panelists) per evaluation time.
In our application to a real data set, the results have shown that, except for one of the cases, an expanded
model was really needed. Also, for all the cases considered in the experiment, the GWM provided esti-
mates with lower precision than the ones provided by the alternative model. This result was already expected
since we are dealing with a model with more parameters. We have used a simple model including only a dum-
my variable to discriminate between the two storage conditions (chambers 1 and 2). We were not able to
include the specific storage conditions in the model, such as temperature and humidity levels, because they
were fixed for each one of the chambers. The analysis would have been much more informative if the evaluations had been implemented following a factorial design, for example (Montgomery, 2001). In that case, not only could model parameters related to the temperature and humidity levels have been included in the model, but also parameters related to the effect of the interaction between these two factors.
In practical situations, many other variables are usually measured in shelf life experiments and may bring a
new insight into the study if included in the modeling. One example is the water activity (aw) measured in each
environment. It is now generally accepted that the water requirements of microorganisms should be defined in
terms of the aw. Just to give an example, most spoilage bacteria do not grow below aw = 0.91, while spoilage
molds can grow as low as 0.80 (Jay, 1992). Some relationships have been shown to exist between aw values,
temperature and nutrition for example. The water activity can be included as a covariate in the model.
We still have to deal with the problem of reporting a unique shelf life value and constructing a confidence
interval for this quantity. The approach adopted in this paper was a conservative one and did not account for the possibility of correlations between the failure times of the different attributes, which may well occur in practice. It would be interesting to study this situation further. One possibility could be to model more than one attribute jointly, in other words, to work with the joint distribution of their times to failure. This problem is now the subject of further research by us.

Acknowledgements

The authors are grateful to the referees and the editor for helpful comments that led to substantial improve-
ments in the paper and also to CNPQ/Brazil for the financial support of this research.

Appendix A

In this section we present the expressions of the derivatives of the log-likelihood function needed in the implementation of the Fisher scoring algorithm, and large sample confidence intervals for the fraction of defectives and percentiles.
Maximum likelihood estimates are obtained by direct maximization of l(k) = ln L(k). The expression of the first derivatives is:

$$\left[\frac{\partial l(k)}{\partial k}\right]_{(q+r+2)\times 1} = \begin{bmatrix} \dfrac{\partial l(k)}{\partial b} \\[2mm] \dfrac{\partial l(k)}{\partial h} \end{bmatrix},$$

where

$$\left[\frac{\partial l(k)}{\partial b}\right]_{(q+1)\times 1} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} X_j^t\left[-e^{W_j h}\left(s_i e^{X_j b}\right)^{e^{W_j h}} + \frac{y_{ij}\, e^{W_j h}\left(s_i e^{X_j b}\right)^{e^{W_j h}}}{1 - e^{-\left(s_i e^{X_j b}\right)^{e^{W_j h}}}}\right]$$

and

$$\left[\frac{\partial l(k)}{\partial h}\right]_{(r+1)\times 1} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} W_j^t\left[-\left(s_i e^{X_j b}\right)^{e^{W_j h}}\ln\left(s_i e^{X_j b}\right) e^{W_j h} + \frac{y_{ij}\left(s_i e^{X_j b}\right)^{e^{W_j h}}\ln\left(s_i e^{X_j b}\right) e^{W_j h}}{1 - e^{-\left(s_i e^{X_j b}\right)^{e^{W_j h}}}}\right].$$

The elements of the Fisher Information Matrix I(k) are given below:

$$I(k) = E\left[-\frac{\partial^2 l(k)}{\partial k\,\partial k^t}\right]_{(q+r+2)\times(q+r+2)} = \begin{bmatrix} I_{11} & I_{12} \\ I_{12}^t & I_{22} \end{bmatrix}.$$

Since $E(y_{ij}) = 1 - e^{-\left(s_i e^{X_j b}\right)^{e^{W_j h}}}$ and $E(1 - y_{ij}) = e^{-\left(s_i e^{X_j b}\right)^{e^{W_j h}}}$, then:

$$I_{11} = E\left[-\frac{\partial^2 l(k)}{\partial b\,\partial b^t}\right]_{(q+1)\times(q+1)} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} X_j^t X_j\left[\frac{e^{2W_j h}\left(s_i e^{X_j b}\right)^{2e^{W_j h}} e^{-\left(s_i e^{X_j b}\right)^{e^{W_j h}}}}{1 - e^{-\left(s_i e^{X_j b}\right)^{e^{W_j h}}}}\right],$$

$$I_{22} = E\left[-\frac{\partial^2 l(k)}{\partial h\,\partial h^t}\right]_{(r+1)\times(r+1)} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} W_j^t W_j\left[\frac{\left(s_i e^{X_j b}\right)^{2e^{W_j h}}\left(\ln\left(s_i e^{X_j b}\right)\right)^2 e^{2W_j h} e^{-\left(s_i e^{X_j b}\right)^{e^{W_j h}}}}{1 - e^{-\left(s_i e^{X_j b}\right)^{e^{W_j h}}}}\right],$$

$$I_{12} = E\left[-\frac{\partial^2 l(k)}{\partial b\,\partial h^t}\right]_{(q+1)\times(r+1)} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} X_j^t W_j\left[\frac{\left(s_i e^{X_j b}\right)^{2e^{W_j h}}\ln\left(s_i e^{X_j b}\right) e^{2W_j h} e^{-\left(s_i e^{X_j b}\right)^{e^{W_j h}}}}{1 - e^{-\left(s_i e^{X_j b}\right)^{e^{W_j h}}}}\right].$$

Now, if $\hat{k} = (\hat{b}_0, \hat{b}_1, \ldots, \hat{b}_q; \hat{h}_0, \hat{h}_1, \ldots, \hat{h}_r)^t$ is the maximum likelihood estimator of $k = (b_0, b_1, \ldots, b_q; h_0, h_1, \ldots, h_r)^t$, then:

• using maximum likelihood large sample theory (asymptotic normality) and the delta method (see Cox & Hinkley, 1974), it is possible to find the expression of the asymptotic variance estimator

$$\widehat{\mathrm{Var}}\left(\hat{t}_{p(j)}\right) = Z^t\, I^{-1}(k)\, Z\big|_{k=\hat{k}},$$

where Z is a vector of dimension $(q + r + 2) \times 1$, given by

$$Z = \begin{bmatrix} -e^{-X_j b}\left[-\ln(1-p)\right]^{e^{-W_j h}} X_j^t \\[2mm] -e^{-X_j b}\left[-\ln(1-p)\right]^{e^{-W_j h}}\left[e^{-W_j h}\ln\left(-\ln(1-p)\right)\right] W_j^t \end{bmatrix}. \qquad (10)$$

Therefore, the upper (UB) and lower (LB) bounds of a 95% (asymptotic) confidence interval for $t_{p(j)}$ are given respectively by:

$$UB = \hat{t}_{p(j)} + 1.96\sqrt{\widehat{\mathrm{Var}}\left(\hat{t}_{p(j)}\right)},$$
$$LB = \hat{t}_{p(j)} - 1.96\sqrt{\widehat{\mathrm{Var}}\left(\hat{t}_{p(j)}\right)};$$

• equivalently, making use of the delta method and the asymptotic normality property of the maximum likelihood estimators (Cox & Hinkley, 1974), we get the expression of a 95% confidence interval for the fraction of defectives at $t_0$, $F_j(t_0) = 1 - R_j(t_0)$:

$$UB = 1 - \left[\hat{R}_j(t_0)\right]^{\exp\left\{+1.96\sqrt{\widehat{\mathrm{Var}}(\hat{\phi})}\right\}},$$
$$LB = 1 - \left[\hat{R}_j(t_0)\right]^{\exp\left\{-1.96\sqrt{\widehat{\mathrm{Var}}(\hat{\phi})}\right\}},$$

where $\hat{\phi} = \ln\left[-\ln \hat{R}_j(t_0)\right] = \ln\left[-\ln\left(\exp\left\{-\left(t_0\, e^{X_j \hat{b}}\right)^{e^{W_j \hat{h}}}\right\}\right)\right]$, $\widehat{\mathrm{Var}}(\hat{\phi}) = Z^t I^{-1}(k) Z\big|_{k=\hat{k}}$, and Z is a $(q + r + 2) \times 1$ vector given by:

$$Z = \begin{bmatrix} \dfrac{\partial \phi}{\partial b} \\[2mm] \dfrac{\partial \phi}{\partial h} \end{bmatrix} = \begin{bmatrix} e^{W_j h} X_j^t \\[2mm] e^{W_j h}\ln\left(t_0\, e^{X_j b}\right) W_j^t \end{bmatrix}. \qquad (11)$$

We point out that the bounds (UB and LB) were calculated applying the asymptotic normal distribution to the transformation $\phi = \ln(-\ln R_j(t_0))$, for which the range is unrestricted. The confidence interval for the fraction of defectives is then found by applying the inverse transformation. This procedure, suggested by Kalbfleisch and Prentice (1992, p. 18), prevents the occurrence of limits outside the range [0, 1].
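As a hedged illustration of how Eqs. (8) and (10) combine in practice, the sketch below computes the delta-method confidence interval for a percentile; the inverse information matrix I_inv and all names are assumptions of this example, not the authors' code.

```python
# Illustrative sketch of the delta-method interval for t_p(j), Eqs. (8) and (10).
import numpy as np

def percentile_ci(p, Xj, Wj, b_hat, h_hat, I_inv, z=1.96):
    a_j, d_j = np.exp(Xj @ b_hat), np.exp(Wj @ h_hat)
    c0 = -np.log(1.0 - p)
    t_p = c0 ** (1.0 / d_j) / a_j                    # Eq. (8)
    # gradient of t_p with respect to (b, h), as in Eq. (10)
    Z = np.concatenate([-t_p * Xj,
                        -t_p * np.log(c0) / d_j * Wj])
    se = np.sqrt(Z @ I_inv @ Z)                      # asymptotic standard error
    return t_p - z * se, t_p + z * se                # (LB, UB)
```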

References

Cox, D. R., & Hinkley, D. V. (1974). Theoretical statistics. London: Chapman and Hall.
Draper, N., & Smith, H. (1998). Applied regression analysis (3rd ed.). New York: Wiley.
Freitas, M. A., Borges, W. S., & Ho, L. L. (2003). A statistical model for shelf life estimation using sensory evaluation scores. Communications in Statistics – Theory and Methods, 32(8), 1559–1589.
Freitas, M. A., Borges, W., & Ho, L. L. (2004). Sample plans comparisons for shelf life estimation using sensory evaluation scores. International Journal of Quality & Reliability Management, 21(4), 439–466.
Gacula, M. C., Jr. (1975). The design of experiments for shelf life study. Journal of Food Science, 40, 399–403.
Gacula, M. C., Jr., & Kubala, J. J. (1975). Statistical models for shelf life failures. Journal of Food Science, 40, 404–409.
Gacula, M. C., Jr., & Singh, J. (1984). Statistical methods in food and consumer research. New York: Academic Press.
Jay, J. M. (1992). Modern food microbiology (4th ed.). New York: Chapman & Hall.
Kalbfleisch, J. D., & Prentice, R. L. (1992). The statistical analysis of failure time data. New York: Wiley.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models. New York: Chapman and Hall.
Meeker, W. Q., & Escobar, L. A. (1998). Statistical methods for reliability data. New York: Wiley.
Montgomery, D. C. (2001). Design and analysis of experiments (5th ed.). New York: Wiley.
Nelson, W. (1990). Accelerated testing: Statistical methods, test plans and data analysis. New York: Wiley.
Seber, G. A. G., & Wild, C. J. (1989). Nonlinear regression. New York: Wiley.
