Protection Ecology, 2 (1980) 159-176
Elsevier Scientific Publishing Company, Amsterdam. Printed in The Netherlands

QUANTIFICATION OF DISEASE PROGRESSION

L.V. MADDEN

Department of Plant Pathology, The Pennsylvania State University, University Park, PA 16802 (U.S.A.)

(Accepted 17 June 1980)

A portion of this paper was presented at the Epidemiology Workshop held at The Pennsylvania State University (University Park, PA) and the Plant Disease Research Laboratory (Frederick, MD) from 30 July to 3 August 1979.

ABSTRACT

Madden, L.V., 1980. Quantification of disease progression. Prot. Ecol., 2: 159-176.

This paper contains a review and synthesis of: (1) models used for describing epidemics and (2) mathematical and statistical techniques for analyzing data characterizing plant disease epidemics. The disease progress models, which for the most part originated in the field of "growth curve analysis", include: exponential, monomolecular, logistic, Gompertz, Bertalanffy-Richards, and Weibull. It is shown how these models can be combined into an overall and realistic "generic" epidemic model. Implications of several statistical techniques for estimating model parameters also are discussed.

INTRODUCTION

An epidemic can be defined as "an increase in disease with time" or more generally as "a change in disease with time". It is a dynamic process. The fundamental depiction of an epidemic is the disease progress curve (DPC), a plot of disease proportion vs time. The DPC and its various elements express the interaction of pathogen, suscept, and environment over time.

Quantification of disease progression is desirable for numerous reasons including: (1) evaluating control strategies; (2) predicting future levels of disease; and (3) verifying plant disease simulators and forecasters. Barratt (1948), Large (1952), and Schmitt et al. (1959) were some of the early plant pathologists who quantified disease progression.
The greatest advancement in the analysis of epidemics occurred, however, with the publication of "Plant Diseases: Epidemics and Control" by Van der Plank (1963). Since 1963 several extensions to this field have been made and the following references are good reviews: Jowett et al. (1974), Kranz (1974), Hau and Kranz (1977), Starkey (1977), and Zadoks and Schein (1979).

0378-4389/80/0000-0000/$02.50 © 1980 Elsevier Scientific Publishing Company

Analysis of epidemics, and their representative DPC's, is one segment of a field called "growth curve analysis". Here growth is defined as the change in magnitude of any measurable characteristic, e.g., weight, numbers, and disease proportion. Growth curve analysis has a much longer history than botanical epidemic analysis. Its beginnings can be traced back to 1798 when Malthus made his famous prediction of human population growth and to 1838 when Verhulst proposed the "logistic" form of growth (Malthus, 1798; Verhulst, 1838).

There are two distinct strands in the growth curve literature (Sandland and McGilchrist, 1979). The first is the "biological" strand, which seeks models with a biological basis. A model is defined as a simplified representation of reality. Deterministic differential equations are common types of biological growth models. The second strand, the "statistical" strand, involves the use of regression equations, often polynomials, which are analyzed without clear-cut biological interpretations. A review and evaluation of the two types of growth curve models as they relate to disease progression follow.

"BIOLOGICAL" MODELS

Dynamic processes are defined by their rate of change with time. An epidemic can, thus, be expressed as 'dy/dt', the change in disease ('dy') with infinitesimal change in time ('dt'). This expression represents the absolute rate of disease increase or absolute growth rate. Epidemiologists wish to quantitatively describe epidemics and their corresponding DPC's.
This is performed by mathematically modeling disease progress. The models can be written in the completely general case as:

$$ dy/dt = f(\text{time};\ \text{random variables};\ \text{parameters}). \tag{1} $$

This can be read as: the change in disease with change in time is a function of time, random variables, and parameters. Random variables are quantities that are measured or observed, e.g., environment, disease, and crop characteristics. Parameters are unknown constants that must be estimated with available data. This general equation is the fundamental starting ground for all future analysis.

I will now consider the case where the only random variable is the proportion of disease (severity or incidence) and the only parameter is a rate constant. Mathematically, this can be written as:

$$ dy/dt = f(t;\ y;\ r) \tag{2} $$

where 't' = time in appropriate units, 'y' = disease proportion (0.0 to 1.0), and 'r' = rate parameter.

Before considering specific models, I must state that even the most flexible form of eq. (2) will not be suited for all epidemics. Three specific cases of eq. (2) have gained popularity among plant pathologists ever since they were suggested by Van der Plank (1963).

Exponential

The first of the three models is the exponential, also called logarithmic and geometric. This model was used implicitly by Malthus (1798) to predict human population increase. It can be written as:

$$ f(t;\ y;\ r) = ry \quad \text{or} \quad dy/dt = ry. \tag{3} $$

This is one of the simplest models of disease progression that is useful for analysis. According to eq. (3), the rate of disease increase is proportional to the level of disease actually present; 'dy/dt' equals 'r' times 'y'. In practice, cumulative numbers of infected plants or lesion areas are observed, and only indirectly is the rate known. This model, as well as the ones that follow, therefore, is usually presented in its integrated form. An important point to be made is that the differential equation ('dy/dt') is the starting point. Integrating eq.
(3) yields:

$$ y = y_0 e^{rt}. \tag{4} $$

'y0' is a constant of integration and can represent the initial level of disease.

Van der Plank (1963) made the analogy between continuous compound interest on money in the bank and the exponential model for disease progression. Blackman (1919) was the first person to make the analogy between growth (plant dry matter accumulation in this case) and accumulation of money at compound interest. It is interesting to note that Blackman was an early teacher of Van der Plank (1976). The exponential model is appropriate when growth is not restricted by adverse factors; Van der Plank (1968) stated that this model is most appropriate in the early stages of polycyclic epidemics when 'y' is arbitrarily less than 0.05. Methods for evaluating this and other models are discussed in a later section.

There are three kinds of plots that are valuable in studying disease progression models. The first is a plot of 'dy/dt' vs time (Fig. 1). This plot unfortunately is not made by most investigators. The plot is important because it summarizes the dynamic nature of an epidemic. Since rates are not directly observed, it is necessary to estimate the quantity 'dy/dt'. The simplest estimate is delta-y, the change in 'y' from time 'i - 1' to time 'i'. There is a monotonic increase in the rate over time with the exponential model (Fig. 1). There is no inflection point and no maximum rate.

Fig. 1. Rate of disease increase (dashed line) and disease proportion (solid line) versus time for the exponential model (eqs. (3) and (4)).

Fig. 2. Rate of disease increase (dashed line) and disease proportion (solid line) versus time for the monomolecular model (eqs. (6) and (7)).

The second plot is the traditional disease progress curve. This is the integral of 'dy/dt' or the summation of the delta-y's. There is no limit to disease with the exponential model (Fig. 1). Although this is unrealistic, the model may be satisfactory for limited sections of an epidemic.
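A minimal Python sketch of the first two plots for the exponential model is given here. It evaluates eq. (4) at equally spaced times and uses delta-y as the estimate of 'dy/dt'. The values y0 = 0.001 and r = 0.2 per unit time, and the function names, are hypothetical and chosen only for illustration.

```python
import math


def exponential_y(t, y0, r):
    """Integrated exponential model, eq. (4): y = y0 * exp(r * t)."""
    return y0 * math.exp(r * t)


def delta_y(ys):
    """Simplest rate estimate: change in y from time i - 1 to time i."""
    return [ys[i] - ys[i - 1] for i in range(1, len(ys))]


# Hypothetical parameter values, for illustration only.
y0, r = 0.001, 0.2
times = list(range(0, 11))  # observation times 0, 1, ..., 10
ys = [exponential_y(t, y0, r) for t in times]
rates = delta_y(ys)

# Under the exponential model the estimated rate rises at every step,
# so there is no inflection point and no maximum rate.
is_monotonic = all(later > earlier for earlier, later in zip(rates, rates[1:]))
print(is_monotonic)
```

Summing the delta-y values and adding y0 returns the disease progress curve itself, which is the sense in which the DPC is the summation of the delta-y's.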
The third plot is produced by plotting a transformation of 'y' ('y*') vs time that will result in a straight line if the model is correct. Eq. (4) can be linearized to:

$$ \ln(y) = \ln(y_0) + rt. \tag{5} $$

This is an equation for a straight line with intercept 'ln(y0)' and slope 'r'. A straight line is desirable for its apparent simplicity, but the investigator must realize that 'ln(y)' is not the quantity that is increasing in the field. The value of 'r' between times 'i - 1' and 'i' is:

$$ r = \frac{1}{t_i - t_{i-1}} \left( \ln(y_i) - \ln(y_{i-1}) \right). $$

Monomolecular

The second model has been termed the monomolecular, negative exponential, and simple interest (Van der Plank, 1963; Richards, 1969). The model has been used to describe a monomolecular chemical reaction of the first order, cell expansion, response of crops to nutrients (using different notation), and numerous other phenomena (Mitscherlich, 1909; Richards, 1969). The model is written as:

$$ dy/dt = r(1 - y). \tag{6} $$

The assumption is made that the maximum level of disease equals 1.0. This equation, and the ones that follow, can be generalized to include a parameter representing the maximum level of disease. This is not done here for simplicity of presentation.

According to eq. (6), the absolute rate of disease increase is proportional to the amount of "healthy" (disease-free) tissue. Disease-free tissue is represented by '(1 - y)'. A plot of 'dy/dt' vs time has the form of a negative exponential probability density function (Fig. 2). The fastest rate occurs at the beginning of the epidemic; the rate declines as disease level approaches a maximum. Infected plants or infected tissue do not contribute to future infections. According to Van der Plank (1963), this type of disease progression is common with soil-borne pathogens and systemic diseases in a single season. Integrating eq. (6) yields:

$$ y = 1 - (1 - y_0) e^{-rt}. \tag{7} $$

A plot of 'y' vs time is concave to the time axis.
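The behavior of eqs. (6) and (7) can be checked numerically with a short Python sketch. The values y0 = 0.01 and r = 0.1 are hypothetical, as are the function names. The sketch also computes 'ln(1/(1 - y))', which follows from rearranging eq. (7) and should increase linearly in time with slope 'r'.

```python
import math


def monomolecular_y(t, y0, r):
    """Integrated monomolecular model, eq. (7)."""
    return 1.0 - (1.0 - y0) * math.exp(-r * t)


def monomolecular_rate(y, r):
    """Absolute rate of disease increase, eq. (6): proportional to healthy tissue."""
    return r * (1.0 - y)


# Hypothetical parameter values, for illustration only.
y0, r = 0.01, 0.1
times = list(range(0, 31, 5))
ys = [monomolecular_y(t, y0, r) for t in times]
rates = [monomolecular_rate(y, r) for y in ys]

# Transformed disease values; successive slopes should all equal r.
transformed = [math.log(1.0 / (1.0 - y)) for y in ys]
slopes = [
    (transformed[i] - transformed[i - 1]) / (times[i] - times[i - 1])
    for i in range(1, len(times))
]
print(rates[0], rates[-1], slopes[0])
```

The rate list starts at its largest value and declines at every step, which is the numerical counterpart of the negative exponential shape described for Fig. 2.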
Eq. (7) is linearized to:

$$ \ln\left(\frac{1}{1 - y}\right) = \ln\left(\frac{1}{1 - y_0}\right) + rt. \tag{8} $$

Logistic

The third model I will consider is the logistic. It is by far the most important, not necessarily because of its appropriateness, but because it is used by many pathologists today. The model originally was used by Verhulst in 1838 to describe human population increase (Verhulst, 1838). In 1920 Pearl and Reed independently derived the same function (Pearl and Reed, 1920). Since that time it has found many applications. Van der Plank (1963) proposed this model for describing polycyclic disease progression. The differential equation is:

$$ dy/dt = ry(1 - y). \tag{9} $$

Absolute rate of disease increase is jointly proportional to the level of disease ('y') and healthy tissue ('(1 - y)'). Although the rate ('dy/dt') increases at some levels of 'y', it eventually decreases to zero (Fig. 3). The inflection point is at y = 0.5. Mathematically this is the point where the maximum rate occurs. The parameter 'r' of the logistic model was called the apparent infection rate by Van der Plank (1968).

Fig. 3. Rate of disease increase (dashed line) and disease proportion (solid line) versus time for the logistic model (eqs. (9) and (10)).

Fig. 4. Rate of disease increase for the exponential (dashed line), monomolecular (solid line), and logistic (dash-dot line) models (eqs. (3), (6) and (9)).

Integration of eq. (9) leads to:

$$ y = \frac{1}{1 + \exp\left(-\left[\ln\left(\frac{y_0}{1 - y_0}\right) + rt\right]\right)}. \tag{10} $$

Plotting 'y' vs time results in the classic S-shaped curve, symmetrical about the inflection point (Fig. 3). The curve is asymptotic to 0.0 and 1.0. Eq. (10) can be linearized to:

$$ \ln\left(\frac{y}{1 - y}\right) = \ln\left(\frac{y_0}{1 - y_0}\right) + rt. \tag{11} $$

The value of 'r' between times 'i - 1' and 'i' is:

$$ r = \frac{1}{t_i - t_{i-1}} \left( \ln\left(\frac{y_i}{1 - y_i}\right) - \ln\left(\frac{y_{i-1}}{1 - y_{i-1}}\right) \right). $$

This is the "classic" equation of Van der Plank.

A comparison of the three models so far discussed can be made by plotting their rates vs time (Fig. 4). In the early part of an epidemic the exponential and logistic models are very similar; this is because '(1 - y)' is approximately equal to 1.0.
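The similarity of the exponential and logistic models at low disease levels, and the calculation of 'r' between two observation times, can be sketched in Python as follows. The parameter values (y0 = 0.001, r = 0.2) and the function names are hypothetical; the logistic curve is written in a form algebraically equivalent to eq. (10).

```python
import math


def exponential_y(t, y0, r):
    """Integrated exponential model, eq. (4)."""
    return y0 * math.exp(r * t)


def logistic_y(t, y0, r):
    """Integrated logistic model, equivalent to eq. (10)."""
    return 1.0 / (1.0 + ((1.0 - y0) / y0) * math.exp(-r * t))


def logit(y):
    """Logistic transformation ln(y / (1 - y)) used in eq. (11)."""
    return math.log(y / (1.0 - y))


def apparent_rate(y_prev, y_curr, t_prev, t_curr):
    """Value of r between two observation times under the logistic model."""
    return (logit(y_curr) - logit(y_prev)) / (t_curr - t_prev)


# Hypothetical parameter values, for illustration only.
y0, r = 0.001, 0.2

# Early in the epidemic (y well below 0.05) the two curves nearly coincide.
early_exp = exponential_y(10, y0, r)
early_log = logistic_y(10, y0, r)
relative_gap = abs(early_exp - early_log) / early_log

# The two-point rate calculation recovers r from logistic disease values.
recovered_r = apparent_rate(logistic_y(20, y0, r), logistic_y(30, y0, r), 20, 30)
print(relative_gap, recovered_r)
```

With real data the two-point estimate of 'r' will vary from interval to interval; a roughly constant value is one informal indication that the logistic model is reasonable.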
Many of Van der Plank's subsequent analyses (e.g., doubling time and sanitation ratios) assume that disease increases according to the logistic model and, therefore, according to the exponential when y < 0.05 (Van der Plank, 1968). Only with this model are 'y' and 'r' epidemiologically independent. The logistic and monomolecular models are very similar during the last phases of polycyclic epidemics. At this time 'y' is approximately equal to 1.0 and, therefore, the logistic expression is dominated by '(1 - y)'.

Disease can increase in ways other than that summarized by the previous three models. This should be apparent by examination of Fig. 4. Many curves could be drawn to represent epidemics. A few of these additional models are discussed next.

Gompertz

The Gompertz model dates back to the early 1800's. It has found its greatest use in animal population studies (Gompertz, 1825), but recently it has been used to describe cotton hypocotyl elongation (Pegelow et al., 1977) and evaluated for summarizing disease progression (Berger and Mishoe, 1976). The differential equation is:

$$ dy/dt = ry(\ln(1) - \ln(y)) = ry(-\ln(y)). \tag{12} $$

A plot of 'dy/dt' vs time results in an asymmetrical curve (Fig. 5) with an inflection point at y = 1/e = 0.37. This model may be appropriate when the maximum rate occurs earlier than with the logistic. Integrating eq. (12) yields:

$$ y = \exp\left(-B e^{-rt}\right) \tag{13} $$

where 'B' is a constant of integration. A plot of 'y' vs time is represented by an S-shaped curve that is similar to the logistic (Fig. 5). This should emphasize why the rate plot is necessary for identifying models.

Fig. 5. Rate of disease increase (dashed line) and disease proportion (solid line) versus time for the Gompertz model (eqs. (12) and (13)).

Eq. (13) is linearized to:

$$ -\ln(-\ln(y)) = -\ln(B) + rt. \tag{14} $$

Bertalanffy-Richards

In 1938 Von Bertalanffy proposed that the change of an organism's weight is the result of the counteraction of anabolism and catabolism (Von Bertalanffy, 1938, 1957).
The equation of Von Bertalanffy was generalized in 1959 by Richards to accommodate many aspects of growth and also to provide more flexibility in the shape of the growth curves (Richards, 1959, 1969). The model equation is:

$$ dy/dt = \frac{ry}{m - 1}\left(1 - y^{m-1}\right). \tag{15} $$

The general model of eq. (2) must now be modified to contain an additional parameter. The parameter 'm' determines the shape of the curve; it is limited to values less than 1.0 in the Bertalanffy case. Fractional values of 'm', such as 2/3, are the most common forms of the Bertalanffy model. When m = 0, the equation reduces to the monomolecular; when 'm' approaches 1.0 (limit), the equation becomes the Gompertz; and when m = 2, the equation is the logistic. There are, of course, many different shapes. The case when m = 2/3 is plotted in Fig. 6. The inflection point is at y = 0.296. Some of the manifestations of eq. (15) are plotted in Fig. 7. The inflection point occurs at:

$$ y = m^{1/(1-m)}. $$

Integrating eq. (15) yields:

$$ y = \left(1 + B e^{-rt}\right)^{1/(1-m)}. \tag{16} $$

The parameter 'B' is a constant of integration and equals 'y0^(1-m) - 1'. Example DPC's corresponding to the curves of Fig. 7 are plotted in Fig. 8. Eq. (16) is linearized to:

$$ \ln\left(\frac{1}{y^{1-m} - 1}\right) = -\ln(B) + rt. \tag{17} $$

Note that 'y*' (the transformation on the left-hand side of eq. (17)) depends on 'm'.

Fig. 6. Rate of disease increase (dashed line) and disease proportion (solid line) versus time for the Bertalanffy model with m = 2/3 (eqs. (15) and (16)).

Fig. 7. Rate of disease increase for the Richards model (eq. (15)) with values of 'm' equal to 0, ~1, 2, and 3.

Fig. 8. Disease proportion versus time for the Richards model (eq. (16)) with values of 'm' equal to 0, ~1, 2, and 3.

Weibull

The Weibull model has been used as a statistical distribution and probability density function for several years (Weibull, 1951).
It is used extensively in reliability and life-testing studies, e.g., time to failure of mechanical devices and time to relapse of a certain disease after a specific treatment (human medicine). We proposed this model for describing disease progress curves (Pennypacker et al., 1980). The differential equation can be written dy/dt = f(t; a, b, c) or:

$$ dy/dt = \frac{c}{b}\left(\frac{t - a}{b}\right)^{c-1} \exp\left(-\left(\frac{t - a}{b}\right)^{c}\right). \tag{18} $$

When used as a probability density function in reliability studies, eq. (18) represents the density function for the time of failure. In epidemic analysis, the equation summarizes the absolute rate of disease increase. The location parameter 'a' represents the earliest possible occurrence of disease (time of disease onset); 'b' is the scale parameter and is inversely related to the rate of disease increase; 'c' is the shape parameter and indicates the type of disease increase. All of the curves attributed to the Richards model in Fig. 7 can be numerically described by the Weibull. Integrating eq. (18) yields:

$$ y = 1 - \exp\left(-\left(\frac{t - a}{b}\right)^{c}\right). \tag{19} $$

This equation is linearized to:

$$ \left(\ln\left(\frac{1}{1 - y}\right)\right)^{1/c} = -\frac{a}{b} + \frac{t}{b}. \tag{20} $$

This is an equation of a straight line with intercept '-a/b' and slope '1/b'.

The relationship between the Weibull model and the previously presented models was determined by generating data sets for various 'm' values of the Richards equation and estimating the Weibull shape parameter (L.V. Madden, unpublished data, 1979). A summary of the comparisons is listed in Table I. I feel the Weibull model is superior to the Richards because Weibull parameter estimation is more efficient and large data sets are not required (Pennypacker et al., 1980).

A more intuitive way of looking at the Weibull model is attained by rearranging eqs. (18) and (19). The revised differential equation is:

$$ dy/dt = \frac{c}{b}\left(\ln\left(\frac{1}{1 - y}\right)\right)^{(c-1)/c} (1 - y). \tag{21} $$
TABLE I. Estimated Weibull shape parameter ('c') for various 'm' values of the Richards equation. [Table body not recoverable from the source; legible estimated values include 1.00, 3.65, and 13.01.]

The quantity 'ln(1/(1 - y))' is Gregory's multiple infection transformation, i.e., it equals the number of infections per unit area or count (Gregory, 1948). When c = 1.0, '(ln(1/(1 - y)))^((c-1)/c)' equals 1.0 and the equation reduces to the monomolecular with r = c/b. As 'c' increases above 1.0, more weight is given to the number of infections in the determination of 'dy/dt'.

Generic model

The generalized model of eq. (1), which includes all of the models discussed, can be rewritten as:

$$ dy/dt = r\, d_0(0, y)\, d_1(y, 1). \tag{22} $$

This can be read as: the absolute rate of disease increase is proportional to (1) a generalized distance function between the origin and disease level, and (2) a generalized distance function between the proportion of disease and the ultimate size. In other words, 'dy/dt' equals a constant times a function of diseased tissue ('d0(0, y)') times a function of healthy tissue ('d1(y, 1)'). An epidemic can thus be summarized as the product of two opposing "forces": (1) forces that cause disease to increase (e.g., inherent ability of a pathogen to reproduce and infect, and favorable environment); and (2) forces that limit the rate of disease increase (e.g., limit to susceptible tissue, unfavorable environment, and crowding). The distance functions for the models so far discussed are summarized in Table II.

TABLE II. Distance functions 'd0(0, y)' and 'd1(y, 1)' for the exponential, monomolecular, logistic, Gompertz, Richards, and Weibull models. [Table entries not recoverable from the source.]

Turner and co-workers (Turner et al., 1976) utilized the postulates of eq. (22) and developed a "generic" model of growth. Their generalized distances are power functions. The exact form of the model is not given here due to its complexity; the model does encompass all of the models discussed in this paper plus providing several new forms (Turner et al., 1976).

Model selection

Model selection is not a trivial matter.
Comparisons among control strategies, verification of simulators, and predicting disease levels are valid and accurate only when the "best" model is used. If a proper model is selected, a plot of the appropriate 'y*' (transformed 'y') vs time will result in a straight line if 'r' is constant. The implications of using linearized (transformed) equations for estimating parameters are discussed in the next section. Plotting delta-y vs time will indicate the shape of the curve and locate the inflection point. One of the most important ways of choosing an acceptable model is to calculate the differences between the observed y's and the predicted y's (the residuals) when a specific model is assumed, and then to plot these values vs time. If an acceptable model is chosen, this plot will consist of a random scatter of points (Neter and Wasserman, 1974). This indicates there is no systematic deviation of the data points from the model under consideration. An observable pattern to the residuals indicates that an incorrect model is being used and/or the y's are serially correlated. This latter problem is discussed in the next section.

More formal statistical tests are possible for evaluating the appropriateness of the model. These tests are not discussed here. A few pathologists have attempted to select the most appropriate model for their disease progression data (Analytis, 1973; Berger and Mishoe, 1976; Hau and Kranz, 1977; Campbell et al., 1980). Based on their results, it is apparent that not all polycyclic epidemics are described by the logistic model. There is a whole range of disease progress curve shapes.

"STATISTICAL" MODELS

An epidemic can be statistically represented as:

$$ y_i = g(t_i) + u_i. \tag{23} $$

This can be read as: the proportion of disease at the ith time is equal to a function of time plus unexplained variability ('u_i'). The term 'g(t)' is used instead of 'f(t)' to stress that these functions are of different types.
'u_i' represents all the factors influencing the proportion of disease that are not represented by 'g(t)'. The term emphasizes that all models are simplifications and do not represent reality in detail or entirety. The unexplained variability is often called the "error", emphasizing the potential error in estimating 'y'.

The function 'g(t_i)' has two general forms. The first form consists of polynomial regression equations of degree 'p'; the second form consists of the integrated form of one of the models discussed in the previous section.

Polynomials

The polynomial form of eq. (23) is written as:

$$ y_i = B_0 + B_1 t_i + B_2 t_i^2 + \dots + B_p t_i^p + u_i \tag{24} $$

where the B's are unknown parameters and all other terms are defined as before. Although the parameters are constants, their estimates are random variables with variances and covariances. The highest power of 't' seldom exceeds 3 or 4. This class of growth curve models was introduced by Wishart (1938); models of this type have been used extensively since that time. It is a convenient approach because growth curves can be reduced to a discrete number of parameters, and treatments compared with univariate and multivariate statistics. This approach was recently used by Griggs et al. (1978) to analyze and compare fusiform rust epidemics of slash pine.

The usual assumptions of ordinary least-squares regression (OLSR) are that the u's are normally and independently distributed random variables with mean 0 and constant variance sigma-squared. Because the B's and 't' are not random variables, the y's are normally, independently distributed with mean 'g(t)' and variance sigma-squared (Neter and Wasserman, 1974). Of course, this is not biological reality for any growth process, including disease progression. The proportion of disease at any time can be represented by:

$$ y_i = y_{i-1} + (y_i - y_{i-1}) \quad \text{or} \quad y_i = \int (dy/dt)\, dt. $$

The level of disease at time 'i' is equal to the level of disease at time 'i - 1' plus the difference.
This can also be expressed as the integral (summation) of 'dy/dt'. The y's are not independent; indeed, they are highly dependent. Although most investigators understand this dependency, they incorrectly use OLSR techniques to estimate parameters. Estimating parameters with OLSR is incorrect; estimated variances and covariances are wrong. In general, the magnitude of the error will increase as the variance of the data increases.

There are several approaches to "eliminate" or "utilize" the correlation structure in an acceptable statistical fashion. Advancements in the growth curve analysis area have been made by Potthoff and Roy (1964), Rao (1965), Khatri (1966) and Grizzle and Allen (1969). Eq. (24) can be reformulated into the following three types of models: (1) autocorrelation, (2) first-difference, and (3) multivariate growth-curve.

The autocorrelation and first-difference models can be derived from the following relationship for the unexplained variability ('u') that holds in many circumstances:

$$ u_i = P u_{i-1} + v_i \tag{25} $$

where 'P' is the autocorrelation parameter (0 < P < 1) and 'v_i' is a new error term that is distributed independently. Assuming that g(t) = B_0 + B_1 t, the autocorrelation model can be written as:

$$ y_i - P y_{i-1} = B_0(1 - P) + B_1(t_i - P t_{i-1}) + (u_i - P u_{i-1}). \tag{26} $$

Models of this type are often called first-order autoregressive models. The values 'u_i - P u_{i-1}' and thus 'y_i - P y_{i-1}' are independent. The parameter 'P' is estimated from the data. By setting y_i* = y_i - P y_{i-1}, B_0* = B_0(1 - P), t_i* = t_i - P t_{i-1}, and u_i* = u_i - P u_{i-1}, eq. (26) is written as a simple regression model, namely:

$$ y_i^* = B_0^* + B_1 t_i^* + u_i^*. \tag{27} $$

In other words, by transforming the variables, it is possible to estimate the parameters by OLSR techniques.

If P = 1, eq. (26) can be written as a first-difference model, specifically:

$$ y_i - y_{i-1} = B_1(t_i - t_{i-1}) + (u_i - u_{i-1}). \tag{28} $$
This model can be used when a statistical test indicates that P = 1 (Neter and Wasserman, 1974) or when one assumes that the differences in 'y' are independent, as is common in the time-series literature (Nelson, 1973). This model has the advantage that it is relatively simple to use and that computer programs need not be available to estimate 'P'.

If replications exist for disease proportion values at each time period, it is possible to use multivariate growth-curve models. The model can be written exactly as eq. (24); the primary difference is that the assumption of independent y's is not made. The parameters are estimated by utilizing the covariance among disease values in addition to the relationship between 'y' and time (Morrison, 1976). Use of this model requires a large number of replications; a further disadvantage is the absence of statistical computer packages to do the calculations.

Autocorrelation, first-difference, and multivariate growth-curve models are all better than OLSR for estimating parameters and predicting future levels of disease (Starkey, 1977). Starkey (1977) conducted an extensive comparison of these models and concluded that first-difference was the best model for the statistical comparison of epidemics. This was due to the simplicity of use and superior prediction power of the first-difference model.

Nonlinear models

The integrated differential equation forms of 'g(t)' were presented in the "biological" model section. Eqs. (4), (7), (10), (13), and (16) are example models. For presentation purposes, assume that the exponential model is appropriate. The statistical model can be written as:

$$ y_i = y_0 e^{r t_i} + u_i. \tag{29} $$

As with the polynomial model, the assumptions of OLSR are not appropriate. Considerations discussed in the polynomial section regarding parameter estimation are relevant here. This class of models also presents some new problems. Eq. (29) is nonlinear in the parameters.
For our purposes, a model is linear in the parameters when no parameter appears as an exponent or is multiplied or divided by another parameter (Neter and Wasserman, 1974). Polynomial equations are linear in the parameters; models in this subsection are nonlinear.

Ordinary regression techniques cannot be used to estimate the parameters of a nonlinear model such as eq. (29). A common and often misused approach is to use a transformation to linearize the model. The exponential model often is presented in the following form:

$$ \ln(y_i) = \ln(y_0) + r t_i + u_i. \tag{30} $$

This model is not equivalent to eq. (29), but is appropriate when the original equation can be written as:

$$ y_i = y_0 e^{(r t_i + u_i)}. \tag{31} $$

It seems unlikely that the unexplained variability influences 'y' in this exponential manner. A further disadvantage of transformed 'y' values stems from the methodology of OLSR. The OLSR procedure derives estimates of parameters by minimizing the squared differences between observed and predicted values on the left-hand side of the equation. Using eq. (30), differences between 'ln(y)' and predicted 'ln(y)' are minimized. But 'y' is the measured quantity in the field; 'ln(y)' is an artificial representation of disease with different distribution properties.

To minimize the squared deviations of the y's, i.e., estimate the parameters in eq. (29), it is necessary to use nonlinear least squares regression. There are three common procedures: (1) Taylor series expansion; (2) steepest descent; and (3) the Marquardt compromise (Draper and Smith, 1968). All three techniques start with initial estimates of the parameters (supplied by the investigator) and then proceed through an iterative scheme to find the best estimates. Best estimates can loosely be defined as those values that produce the minimum variability in the y's. Nonlinear regression does not always work. Estimation difficulties increase as the number of parameters increases. Success depends on a large sample size.
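The contrast between the transformed fit of eq. (30) and a nonlinear least squares fit of eq. (29) can be sketched in Python. This is an illustration under stated assumptions, not a recommended implementation: the data are artificial (an exponential curve with y0 = 0.01 and r = 0.3 plus a fixed alternating disturbance of 0.002), the function names are hypothetical, and the iterative scheme is a Taylor series (Gauss-Newton) update with simple step halving added as a safeguard. The transformed fit supplies the initial estimates.

```python
import math


def sse(ts, ys, y0, r):
    """Sum of squared deviations of the y's from eq. (29)."""
    return sum((y - y0 * math.exp(r * t)) ** 2 for t, y in zip(ts, ys))


def fit_log_linear(ts, ys):
    """OLSR on ln(y) vs t, eq. (30); returns estimates (y0, r)."""
    n = len(ts)
    logs = [math.log(y) for y in ys]
    t_bar, l_bar = sum(ts) / n, sum(logs) / n
    sxy = sum((t - t_bar) * (l - l_bar) for t, l in zip(ts, logs))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    r = sxy / sxx
    return math.exp(l_bar - r * t_bar), r


def fit_gauss_newton(ts, ys, y0, r, max_iter=100):
    """Taylor series (Gauss-Newton) iterations with step halving for eq. (29)."""
    current = sse(ts, ys, y0, r)
    for _ in range(max_iter):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for t, y in zip(ts, ys):
            e = math.exp(r * t)
            j1, j2 = e, y0 * t * e  # partial derivatives w.r.t. y0 and r
            res = y - y0 * e
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * res
            g2 += j2 * res
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        step_y0 = (a22 * g1 - a12 * g2) / det
        step_r = (a11 * g2 - a12 * g1) / det
        scale, improved = 1.0, False
        while scale > 1e-8:  # halve the step until the SSE decreases
            trial = sse(ts, ys, y0 + scale * step_y0, r + scale * step_r)
            if trial < current:
                y0, r, current = y0 + scale * step_y0, r + scale * step_r, trial
                improved = True
                break
            scale /= 2.0
        if not improved:
            break
    return y0, r


# Artificial data, for illustration only.
ts = list(range(9))
ys = [0.01 * math.exp(0.3 * t) + (0.002 if t % 2 == 0 else -0.002) for t in ts]

start_y0, start_r = fit_log_linear(ts, ys)
nls_y0, nls_r = fit_gauss_newton(ts, ys, start_y0, start_r)
print(sse(ts, ys, start_y0, start_r), sse(ts, ys, nls_y0, nls_r))
```

Because each accepted step must lower the sum of squares, the nonlinear estimates can never fit the untransformed y's worse than the transformed estimates they start from.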
When nonlinear regression does not provide acceptable estimates of the parameters, it may be necessary to use transformed equations.

A word of caution should be mentioned concerning the practice of giving exact biological (epidemiologic) meaning to the estimated parameters. Precise interpretation of the parameters (e.g., 'r') is strictly valid only when the estimates are uncorrelated in the "neighborhood" of the least squares solution. It was shown that the estimates are often highly correlated (Knoble and Borden, 1972). This means that one parameter may "compensate" for the others, thus interfering with the estimation of the true parameters and making interpretation very risky. The degree of correlation among estimates should always be monitored.

"STOCHASTIC" DIFFERENTIAL EQUATIONS

Although biological and statistical modeling were discussed separately, it is clear that the two approaches overlap. Differential equations are not useful if the parameters cannot be estimated in a statistical setting. The 'y' values at any time in statistical models are the result of the integrated rate of increase up to that time. It would be appealing to combine these two model types into a general differential equation with an explicit error term. Indeed, research along these lines has been conducted in the growth curve field since 1974 (Sandland and McGilchrist, 1979). Eq. (2) can be reformulated as the following "stochastic" differential equation:

$$ dy/dt = f(t;\ y;\ \text{parameters};\ u) \tag{32} $$

where 'u' is the unexplained variability or error term. Instead of adding the error term after integrating the differential equation (e.g., eq. (29)), the unexplained variability is incorporated throughout the integration. This approach makes biological sense because one would expect environmental variation and the other factors that cause variability to directly affect 'dy/dt' and not just 'y'.
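One way to picture an error term acting on 'dy/dt' is a simple simulation. The Python sketch below assumes one particular form of eq. (32) that is not given in the text: a logistic rate whose parameter is disturbed at every small time step by independent normal shocks of size 'sigma'. The step-by-step (Euler-type) integration, the parameter values, and the function names are all hypothetical choices made for illustration.

```python
import math
import random


def logistic_y(t, y0, r):
    """Deterministic logistic curve, equivalent to eq. (10)."""
    return 1.0 / (1.0 + ((1.0 - y0) / y0) * math.exp(-r * t))


def simulate_stochastic_logistic(y0, r, sigma, t_end, dt=0.01, seed=1):
    """Integrate a logistic rate with a random disturbance at every step."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n_steps = int(round(t_end / dt))
    y = y0
    path = [y]
    for _ in range(n_steps):
        shock = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        # The disturbance enters the rate of increase, not the integrated y.
        y += y * (1.0 - y) * (r * dt + sigma * shock)
        y = min(max(y, 0.0), 1.0)  # keep disease proportion within 0.0 to 1.0
        path.append(y)
    return path


# Hypothetical parameter values, for illustration only.
noisy = simulate_stochastic_logistic(0.01, 0.2, 0.1, 30.0)
smooth = simulate_stochastic_logistic(0.01, 0.2, 0.0, 30.0)
print(noisy[-1], smooth[-1], logistic_y(30.0, 0.01, 0.2))
```

With 'sigma' set to zero the simulation follows the deterministic logistic curve closely; with 'sigma' above zero, each disturbance carries forward into all later disease values, which is the dependency among the y's discussed for the statistical models.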
Although this approach is theoretically superior to the separate types of modeling schemes, there are some difficulties. The mathematics of stochastic differential equations is considerably more difficult than that of the other cases; there is no unique solution when integrating models of the general form of eq. (32). Large data sets, i.e., growth measurements at many time periods, also are needed for estimation purposes. Perhaps in a few years, when statisticians develop improved methods for handling this type of model, plant pathologists will make use of stochastic differential equations.

DISCUSSION AND CONCLUSIONS

Plotting growth measurements vs time usually results in a fairly simple
