
Special Models in Regression

(1) The Polynomial Regression Model


The polynomial regression model is used to fit a curve rather than to explain the
relationship between the dependent and independent variable(s). The main purpose
of fitting a polynomial regression model is to describe the nature of the response
curve rather than to interpret the regression coefficients.

Example:

Biologists are interested in the characteristics of growth curves, that is, in
finding a model for describing how organisms grow with time. Relationships of this
type tend to be curvilinear, in that the rate of growth decreases with age and
eventually stops altogether. A polynomial model is sometimes used for this purpose.

This example concerns the growth of rabbit jawbones. Measurements were made on the
lengths of jawbones for rabbits of various ages. The data are given in the
following table.

Table 8.8 Rabbit Jawbone Length
AGE LENGTH AGE LENGTH AGE LENGTH

0.01 15.5 0.41 29.7 2.52 49.0


0.20 26.1 0.83 37.7 2.61 45.9
0.20 26.3 1.09 41.5 2.64 49.8
0.21 26.7 1.17 41.9 2.87 49.4
0.23 27.5 1.39 48.9 3.39 51.4
0.24 27.0 1.53 45.4 3.41 49.7
0.24 27.0 1.74 48.3 3.52 49.8
0.25 26.0 2.01 50.7 3.65 49.9
0.26 28.6 2.12 50.6
0.34 29.8 2.29 49.2
Solution:

The scatter plot of the data is given in the following figure:


60

50
50
45

40
Length
40
35

30
length

30

20
25
20

10
15

0 1 2 3
0 1 2 3 4 5 6
Age
FIGURE 8.3 age

  Polynomial Regression Plot. 1  


The scatter plot suggests fitting a polynomial regression line. Based on previous
studies, a fourth-degree polynomial model is used to estimate the relationship of
LENGTH to AGE. The polynomial regression model is

$$\text{LENGTH} = \beta_0 + \beta_1\,\text{AGE} + \beta_2\,\text{AGE}^2 + \beta_3\,\text{AGE}^3 + \beta_4\,\text{AGE}^4 + \varepsilon$$

The estimated polynomial regression model is

$$\widehat{\text{LENGTH}} = \hat\beta_0 + \hat\beta_1\,\text{AGE} + \hat\beta_2\,\text{AGE}^2 + \hat\beta_3\,\text{AGE}^3 + \hat\beta_4\,\text{AGE}^4$$

Using the following R code to compute the estimates:

==========================================================

library(gdata)                                   # provides read.xls() for Excel files
data <- read.xls("file destination", sheet = 1)  # read the rabbit jawbone data
data
with(data, plot(age, length))                    # scatter plot of the data
model <- lm(length ~ poly(age, degree = 4), data = data)
summary(model)
==========================================================
The results for this regression are:

Call:
lm(formula = length ~ poly(age, degree = 4), data = data)

Residuals:
Min 1Q Median 3Q Max
-3.4540 -0.8948 0.2523 1.0698 3.0396

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.2607 0.3192 122.980 < 2e-16
poly(age, degree = 4)1 52.1100 1.6893 30.847 < 2e-16
poly(age, degree = 4)2 -23.5047 1.6893 -13.914 1.09e-12
poly(age, degree = 4)3 7.5661 1.6893 4.479 0.000171
poly(age, degree = 4)4 -0.7002 1.6893 -0.415 0.682339
---
Residual standard error: 1.689 on 23 degrees of freedom
Multiple R-squared: 0.9806, Adjusted R-squared: 0.9773
F-statistic: 291.3 on 4 and 23 DF, p-value: < 2.2e-16

The following R code plots the estimated polynomial curve:
==========================================================
with(data, lines(age, predict(model), col = "black"))  # overlay the fitted curve
==========================================================
[Figure: the data with the fitted fourth-degree polynomial curve overlaid.]
 
 
The fitted curve appears to follow the data well, but the regression results show
that the fourth-degree coefficient is not significant (p = 0.682). In this case, we
remove the $\text{AGE}^4$ term from the model and reduce the polynomial to third
order. Therefore, the polynomial regression model is

$$\text{LENGTH} = \beta_0 + \beta_1\,\text{AGE} + \beta_2\,\text{AGE}^2 + \beta_3\,\text{AGE}^3 + \varepsilon$$

The estimated polynomial regression model is

$$\widehat{\text{LENGTH}} = \hat\beta_0 + \hat\beta_1\,\text{AGE} + \hat\beta_2\,\text{AGE}^2 + \hat\beta_3\,\text{AGE}^3$$

Using the following R code to compute the estimates:

===============================================================

model <- lm(length ~ poly(age, degree = 3), data = data)
summary(model)
==========================================================

we have the following results:
Call:
lm(formula = length ~ poly(age, degree = 3), data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-3.8051 -0.8179  0.2235  1.0571  2.9557

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)              39.2607     0.3137 125.158  < 2e-16
poly(age, degree = 3)1   52.1100     1.6599  31.394  < 2e-16
poly(age, degree = 3)2  -23.5047     1.6599 -14.160 3.78e-13
poly(age, degree = 3)3    7.5661     1.6599   4.558 0.000128
---
Residual standard error: 1.66 on 24 degrees of freedom
Multiple R-squared: 0.9805, Adjusted R-squared: 0.9781
F-statistic: 402.3 on 3 and 24 DF, p-value: < 2.2e-16
 
 
and the fitted polynomial curve, drawn with

================================================================
with(data, lines(age, predict(model), col = "black"))
================================================================

is shown below.

[Figure: the data with the fitted third-degree polynomial curve overlaid.]

As can be seen, the fitted third-degree polynomial curve fits the data better and
all estimated coefficients are significant.

 
(2) The Multiplicative Regression Model

The multiplicative model is used to describe curvilinear relationships. It takes
the following form:

$$y = e^{\beta_0}\, x_1^{\beta_1}\, x_2^{\beta_2} \cdots x_m^{\beta_m}\, e^{\epsilon}$$

where 𝑒 refers to the Naperian constant used as the basis for natural logarithms.

The coefficients, sometimes called “elasticities”, indicate the percent change in


the dependent variable associated with a one-percent change in the independent
variable, holding constant all other variables.

Note that the error term $e^{\epsilon}$ enters as a multiplicative factor: the
value of the deterministic portion is multiplied by the error. The expected value
of this factor, when $\epsilon = 0$, is one. When the random error is positive the
multiplicative factor is greater than 1; when it is negative, the factor is less
than 1. This type of error is quite logical in many applications where variation is
proportional to the magnitude of the values of the variable.

The multiplicative model is nonlinear, so the least squares method cannot be
applied to it directly. A standard way to estimate the unknown parameters is to
transform the equation from a nonlinear to a linear one. Taking the logarithm of
both sides of the multiplicative model gives

$$\log(y) = \beta_0 + \beta_1 \log(x_1) + \beta_2 \log(x_2) + \cdots + \beta_m \log(x_m) + \epsilon$$

This model is easily implemented.
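As a minimal sketch of the implementation (with a hypothetical data frame df
containing positive columns y, x1, and x2):

==========================================================
# Fit the linearized multiplicative model by ordinary least squares.
fit <- lm(log(y) ~ log(x1) + log(x2), data = df)
coef(fit)           # beta0 on the log scale; beta1, beta2 are elasticities
exp(coef(fit)[1])   # e^beta0, the multiplicative constant of the model
==========================================================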

Example:
It is desired to study the size range of squid eaten by sharks and tuna. The beak
(mouth) of a squid is indigestible, and hence it is found in the digestive tracts
of harvested fish; therefore, it may be possible to predict the total squid weight
with a regression that uses various beak dimensions as predictors. The beak
measurements and their computer names are:

RL = rostral length and W = width. The dependent variable WT is the weight of the
squid.

Data are obtained on a sample of 22 specimens. The data are given in the following
table.

obs rl w wt
1 1.31 0.35 1.95
2 1.55 0.47 2.9
3 0.99 0.32 0.72
4 0.99 0.27 0.81
5 1.05 0.3 1.09
6 1.09 0.31 1.22
7 1.08 0.31 1.02
8 1.27 0.34 1.93
9 0.99 0.29 0.64
10 1.34 0.37 2.08
11 1.3 0.38 1.98
12 1.33 0.38 1.9
13 1.86 0.65 8.56
14 1.58 0.5 4.49
15 1.97 0.59 8.49
16 1.8 0.59 6.17
17 1.75 0.59 7.54
18 1.72 0.63 6.36
19 1.68 0.68 7.63
20 1.75 0.62 7.78
21 2.19 0.72 10.15
22 1.73 0.55 6.88
Solution:

First, let us fit a multiple linear regression to the data using R.
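A minimal sketch of the R code for this fit (assuming the data have been read into
a data frame named data with columns rl, w, and wt):

==========================================================
model <- lm(wt ~ rl + w, data = data)   # ordinary multiple linear regression
summary(model)                          # coefficient table and fit statistics
plot(model)                             # diagnostic plots (residuals, Q-Q)
==========================================================

The output is: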

Call:
lm(formula = wt ~ rl + w, data = data)

Residuals:
Min 1Q Median 3Q Max
-1.6391 -0.5087 0.1070 0.5484 0.9674

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.8349 0.7648 -8.937 3.11e-08 ***
rl 3.2747 1.4161 2.313 0.032117 *
w 13.4008 3.3800 3.965 0.000831 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6952 on 19 degrees of freedom
Multiple R-squared: 0.9575, Adjusted R-squared: 0.953
F-statistic: 213.9 on 2 and 19 DF, p-value: 9.382e-14
[Figure: diagnostic plots for the linear regression — Residuals vs Fitted and
Normal Q-Q.]

The regression appears to fit well and both coefficients are significant, although
the p-value for RL is only 0.032. However, the residual plot reveals some
problems:

• The residuals have a curved pattern: positive at the extremes and negative
in the center. This pattern suggests a curved response. 
• The residuals are less variable for smaller predicted values and become
increasingly dispersed as the predicted values increase. This pattern
reveals a heteroscedasticity problem.

We noted that the logarithmic transformation should be used when the standard
deviation is proportional to the mean. The pattern of residuals for the linear
regression suggests that the variability is proportional to the size of the squid.
This type of variability is logical for variables related to the sizes of
biological specimens, which suggests a multiplicative error. The multiplicative
model is therefore appropriate for this example. The following R code is used for
estimating it:

==========================================================

model <- lm(log(wt) ~ log(rl) + log(w), data = data)  # log-log (multiplicative) model
summary(model)
plot(model)                                           # diagnostic plots
==========================================================

Call:
lm(formula = log(wt) ~ log(rl) + log(w), data = data)

Residuals:
Min 1Q Median 3Q Max
-0.27314 -0.08790 0.02622 0.12677 0.17398

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.1689 0.4783 2.444 0.024454 *
log(rl) 2.2785 0.4933 4.619 0.000187 ***
log(w) 1.1092 0.3736 2.969 0.007886 **
---
Residual standard error: 0.1492 on 19 degrees of freedom
Multiple R-squared: 0.9768, Adjusted R-squared: 0.9744
F-statistic: 400.5 on 2 and 19 DF, p-value: 2.929e-16
 
 

[Figure: diagnostic plots for the log-log regression — Residuals vs Fitted and
Normal Q-Q.]


 
This model fits better and both coefficients are highly significant. The estimated
multiplicative model is

$$\widehat{WT} = e^{1.1689}\, RL^{2.2785}\, W^{1.1092}.$$

Finally, the residuals appear to have a uniformly random pattern (homoscedastic).
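To report predictions on the original weight scale, the fitted values can be
back-transformed from the log scale; a brief sketch (note that this simple
back-transform is a median-type estimate, and a bias correction is sometimes
applied):

==========================================================
fitted_wt <- exp(fitted(model))   # back-transform fitted log-weights
plot(data$wt, fitted_wt,
     xlab = "observed wt", ylab = "fitted wt")
abline(0, 1)                      # reference line: perfect agreement
==========================================================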

(3) Logistic Regression
Logistic regression is designed for the situation where the response variable, $y$,
has only two possible outcomes. In this situation, $y$ is said to be a binary
variable because it takes just two values, 0 (for failure) or 1 (for success). For
example, $y$ might represent whether a student succeeds or does not succeed in
passing college algebra. We focus on the probability of a success, $p$, since the
probability of a failure is $1 - p$. In other words, the dependent variable $y$
follows a binomial distribution with probability of success $p$ and a single trial
($n = 1$). In logistic regression our interest is in whether the probability $p$ is
influenced by one or more independent variables $x_1, x_2, \ldots, x_m$. We will
denote the value of $p$ at some specific set of values for the independent
variables as $p_i$. Using the properties of the binomial distribution, we have
$E(y_i) = \mu_{y|x} = p_i$ and $\mathrm{Var}(y_i) = \sigma_i^2 = p_i(1 - p_i)$.

When ordinary regression is used in this situation, we face two problems:

1. The fitted values are not confined to the range (0, 1), and there is no way to
keep them in that range.
2. Even if they could be kept in that range, the distribution of the dependent
variable is not normal.

The first problem is addressed by expressing the relationship between $p_i$ and
the independent variables as a nonlinear function known as the logistic function.
Estimating the parameters by maximum likelihood rather than least squares solves
the second problem.

The logistic function is

$$p_i = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m}}$$

It can now be seen that $p_i$ must be between 0 and 1 for any choice of the
$\beta$'s and any choice of the $x_j$'s.
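A quick numerical check of this bound (a small R sketch, not part of the original
analysis):

==========================================================
logistic <- function(eta) exp(eta) / (1 + exp(eta))  # maps any real eta to (0, 1)
logistic(c(-5, 0, 5))   # approximately 0.0067, 0.5000, 0.9933
==========================================================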

An important quantity in binary regression is the odds, the probability of a
success divided by the probability of a failure,

$$\text{odds} = p_i / (1 - p_i).$$

Under the logistic regression model,

$$\text{odds} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m}.$$

By taking the logarithm of both sides, we have

$$\ln(\text{odds}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m.$$

This is our familiar linear regression model; however, the linear influence is on
the ln(odds). If $\beta_j$ is positive, then for each unit increase in $x_j$ we
expect an increase of $\beta_j$ in ln(odds), assuming of course that all other $x$
values are held constant. In turn, this means that the probability of success
increases as $x_j$ increases. However, the increase is nonlinear: once $p_i$
becomes large, further increases in $x_j$ can cause only slight increases in $p_i$.

We compare the odds of success for two individuals with different values of the
independent variables using the odds ratio. If individual 1 has values
$x_{11}, x_{12}, \ldots, x_{1m}$ and individual 2 has values
$x_{21}, x_{22}, \ldots, x_{2m}$, then their odds ratio is

$$\text{odds ratio} = e^{\beta_1(x_{11} - x_{21}) + \beta_2(x_{12} - x_{22}) + \cdots + \beta_m(x_{1m} - x_{2m})}.$$

If these two individuals differ by one unit in their value of $x_j$, but all other
independent variables are equal, then their odds ratio is $e^{\beta_j}$. Note that
if $\beta_j$ is zero, then the odds ratio is 1, meaning the two individuals have
the same odds and hence the same probability of success.

Logistic regression can use the same mix of dummy and interval independent
variables as ordinary regression. The ln(odds) is sometimes called the logit
function. Since the link between the expected value of $y_i$ and the linear
expression in the independent variables comes through the logits, we refer to the
logit as the link function.

Example:

In a study of urban planning in Florida (Mattson et al., 1991), a survey was taken
of 50 cities; 24 used tax increment funding (TIF) and 26 did not. One part of the
study was to investigate the relationship between the presence or absence of TIF
and the median family income of the city (x). The data are given in the following
table, with median income in $1000s.
Table 13.3 Data from Urban Planning Study
TIF x TIF x TIF x TIF x

0 9.2 0 10.5 1 9.6 1 12.5


0 9.2 0 10.5 1 10.1 1 12.6
0 9.3 0 10.9 1 10.3 1 12.6
0 9.4 0 11.0 1 10.9 1 12.6
0 9.5 0 11.2 1 10.9 1 12.9
0 9.5 0 11.2 1 11.1 1 12.9
0 9.5 0 11.5 1 11.1 1 12.9
0 9.6 0 11.7 1 11.1 1 12.9
0 9.7 0 11.8 1 11.5 1 13.1
0 9.7 0 12.1 1 11.8 1 13.2
0 9.8 0 12.3 1 11.9 1 13.5
0 9.8 0 12.5 1 12.1
0 9.9 0 12.9 1 12.2

Solution

The logistic model chosen to describe these data is

$$\ln(\text{odds}) = \beta_0 + \beta_1 x,$$

where $x$ is a city's median family income and the odds is the probability a city
will have TIF divided by the probability it will not have TIF. We use the following
R code:

==========================================================
library(gdata)                                   # provides read.xls()
data <- read.xls("File destination", sheet = 1)  # read the TIF data
data
model <- glm(tif ~ income, data = data, family = "binomial")  # logistic regression
summary(model)
=========================================================
we have the following results

Call:
glm(formula = tif ~ income, family = "binomial", data = data)

Deviance Residuals:
Min 1Q Median 3Q Max
-1.8781 -0.8021 -0.4736 0.8097 1.9461

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -11.3487 3.3511 -3.387 0.000708
income 1.0019 0.2954 3.392 0.000695
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 69.235 on 49 degrees of freedom
Residual deviance: 53.666 on 48 degrees of freedom
AIC: 57.666
The results show strong evidence of a relationship between the use of TIF and
median income.

To test the whole relationship, we can use the Wald test. The Wald test tests the
null hypothesis $H_0: \beta_1 = \beta_2 = \cdots = \beta_m = 0$; its statistic,
$X^2$, follows a chi-squared distribution with $p$ degrees of freedom, where $p$ is
the number of tested coefficients. The following R code performs the Wald test;
note that you need to install the "aod" package.

===============================================================

library(aod)                # provides wald.test()
wald.test(b = coef(model), Sigma = vcov(model), Terms = 2)
==========================================================
Note that the Terms argument of wald.test can be a single number, selecting a
single coefficient, or a vector, selecting several coefficients. For example, to
test $H_0: \beta_1 = \beta_2 = \beta_3 = 0$, use Terms = c(2:4), as in the sketch
below.
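(A hypothetical illustration only: model3 stands for some fitted model with at
least three predictors, not the single-predictor TIF model above.)

==========================================================
# Jointly test the 2nd through 4th coefficients of a larger model.
wald.test(b = coef(model3), Sigma = vcov(model3), Terms = c(2:4))
==========================================================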
The following are the results of the Wald test for the TIF model (Terms = 2):

Wald test:
----------
Chi-squared test:
X2 = 11.5, df = 1, P(> X2) = 0.00069
The Wald test p-value is very small, so we reject the null hypothesis and conclude
that $\beta_1$ differs from zero.

For estimating the odds ratio and the 95% confidence interval, use the following
code
=========================================================
exp(cbind(Point_Estimate = coef(model), confint(model, level = 0.95)))
=========================================================
The results are

Point_Estimate 2.5 % 97.5 %


(Intercept) 1.178525e-05 7.470377e-09 0.004764204
income 2.723401e+00 1.602906e+00 5.204794713

The results show that the estimated odds ratio is 2.7234, so we can conclude that
wealthier cities have a higher probability of adopting TIF. For every additional
$1000 in median income, the odds of adopting TIF are multiplied by a factor
estimated (with 95% confidence) to lie between 1.603 and 5.205.
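To convert the fitted model into predicted probabilities of TIF adoption at chosen
income levels, one can use predict() with type = "response" (a sketch; the income
values are illustrative):

==========================================================
new <- data.frame(income = c(10, 12, 14))          # median incomes in $1000s
predict(model, newdata = new, type = "response")   # estimated P(TIF = 1)
==========================================================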

(4) Poisson Regression
The Poisson distribution is widely used as a model for count data. It is frequently
appropriate when the counts are of events in specific regions of time or space.
Dependent variables that might be modeled using Poisson regression include the

• number of fatal auto accidents during a year at intersections as a function of
lane width,
• number of service interruptions during a month for a network server as a
function of usage, and
• number of fire ant colonies in an acre of land as a function of tree density.

There is no fixed upper limit on the possible number of events. Recalling the
properties of the Poisson distribution, there is a single parameter, $\mu$, which
is the expected number of events. It is essential that $\mu$ be positive, and the
regression function must enforce this. Poisson regression assumes each $y_i$
follows a Poisson distribution with mean $\mu_i$, where

$$\ln(\mu_i) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m.$$

The linear expression on the right may take on either positive or negative values,
but

$$\mu_i = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m}$$

will always be positive. Note that the link function is the logarithmic function.
The proper method of fitting this model is via maximum likelihood estimation.

When counts are observed on units of different sizes $s_i$, the model includes the
term $\ln(s_i)$. At first glance, $\ln(s_i)$ may seem like just another independent
variable in the Poisson regression. However, its coefficient is identically 1, so
no parameter need be estimated for it. This is called an offset variable, and all
Poisson regression software will allow you to indicate such a variable. Sometimes
size is only specified up to a constant of proportionality; that is, we might not
know exactly the sizes of units $i$ and $i'$, but we know that unit $i$ is twice
the size of unit $i'$. This suffices, as the unknown proportionality constant
becomes an additive constant once logarithms are computed, and is absorbed into the
intercept $\beta_0$.


Example:

Bailer et al. (1997) published an article showing how Poisson regression could be
an important tool in safety research. The following table shows their counts of
fatalities in the agriculture, forestry, and fishing industries and estimates of
the number of workers in those industries.
Table 13.8 Fatalities and Number of Workers
Year Fatalities Workers Year Fatalities Workers

1983 511 2850803 1988 506 2649044


1984 530 2767829 1989 491 2665645
1985 566 2667323 1990 464 2614612
1986 499 2679587 1991 484 2666477
1987 529 2709966 1992 468 2581603

The following figure graphs the rates per 1000 workers (number of fatalities ×
1000/number of workers).
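The rate variable can be computed and plotted with a short sketch like this
(assuming the data frame read in the Solution below, with columns year, fatal, and
workers):

==========================================================
data$rate <- data$fatal * 1000 / data$workers   # fatalities per 1000 workers
with(data, plot(year, rate, type = "b"))        # rate by year
==========================================================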
[Figure: fatality rate per 1000 workers plotted against year, 1983–1992.]

We would like to see that fatality rates are declining, but is there any evidence that
this is so?

Solution:

We will model the number of fatalities each year as a Poisson variable with mean
$\mu_i = \lambda_i s_i$, where $\lambda_i$ is the rate of fatalities per worker in
year $i$, and $s_i$ is the number of workers in these industries during year $i$.
To model a trend in time, we use

$$\ln(\mu_i) = \beta_0 + \beta_1 i,$$

where $i = \text{year} - 1982$. The link function is the logarithmic function. We
use the following R code:

===============================================================

data <- read.table("File Destination", header = TRUE)   # read the fatality data
data
data$period <- data$year - 1982                          # i = year - 1982
model <- glm(fatal ~ 1 + period, data = data, family = poisson(link = log))
summary(model)
==========================================================

Call:
glm(formula = fatal ~ 1 + period, family = poisson(link = log),
data = data)

Deviance Residuals:
Min 1Q Median 3Q Max
-1.26087 -0.59709 -0.08476 0.26053 1.81730

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 6.306839 0.029875 211.107 < 2e-16
period -0.015205 0.004903 -3.101 0.00193

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 17.0524 on 9 degrees of freedom
Residual deviance: 7.4292 on 8 degrees of freedom
AIC: 92.036

From the results, we can conclude that the Poisson regression fits the data
properly, since the p-value of the period coefficient is significant (p = 0.00193).
To test the entire relationship, one can perform the Wald test as follows:
==========================================================
library(aod)                # provides wald.test()
wald.test(b = coef(model), Sigma = vcov(model), Terms = 2)
==========================================================
and the results are

Wald test:
----------
Chi-squared test:
X2 = 9.6, df = 1, P(> X2) = 0.0019

As can be seen, the p-value of the Wald statistic is significant. Since the
estimated period coefficient is negative (−0.0152), we can conclude that there is
evidence the fatality rate has been declining over the period studied.
We can plot this fit as follows:
==========================================================
data$pred <- exp(model$linear.predictors)    # fitted means on the count scale
with(data, plot(period, fatal, type = "p"))  # observed fatalities
with(data, lines(period, pred, lty = 1))     # fitted Poisson regression curve
==========================================================
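Equivalently, for a glm object, fitted(model) already returns fitted values on the
response (count) scale, so data$pred <- fitted(model) would give the same result.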

 
[Figure: observed fatalities plotted against period with the fitted Poisson
regression curve.]
