
Term 2, 2021/2022

DSA211 Statistical Learning with R


Classwork 3 Answers

R Codes of Q1
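#Q1 Poisson
library(fitdistrplus)
# Read the claims data
health <- read.csv("Health2022C.csv", stringsAsFactors = TRUE)
# Fit a Poisson distribution by maximum likelihood
fpois <- fitdist(health$claims, "pois")
# Chi-squared goodness-of-fit test with cells 0, 1, 2 and > 2
result1 <- gofstat(fpois, chisqbreaks = 0:2,
                   discrete = TRUE, fitnames = c("Poisson"))
summary(fpois)
result1
# P(X = 1) + P(X = 2) under the fitted Poisson model
ansd <- dpois(1, fpois$estimate) + dpois(2, fpois$estimate)
ansd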

R Output of Q1
> #Q1 Poisson
> library(fitdistrplus)
> health <- read.csv("Health2022C.csv", stringsAsFactors = TRUE)
> fpois <- fitdist(health$claims, "pois")
> result1 <- gofstat(fpois, chisqbreaks = (0:2),
+ discrete=TRUE, fitnames=c("Poisson"))
> summary(fpois)
Fitting of the distribution ' pois ' by maximum likelihood
Parameters :
       estimate  Std. Error
lambda   0.3158 0.007947247
Loglikelihood: -3560.259 AIC: 7122.517 BIC: 7129.034
> result1
Chi-squared statistic: 0.8184031
Degree of freedom of the Chi-squared distribution: 2
Chi-squared p-value: 0.6641803
Chi-squared table:
      obscounts theocounts
<= 0 3638.00000 3646.02638
<= 1 1169.00000 1151.41514
<= 2  172.00000  181.80845
> 2    21.00000   20.75004

Goodness-of-fit criteria
                                Poisson
Akaike's Information Criterion 7122.517
Bayesian Information Criterion 7129.034
> ansd <- dpois(1, fpois$estimate)+dpois(2, fpois$estimate)
> ansd
[1] 0.2666447
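
The goodness-of-fit figures above can be checked by hand. The following is a minimal sketch in base R that assumes only the numbers printed in the output: the observed counts and the fitted lambda = 0.3158. The sample size is taken as the sum of the observed counts, and the variable names are illustrative rather than part of the original solution.

```r
# Observed counts for the cells 0, 1, 2 and > 2, copied from the output above
obs <- c(3638, 1169, 172, 21)
n <- sum(obs)              # total number of observations
lambda <- 0.3158           # fitted Poisson mean from summary(fpois)

# Standard error of the Poisson MLE: sqrt(lambda / n)
se_lambda <- sqrt(lambda / n)

# Expected counts under Poisson(lambda); the last cell is the upper tail P(X > 2)
p <- c(dpois(0:2, lambda), ppois(2, lambda, lower.tail = FALSE))
theo <- n * p

# Pearson chi-squared statistic and its p-value
chisq_stat <- sum((obs - theo)^2 / theo)
df <- length(obs) - 1 - 1  # number of cells minus 1, minus one estimated parameter
p_value <- pchisq(chisq_stat, df = df, lower.tail = FALSE)

print(round(theo, 5))
print(c(statistic = chisq_stat, df = df, p.value = p_value))
```

The recomputed values should agree with the gofstat() output up to rounding. The standard error may differ from the printed one in the last digits, because fitdist() obtains it numerically.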


Answer of Q1
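a. The p-value is 0.6642 > 0.05, so we do not reject the null hypothesis that
the claims follow a Poisson distribution at the 5% significance level. The data
do not provide sufficient evidence that the Poisson distribution is not the true
model.
b. Under the fitted Poisson model (lambda = 0.3158), P(X=1)+P(X=2)=0.2666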
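
As a quick check of part b, the probability can be recomputed directly from the fitted rate. This sketch assumes lambda = 0.3158 as printed by summary(fpois); it uses base R only, and the variable names are illustrative.

```r
lambda <- 0.3158  # fitted Poisson mean, taken from the output of summary(fpois)

# P(X = 1) + P(X = 2) as a sum of the two point probabilities
p_sum <- sum(dpois(1:2, lambda))

# The same quantity as a difference of cumulative probabilities:
# P(X <= 2) - P(X <= 0)
p_cdf <- ppois(2, lambda) - ppois(0, lambda)

# Closed form: exp(-lambda) * (lambda + lambda^2 / 2)
p_closed <- exp(-lambda) * (lambda + lambda^2 / 2)

print(c(p_sum, p_cdf, p_closed))
```

All three expressions give the same probability, which rounds to the 0.2666 reported above.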

R Codes of Q2
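#Q2 modeling continuous distribution
library(fitdistrplus)
bulbs <- read.csv("Bulb2022C.csv", stringsAsFactors = TRUE)
dat <- bulbs$lifetime
# Fit three candidate distributions by maximum likelihood
fnorm <- fitdist(dat, "norm")
flnorm <- fitdist(dat, "lnorm")
fexp <- fitdist(dat, "exp")
# Compare the fits by AIC and BIC
summary(fnorm)
summary(flnorm)
summary(fexp)
# normal is the best model
# Diagnostic plots for the normal fit
plot(fnorm)
# Estimated mean and standard deviation
fnorm$estimate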

R Output of Q2
> #Q2 modeling continuous distribution
> bulbs <- read.csv("Bulb2022C.csv", stringsAsFactors = TRUE)
> dat <- bulbs$lifetime
>
> fnorm <- fitdist(dat, "norm")
> flnorm <- fitdist(dat, "lnorm")
> fexp <- fitdist(dat, "exp")
> summary(fnorm)
Fitting of the distribution ' norm ' by maximum likelihood
Parameters :
      estimate Std. Error
mean 243.54611  1.3207689
sd    17.71997  0.9339246
Loglikelihood: -772.8536 AIC: 1549.707 BIC: 1556.093

Correlation matrix:
     mean sd
mean    1  0
sd      0  1

> summary(flnorm)
Fitting of the distribution ' lnorm ' by maximum likelihood
Parameters :
          estimate  Std. Error
meanlog 5.49264758 0.005444284
sdlog   0.07304274 0.003846444
Loglikelihood: -773.0776 AIC: 1550.155 BIC: 1556.541
Correlation matrix:
             meanlog        sdlog
meanlog 1.000000e+00 2.975914e-13
sdlog   2.975914e-13 1.000000e+00

> summary(fexp)
Fitting of the distribution ' exp ' by maximum likelihood
Parameters :
        estimate   Std. Error
rate 0.004105999 0.0002864443
Loglikelihood: -1169.155 AIC: 2340.31 BIC: 2343.503
> # normal is the best model
> plot(fnorm)
> fnorm$estimate
     mean        sd
243.54611  17.71997

Answer of Q2
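(a) Normal (AIC=1549.707, BIC=1556.093), lognormal (AIC=1550.155,
BIC=1556.541), and exponential (AIC=2340.31, BIC=2343.503). Since the
normal model has the smallest AIC and the smallest BIC, it is the
best-fitting distribution. Its margin over the lognormal model is small
(about 0.45 in both criteria), while the exponential model fits far worse.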
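
The AIC and BIC values compared in part (a) follow from the log-likelihoods via AIC = 2k - 2 logLik and BIC = k log(n) - 2 logLik, where k is the number of estimated parameters. The sketch below recomputes them in base R. The sample size n = 180 is not printed in the output; it is an assumption inferred from the gap between the reported BIC and AIC, so treat it as illustrative.

```r
# Log-likelihoods copied from the summary() output
loglik <- c(norm = -772.8536, lnorm = -773.0776, exp = -1169.155)
k <- c(norm = 2, lnorm = 2, exp = 1)  # number of estimated parameters
n <- 180                              # assumed sample size (inferred, not printed)

aic <- 2 * k - 2 * loglik
bic <- k * log(n) - 2 * loglik

print(round(rbind(AIC = aic, BIC = bic), 3))

# Model with the smallest AIC and the model with the smallest BIC
best_aic <- names(which.min(aic))
best_bic <- names(which.min(bic))
print(c(best_aic, best_bic))
```

Both criteria select the normal model. Because the normal and lognormal fits have the same number of parameters, their AIC and BIC differences are identical and equal twice the difference in log-likelihood.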

(b) [Figure: plots from plot(fnorm) for the fitted normal model]

(c) The estimated mean is 243.55 days and the estimated standard deviation is
17.72 days.
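
For the normal distribution the maximum likelihood estimates have a closed form: the sample mean, and the standard deviation computed with divisor n rather than n - 1. The sketch below illustrates this on simulated data, since Bulb2022C.csv is not reproduced here. The seed, sample size and simulation parameters are hypothetical and were chosen only to resemble the fitted values.

```r
set.seed(1)                                # hypothetical seed, for reproducibility
x <- rnorm(180, mean = 243.5, sd = 17.7)   # simulated lifetimes, NOT the real data
n <- length(x)

mean_mle <- mean(x)                        # MLE of the mean
sd_mle <- sqrt(mean((x - mean_mle)^2))     # MLE of the sd (divisor n, not n - 1)

# Asymptotic standard errors of the two estimates
se_mean <- sd_mle / sqrt(n)
se_sd <- sd_mle / sqrt(2 * n)

print(c(mean = mean_mle, sd = sd_mle))
print(c(se_mean = se_mean, se_sd = se_sd))
```

Applying the same formulas to the reported sd of 17.71997, and assuming n = 180, gives standard errors of about 1.3208 and 0.9339, which agrees with the Std. Error column in the output.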
