
Question 5.1
Danya Bynoe

9/8/2019

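Before using grubbs.test, the data were checked for normality using the Shapiro-Wilk
test and ggqqplot. In the Q-Q plot the majority of the points fall within the reference band,
but the Shapiro-Wilk test rejects normality (p = 0.001882). Because the Grubbs test assumes
normally distributed data, its results should be read with caution.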
library(outliers)
library(dplyr)

library(ggpubr)

library(knitr)

set.seed(77)

#Load the data


crimeData <- read.table("uscrime.txt", header = TRUE)
crime<- crimeData[,"Crime"]

ggqqplot(crime)
# Given that the majority of the points fall within the reference band, the
# Q-Q plot suggests that the crime data are approximately normally distributed.

shapiro.test(crime)

##
## Shapiro-Wilk normality test
##
## data: crime
## W = 0.91273, p-value = 0.001882

# The p-value from the Shapiro-Wilk test is 0.001882, which is below 0.05, so
# the test rejects the null hypothesis that the distribution is normal.

boxplot(crime)

outlierTest<- grubbs.test(crime,11)

# The two-tailed test (type = 11) considers the lowest and highest values,
# 342 and 1993, as a pair of candidate outliers. The p-value of 1 gives no
# evidence that both are outliers, so at least one of the two is not an outlier.

# Use the one-tailed test (type = 10) to determine which, if any, of the two
# values is a true outlier
outlierTest0<- grubbs.test(crime,10)
outlierTest01<- grubbs.test(crime,10,opposite = TRUE)
# The tests above give a p-value of 0.0789 for the value 1993 and a p-value of 1
# for 342. 1993 is therefore the likelier outlier (significant at the 10% level,
# though not at 5%), while there is no evidence that 342 is an outlier.
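For reference, the statistic behind the one-tailed test can be computed by hand. The following is a minimal sketch in base R on a small made-up vector (`x_demo` and `grubbs_max` are hypothetical names, not part of the analysis above). The p-value uses the usual t-distribution relationship for the one-sided Grubbs test and is assumed, not verified, to agree with what grubbs.test reports.

```r
# Hypothetical sample: nine ordinary values and one suspiciously large one
x_demo <- c(10, 12, 11, 13, 12, 11, 10, 12, 13, 30)

grubbs_max <- function(x) {
  n <- length(x)
  # Distance of the maximum from the sample mean, in standard deviations
  g <- (max(x) - mean(x)) / sd(x)
  # Convert G to a t statistic with n - 2 degrees of freedom
  t_stat <- sqrt(n * (n - 2) * g^2 / ((n - 1)^2 - n * g^2))
  # Bonferroni-style one-sided p-value, capped at 1
  p <- min(1, n * pt(t_stat, df = n - 2, lower.tail = FALSE))
  list(G = g, p.value = p)
}

res_demo <- grubbs_max(x_demo)
print(res_demo)
```

A large G with a small p-value points to the maximum as an outlier; applying the same function to `-x` would test the minimum instead.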

#Identify the outlier row


rowData<- crimeData[which.max(crimeData$Crime),]

#Remove outlier
crimeData1<-crimeData[-as.numeric(row.names(rowData)),]

# Now re-run the Shapiro-Wilk test with the outlier removed
shapiro.test(crimeData1$Crime)

##
## Shapiro-Wilk normality test
##
## data: crimeData1$Crime
## W = 0.93207, p-value = 0.01001

# With the outlier removed, the p-value from the Shapiro-Wilk test rises from
# 0.001882 to 0.01001 and W rises from 0.91273 to 0.93207. The distribution is
# closer to normal, but normality is still rejected at the 5% level.

or<-order(-crimeData$Crime)[1:2]

Data1<-crimeData[which(as.numeric(row.names(crimeData)) %in% or),]


kable(Data1, caption = "Highest crime rates")

Highest crime rates


Row  M     So  Ed    Po1   Po2   LF     M.F    Pop  NW   U1     U2   Wealth  Ineq  Prob      Time     Crime
4    13.6  0   12.1  14.9  14.1  0.577  99.4   157  8.0  0.102  3.9  6730    16.7  0.015801  29.9012  1969
26   13.1  0   12.1  16.0  14.3  0.631  107.1  3    7.7  0.102  4.1  6740    15.2  0.041698  22.1005  1993

Given the information in the table above, the population (Pop) of the city in row 26
does not seem consistent with the data in the other columns. Rows 4 and 26 are similar in
almost every other factor, yet the population in row 26 (3) is about 52 times smaller than
the population in row 4 (157). I therefore propose that, given its low population, the crime
rate recorded in row 26 is an outlier.
Question 6.1

CUSUM can be used in quality analysis of manufactured products, for example an eye cream
whose active ingredient is retinol at a concentration of 0.05. By monitoring the level of
retinol in each batch, it is possible to detect increases or decreases in the level of
retinol in the eye cream. Given the concentration of the active ingredient, and how even
small deviations in either direction would affect the effectiveness of the product, I would
recommend starting with a small C value of 0.005 and a threshold of +/-0.025. A C value
that small would not let the process veer too far from the expected concentration. A 0.015
threshold would allow only very small deviations from the expected value.
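The monitoring scheme described above could be sketched as a two-sided CUSUM. This is only an illustration: the batch concentrations in `batches` are made up, and `cusum_two_sided` is a hypothetical helper, not an existing library function. It uses the target of 0.05, C = 0.005, and a threshold of 0.025 from the text.

```r
# Two-sided CUSUM: tracks upward and downward drift from the target separately
cusum_two_sided <- function(x, mu, C, threshold) {
  s_up <- numeric(length(x))
  s_down <- numeric(length(x))
  prev_up <- 0
  prev_down <- 0
  for (i in seq_along(x)) {
    prev_up <- max(0, prev_up + (x[i] - mu) - C)      # accumulates excess above target
    prev_down <- max(0, prev_down + (mu - x[i]) - C)  # accumulates shortfall below target
    s_up[i] <- prev_up
    s_down[i] <- prev_down
  }
  alarm <- which(s_up >= threshold | s_down >= threshold)
  list(up = s_up, down = s_down,
       first_alarm = if (length(alarm) > 0) alarm[1] else NA_integer_)
}

# Hypothetical data: five batches on target, then a drift upward
batches <- c(0.050, 0.051, 0.049, 0.050, 0.052, 0.060, 0.062, 0.065, 0.064, 0.066)
result <- cusum_two_sided(batches, mu = 0.05, C = 0.005, threshold = 0.025)
print(result$first_alarm)
```

In this made-up series the small fluctuations in the first five batches are absorbed by C, and the alarm is raised only after several consecutive high batches.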

Question 6.2

I used the whole month of July to calculate μ for each year. A rounded version of each
year's SD was chosen as the C value; C was chosen individually for each year to best account
for temperature fluctuations within that year. Two thresholds, 25 and 30, were analyzed to
determine when summer unofficially ends. Given the dates each threshold produces and how
closely they cluster, 30 is the better threshold: the graphs below show fewer false starts
to fall (the cooling period). With that threshold, summer ends unofficially on or about
Sept 14, versus Sept 11 for a threshold of 25.

The CUSUM was computed with the formula below. Since cooler temperatures were what we
needed to observe, days with temperatures greater than μ contribute 0 (the inner max). temp
is the temperature on that day.

$$S_n = \max\bigl(0,\; S_{n-1} + \max(0,\; \mu - \mathrm{temp}_n) - C\bigr)$$
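The recursion can be sketched in R as follows. The temperatures, μ, C, and threshold here are made-up illustration values, and `cusum_cooling` is a hypothetical helper name; the actual analysis used each year's July mean and rounded SD.

```r
# CUSUM for detecting a decrease: only shortfalls below mu accumulate
cusum_cooling <- function(temp, mu, C) {
  s <- numeric(length(temp))
  prev <- 0
  for (n in seq_along(temp)) {
    prev <- max(0, prev + max(0, mu - temp[n]) - C)
    s[n] <- prev
  }
  s
}

# Hypothetical daily high temperatures that start near mu and then cool
temps_demo <- c(90, 91, 89, 88, 90, 84, 80, 78, 75, 72)
mu_demo <- 90        # assumed July mean
C_demo <- 4          # assumed rounded SD
threshold_demo <- 30 # one of the two thresholds compared in the text

s_demo <- cusum_cooling(temps_demo, mu_demo, C_demo)
first_cross <- which(s_demo >= threshold_demo)[1]
print(s_demo)
print(first_cross)
```

The first index at which the running sum reaches the threshold is taken as the day summer ends for that year.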

2. According to the data received, summers do not seem to be getting hotter.
The table below shows the μ and SD of Jul 1-31 for each year.

Year μ SD
1996 91.19354839 4.901832003
1997 87.25806452 4.426945839
1998 89.70967742 3.046238647
1999 87.64516129 5.782435399
2000 91.74193548 5.507375309
2001 86.74193548 2.607268583
2002 89.25806452 3.741369998
2003 85.58064516 3.490694
2004 87.83870968 3.099427627
2005 86.93548387 4.404054828
2006 90.19354839 4.238076253
2007 86.41935484 3.354342395
2008 89.16129032 2.745866884
2009 86.64516129 3.692771201
2010 91.25806452 4.057649089
2011 91.93548387 3.473091647
2012 94.09677419 4.585156
2013 84.70967742 3.874648787
2014 86.61290323 3.584014689
2015 90.06451613 3.558421784
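One way to check the trend more formally, offered as a sketch and not as the analysis that was actually performed, is to regress the July means in the table on year. The values below are the table's μ column rounded to two decimals; `trend_fit` and `slope` are hypothetical names.

```r
# July mean temperature by year, rounded from the table above
year <- 1996:2015
july_mean <- c(91.19, 87.26, 89.71, 87.65, 91.74, 86.74, 89.26, 85.58, 87.84, 86.94,
               90.19, 86.42, 89.16, 86.65, 91.26, 91.94, 94.10, 84.71, 86.61, 90.06)

# Simple linear trend: change in July mean per year
trend_fit <- lm(july_mean ~ year)
slope <- unname(coef(trend_fit)["year"])
slope_p <- summary(trend_fit)$coefficients["year", "Pr(>|t|)"]

print(slope)
print(slope_p)
```

A slope that is not significantly different from zero would be consistent with the conclusion that summers are not getting hotter over this period.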

Year T = 25 T = 30
1996 14-Sep 14-Sep
1997 25-Sep 25-Sep
1998 11-Sep 18-Sep
1999 22-Sep 22-Sep
2000 6-Sep 6-Sep
2001 3-Sep 17-Sep
2002 22-Sep 22-Sep
2003 22-Sep 24-Sep
2004 12-Aug 3-Sep
2005 8-Oct 8-Oct
2006 12-Sep 13-Sep
2007 3-Oct 4-Oct
2008 23-Aug 25-Aug
2009 2-Sep 11-Sep
2010 28-Sep 29-Sep
2011 6-Sep 6-Sep
2012 19-Aug 20-Aug
2013 17-Aug 17-Aug
2014 26-Sep 27-Sep
2015 11-Sep 12-Sep

[Figures: CUSUM charts for 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, and 2011. Each chart plots dates from 24-Jun to 11-Nov on the horizontal axis, with two series labelled C = 25 and C = 30.]
