
Project 1 – COLD STORAGE CASE STUDY

REPORT ON THE ANALYSIS OF THE DATASET

BY:
PRANAV VISWANATHAN

Table of Contents
1 Project Objective
2 Assumptions
2.1 Assumptions from the problem's point of view
3 Exploratory Data Analysis – Step-by-step approach
3.1 Environment Set-up and Data Import
3.1.1 Install necessary Packages and Invoke Libraries
3.1.2 Set up working Directory
3.1.3 Import and Read the Dataset
3.2 Variable Identification
3.2.1 Variable Identification – Inferences
3.3 Univariate Analysis
3.4 Bi-Variate Analysis
3.5 Missing Value Identification
3.6 Outlier Identification
4 Problems and Solutions
4.1 Problem 1
4.2 Problem 2
5 Conclusion
6 Appendix A – Source Code

1. PROJECT OBJECTIVE:

The objective of this project is to explore the Cold Storage datasets (Cold_Storage_Temp_Data.csv and Cold_Storage_Mar2018.csv) in R to generate solutions and gain insights from the data provided. The exploration of the dataset is done in steps to get the desired output.

The following are the steps to be followed:

1. Getting the source, i.e. the dataset, in the desired file format (e.g. .csv, .xlsx).
2. Importing the dataset into RStudio.
3. Exploring the structure and nuances of the dataset.
4. Graphical exploration for a comparative analysis of the different variables present in the dataset.
5. Descriptive statistics to get a brief summary of the dataset and its insights, such as type and class, and to break the dataset down into measures of central tendency and derive outcomes.
6. Drawing insights and solutions from the analysis done on the dataset.

2. ASSUMPTIONS:

The assumption made about the given datasets is that they are free from missing values and errors.

2.1 Assumptions from the problem's point of view:

1. The dataset is correctly imported and checked for errors and missing values.
2. The dataset consists of date, month, season and temperature, so we check for possible errors in the data types of the different parameters.
3. The temperature of the cold storage is assumed to be properly maintained between 2 and 4 deg C.
4. The dataset is assumed to be normally distributed, characterized by its mean and standard deviation.
5. The temperature is read correctly at the proper intervals.
6. The maximum acceptable temperature is taken to be 3.9 deg C.
7. In the first year of business, the plant maintenance work was outsourced to a professional company with stiff penalty clauses.
8. If it is statistically proven that the probability of the temperature falling outside 2-4 deg C is above 2.5% and less than 5%, the penalty is 10% of the AMC fee; if it exceeds 5%, the penalty is 25% of the AMC fee.

3. EDA - Exploratory Data Analysis:

In statistics, exploratory data analysis (EDA) is an approach to analyzing data


sets to summarize their main characteristics, often with visual methods.
A statistical model can be used or not, but primarily EDA is for seeing what the
data can tell us beyond the formal modeling or hypothesis testing task. Exploratory
data analysis was promoted by John Tukey to encourage statisticians to explore the
data, and possibly formulate hypotheses that could lead to new data collection and
experiments.

EDA is different from initial data analysis (IDA), which focuses more narrowly on
checking assumptions required for model fitting and hypothesis testing, and
handling missing values and making transformations of variables as needed.

The objectives of EDA are to:

- Suggest hypotheses about the causes of observed phenomena
- Assess assumptions on which statistical inference will be based
- Support the selection of appropriate statistical tools and techniques
- Provide a basis for further data collection through surveys or experiments

Exploratory Data Analysis – Step-by-step approach:


A Typical Data exploration activity consists of the following steps:
1. Environment Set up and Data Import
2. Variable Identification
3. Univariate Analysis
4. Bi-Variate Analysis
5. Missing Value Treatment
6. Outlier Treatment
7. Variable Transformation / Feature Creation
8. Feature Exploration
Steps 5 and 6 are restricted to identification (treatment is out of scope), and steps 7 and 8 are not covered in this project.

3.1 Environment Set up and Data Import
3.1.1 Install necessary Packages and Invoke Libraries

Here the necessary packages for the various functions are installed, and the respective libraries are loaded for the analysis.
install.packages() - function used to install packages.
library() - loads a library from the installed packages.
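As a minimal sketch (the package names are the ones used later in this report; the install line is commented out so the script can be re-run without re-downloading):

```r
# One-time installation per machine (commented out for repeated runs):
# install.packages(c("tidyverse", "ggplot2", "rpivotTable"))

# Per-session loading; stats and graphics ship with base R, so these
# calls work even before any extra package is installed.
library(stats)     # sd(), pnorm(), qnorm(), pt() used throughout
library(graphics)  # hist(), boxplot(), plot()
```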

3.1.2 Set up working Directory


Before exploring the given dataset, we first set up the environment; more precisely, the location from which the dataset will be read and where outputs will be saved.
This is done with setwd(), which sets the working directory.
getwd() - a function that returns the currently set location.
CODE:
setwd('C:\\Users\\user\\Desktop\\pgp-babi')
getwd()

R STUDIO O/P:

Fig 1: setting directory

Fig 2: output of directory in R console

Alternatively, we can use Session -> Set Working Directory -> Choose Directory in RStudio.

Fig 3: Alternate method to set directory

3.1.3 Import and Read the Dataset


The given dataset is in .csv format; hence the function read.csv() is used to import the file.
data = read.csv('file_name')
data
The above call reads the dataset and returns it as a data frame.

R STUDIO O/P:

Fig 4: Reading data into console

Fig 5: output of dataset

3.2 Variable Identification


Variables are the factors in an experiment that change or potentially change.
read.csv() - reads all the data from a .csv file.
str() - a compact way to display the structure of an R object. This allows you to use str() as a diagnostic function and an alternative to summary(); it outputs one line for each basic structure and is well suited to displaying the contents of lists.
summary() - a generic function used to produce result summaries of various model-fitting functions; it invokes particular methods depending on the class of the first argument.
mean() - gives the average of the selected field.
sd() - gives the standard deviation of the selected field.
pnorm() - computes the cumulative distribution function of the normal distribution, i.e. P(X <= x) for X ~ N(mu, sigma^2), where mu is the mean and sigma the standard deviation.
filter() - selects the rows of a data table that meet certain criteria, creating a new data subset.
head(data, n = value) - gives the first n rows of the dataset.
tail(data, n = value) - gives the bottom n rows of the dataset.
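A minimal, self-contained illustration of these helpers on a toy data frame (the values below are hypothetical, not the case-study data; subset() stands in for dplyr's filter() so the sketch needs no extra packages):

```r
# Toy data frame with the same kinds of columns as the case study
df <- data.frame(Season = c("Winter", "Winter", "Summer"),
                 Temperature = c(2.5, 2.7, 3.8))

str(df)                         # compact structure: types and first values
summary(df)                     # per-column summaries
head(df, n = 2)                 # first 2 rows
tail(df, n = 1)                 # last row
mean(df$Temperature)            # average: 3
sd(df$Temperature)              # standard deviation: 0.7
pnorm(4, mean = 3, sd = 0.5)    # P(X <= 4) for X ~ N(3, 0.5^2)
subset(df, Season == "Winter")  # base-R analogue of dplyr's filter()
```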

3.2.1 Variable Identification – Inferences
After uploading the file into RStudio, the first step is to see the structure of the dataset; we use the str() function for this.

Fig 6: str function –it gives the structure of data

Fig 7: output of the structure of data

From the above figure we see that the dataset is a data frame and the data type of each field is shown. This helps to identify whether the variables are categorical or numerical.
We see that there are 35 observations divided among 4 variables.
2. Then the various insights of the dataset are seen, i.e. the statistical parameters are analyzed. Here we use summary() to get the statistical parameters.

Fig 8:summary function

Fig 9: output of summary function

From the above figure we get the mean, median and quartiles of the various variables, to analyse the dataset for further manipulation.

3. If the dataset consists of a large number of rows and columns, we make use of head() and tail() to get the top and bottom n rows and check the consistency of the dataset.

3.3 Univariate Analysis


Univariate analysis is the simplest form of analyzing data. "Uni" means "one": in other words, the data has only one variable. It doesn't deal with causes or relationships, and its major purpose is to describe; it takes data, summarizes that data and finds patterns in it.
Some ways to describe patterns found in univariate data include central tendency (mean, mode and median) and dispersion: range, variance, maximum, minimum, quartiles (including the interquartile range), and standard deviation.
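Each of these summary measures is a one-line call in base R; a sketch on an arbitrary numeric vector (not the case-study temperatures):

```r
x <- c(2.3, 2.9, 3.1, 3.4, 3.9, 4.2)  # illustrative readings

mean(x)      # central tendency: mean (3.3)
median(x)    # central tendency: median (3.25)
range(x)     # dispersion: minimum and maximum
var(x)       # dispersion: variance
sd(x)        # dispersion: standard deviation
quantile(x)  # quartiles: 0%, 25%, 50%, 75%, 100%
IQR(x)       # interquartile range

# Base R has no built-in mode; the most frequent value can be found with:
names(sort(table(x), decreasing = TRUE))[1]
```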

Numerical variables:
par(mfrow=c(2,2))
hist(Temperature)
boxplot(Temperature,horizontal =TRUE ,main='Boxplot of temperature ')
hist(Date)
boxplot(Date,horizontal =TRUE,main='Boxplot of Date' )

R studio o/p:

Fig 10: Function and code for generating charts

Fig 11: output

Analysis:
From the above charts of the numerical variables we see that the histogram of temperature is approximately normally distributed: it increases, reaches a maximum and then decreases, resembling the bell curve.
From the boxplot of temperature we see that it has an extreme value, represented by the outlier.

Fig 12: command to see the outliers

Fig 13: output of outliers present
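The outlier values flagged by the boxplot can also be extracted programmatically with boxplot.stats(), without drawing the plot; a sketch on synthetic data (the seed and values are illustrative, not the case-study readings):

```r
set.seed(42)
x <- c(rnorm(100, mean = 3, sd = 0.5), 9.9)  # 100 typical readings plus one extreme value

out <- boxplot.stats(x)$out  # points lying beyond 1.5 * IQR of the quartiles
out                          # includes the extreme value 9.9
```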

Categorical variable:
plot(Season)
plot(Month)

R STUDIO O/P:
Bar chart of season:

Fig 14: Bar plot of categorical variable

Bar chart of month:

Fig 15: Bar plot of categorical variable

3.4 Bi-Variate Analysis:


Here we find the relationship between two or more variables.
Here I will be using rpivotTable() for easy analysis.

A) Count vs temperature by season

Fig 16: output of count vs temperature by season

The above figure represents the change in temperature across seasons. We can see that the temperature decreases during the winter and rainy seasons.

Next we use ggplot() for a clearer and easier analysis.
Code:
library(ggplot2)
ggplot(data, aes(x = Temperature, fill = Season)) +
  geom_histogram(col = 'black', bins = 15) +
  facet_wrap(~Season)

Fig 17: ggplot output of count vs temperature by season

Analysis:
From the above two plots we see that the temperature of the cold storage unit reaches its maximum in summer and decreases considerably in the winter and rainy seasons.

3.5 Missing Value Identification


In R, missing values are coded by the symbol NA. To identify the missing values in your dataset, use the function is.na().

R STUDIO O/P:

Fig 18: command to find missing values

Fig 19: output showing there is no missing value.
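A quick self-contained sketch of these checks on a toy vector with deliberate NAs:

```r
x <- c(3.2, NA, 2.8, NA)  # two deliberate missings

is.na(x)               # TRUE where a value is missing
sum(is.na(x))          # total number of missings: 2
which(is.na(x))        # their positions: 2 and 4
mean(x, na.rm = TRUE)  # many summaries can skip NAs explicitly: 3
```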

3.6 Outlier Identification:


Outliers are observations that lie at an abnormal distance from the rest of the values. Here they are identified from the boxplot: points plotted beyond the whiskers are flagged as outliers.

Fig 20: command to see outlier

Fig 21:output of outliers

4. PROBLEMS AND SOLUTIONS:

4.1 Problem 1:
Cold Storage started its operations in Jan 2016. They are in the business of storing Pasteurized
Fresh Whole or Skimmed Milk, Sweet Cream, Flavored Milk Drinks. To ensure that there is no

change of texture, body appearance, separation of fats the optimal temperature to be maintained
is between 2 deg - 4 deg C.
In the first year of business they outsourced the plant maintenance work to a professional
company with stiff penalty clauses. It was agreed that if it was statistically proven that
probability of temperature going outside the 2 degrees - 4 degrees C during the one-year contract
was above 2.5% and less than 5% then the penalty would be 10% of AMC (annual maintenance
contract). In case it exceeded 5% then the penalty would be 25% of the AMC fee. The average
temperature data at date level is given in the file “Cold_Storage_Temp_Data.csv”

Q1) Find mean cold storage temperature for Summer, Winter and Rainy
Season :
#mean of temperature in summer: (library: tidyverse)

d2=data %>% filter(Season=="Summer")

d2

summer_mean=mean(d2$Temperature)

summer_mean
> summer_mean=mean(d2$Temperature)
> summer_mean
[1] 3.153333

R STUDIO O/P:

Fig 22:command to find mean of a particular season (summer)

Fig 23:output to find mean of a particular season (summer)

Mean of temperature in winter:


d3=data[1:31,c('Season','Month','Date','Temperature')]

d3

winter_mean=mean(d3$Temperature)

winter_mean
> winter_mean=mean(d3$Temperature)
> winter_mean
[1] 2.703226

R STUDIO O/P:

Fig 24 : mean of winter season

Fig 25: output

Mean of temperature in rainy season:
d4=data %>% filter(Season=="Rainy")

d4

rainy_mean=mean(d4$Temperature)

rainy_mean

R STUDIO O/P:

Fig 26: mean in rainy season

Fig 27: output

Q2)Find overall mean for the full year


Overall mean:
overall_mean=mean(data$Temperature)

overall_mean

Fig 28 a: overall mean

Fig 28 b: output of overall mean

Q3)Find Standard Deviation for the full year
Overall standard deviation:
overall_sd=sd(data$Temperature)

overall_sd

Fig 29: standard deviation of dataset

Fig 30: output of standard deviation

Q4) Assuming a Normal distribution, what is the probability of the temperature having fallen below 2 deg C?
mean=overall_mean

sd=overall_sd

pnorm(q=2,mean,sd,lower.tail = T)
> pnorm(q=2,mean,sd,lower.tail = T)
[1] 0.02918146

R STUDIO O/P:

Fig 31: probability of temperature below 2 deg

Fig 32 : output

Q5) Assuming a Normal distribution, what is the probability of the temperature having gone above 4 deg C?

pnorm(q=4,mean,sd,lower.tail=F)

R STUDIO O/P:

Fig 33:probability of temperature above 4 deg c

Fig 34 : output

Q6)What will be the penalty for the AMC Company


The penalty for the AMC company:
1) For below 2 deg C: 10% of AMC (as the probability is greater than 2.5%)
2) For above 4 deg C: 0% of AMC (as the probability is less than 2.5%)
Total penalty = 10% of AMC
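The contract's penalty rule can also be coded directly as a conditional on the out-of-range probability; a sketch in which overall_mean and overall_sd are placeholder values for illustration, not the report's computed figures:

```r
# Placeholder parameters; in the report these come from the full-year data.
overall_mean <- 3.0
overall_sd   <- 0.5

# Probability of the temperature going outside 2-4 deg C under normality:
p_out <- pnorm(2, overall_mean, overall_sd) +                   # below 2 deg C
         pnorm(4, overall_mean, overall_sd, lower.tail = FALSE) # above 4 deg C

# Penalty as a fraction of the AMC fee, per the contract thresholds:
penalty <- if (p_out > 0.05) 0.25 else if (p_out > 0.025) 0.10 else 0
penalty
```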

PROBLEM 2:
In Mar 2018, Cold Storage started getting complaints from their clients, who had been receiving complaints from end consumers about the dairy products going sour and often smelling. On getting these complaints, the supervisor pulled out the data of the last 35 days' temperatures. As a safety measure, the Supervisor has been vigilant to maintain the temperature below 3.9 deg C.

Assume 3.9 deg C as the upper acceptable temperature and, at alpha = 0.1, decide whether there is a need for some corrective action in the Cold Storage plant, or whether the problem is from the procurement side from where Cold Storage is getting the dairy products. The data of the last 35 days is in "Cold_Storage_Mar2018.csv".

Q1) State the Hypothesis, do the calculation using z test


Assumptions:
1)As a safety measure, the Supervisor has been vigilant to maintain the
temperature below 3.9 deg C.
2)Assume 3.9 deg C as upper acceptable temperature range and at alpha = 0.1
According to above assumptions the hypothesis is:
NULL HO: mu<=3.9 deg c(As a safety measure, the Supervisor has been vigilant to maintain
the temperature below 3.9 deg C. )

ALTERNATE HA: mu> 3.9 deg c(Assume 3.9 deg C as upper acceptable temperature
range )

Based on the above hypothesis,

Z = (xbar - mu) / (sigma / sqrt(n))
Z is found to be 0.8641166.
Z critical is found to be Z_c = 1.281552.
Comparing Z and Z_c, we see Z < Z_c; therefore we do not reject the null hypothesis.
CODE:

setwd('C:\\Users\\user\\Desktop\\pgp-babi')

getwd()

##population data

data=read.csv('cold_storage.csv')

data

##sampled data

data2=read.csv('Cold_Storage_Mar2018.csv')

data2

##standard deviation of population

standard_deviation=sd(data$Temperature)

standard_deviation

p=sqrt(35)

sd_n=standard_deviation/p

sd_n

##mean of sample

mean_sample=mean(data2$Temperature)

mean_sample

####PROBLEM:2 Q.1

####ASSUMPTIONS

####1) As a safety measure, the Supervisor has been vigilant to maintain the temperature below 3.9 deg C.

####2) Assume 3.9 deg C as upper acceptable temperature range and at alpha = 0.1

##### ACCORDING TO THE ASSUMPTIONS,NULL AND ALTERNATE HYPOTHESIS ARE:

## HO:MU<=3.9 deg c

## HA:MU>3.9 DEG C

alpha=0.1

X_bar=mean_sample #mean of sample

N=35 #sample size

MU=3.9 #as per assumption

SD=sd_n #standard deviation of the population divided by sample

##CALCULATED tstat:

Z=(X_bar-MU)/SD

q=1-alpha ## quantile level for the critical value

Z_c=qnorm(q)

Z_c

### from the values of Z and Z_c we see that Z<Z_c so we fail to reject HO (null hypothesis)

### Moreover the problem is from the procurement side

### The assumption holds, i.e. the temperature is maintained

###P-value method:

alpha=0.1

Z=0.8641166

p_value=pnorm(Z,lower.tail=FALSE) ## right-tail p-value for HA: mu > 3.9

p_value

### p_value (about 0.19) is greater than alpha

### we don't reject HO

R STUDIO O/P:

Fig 35 a.

Fig 35 b.

Fig 35 a ,b :Z- test and its outcome

Fig 36: output of Z-test

Conclusion of the Z-test:

- Firstly, comparing Z with Z_c and the p-value with alpha, we see that Z does not fall in the critical region and the p-value is greater than alpha. So we fail to reject the null hypothesis, which states that the cold storage temperature is maintained.
- Secondly, we can conclude that the problem is from the procurement side.
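The manual steps above can be wrapped into a small reusable function (a sketch; base R has no built-in one-sample z-test, so this helper is hypothetical, not a library API):

```r
# One-sample, upper-tailed z-test: H0 mu <= mu0 vs HA mu > mu0.
# sigma is the (assumed known) population standard deviation.
z_test_upper <- function(x, mu0, sigma, alpha = 0.1) {
  z <- (mean(x) - mu0) / (sigma / sqrt(length(x)))
  p <- pnorm(z, lower.tail = FALSE)  # right-tail p-value
  list(z         = z,
       z_crit    = qnorm(1 - alpha), # critical value at level alpha
       p_value   = p,
       reject_H0 = p < alpha)
}

# Illustrative call on synthetic readings (not the case-study sample):
z_test_upper(rep(4.2, 35), mu0 = 3.9, sigma = 0.5)
```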

Q2) State the Hypothesis, do the calculation using t-test


Assumptions:
1)As a safety measure, the Supervisor has been vigilant to maintain the
temperature below 3.9 deg C.
2)Assume 3.9 deg C as upper acceptable temperature range and at alpha = 0.1
According to above assumptions the hypothesis is:
NULL HO: mu<=3.9 deg c(As a safety measure, the Supervisor has been vigilant to maintain
the temperature below 3.9 deg C. )

ALTERNATE HA: mu> 3.9 deg c(Assume 3.9 deg C as upper acceptable temperature
range )

Alpha=0.1
Based on the above hypothesis,
tstat = (xbar - mu) / (s / sqrt(n))
tstat is found to be 2.752359.

pvalue = pt(tstat, 34)  ## cumulative (left-tail)

pvalue = 0.9952888
pvalue (single tail) = 1 - pt(tstat, 34)
pvalue = 0.004711198
Here the p-value is less than alpha, hence the null hypothesis is rejected.

CODE:
setwd('C:\\Users\\user\\Desktop\\pgp-babi')
getwd()

data=read.csv('Cold_Storage_Mar2018.csv')

data
####ASSUMPTIONS
####1) As a safety measure, the Supervisor has been vigilant to maintain the temperature below 3.9 deg C.
####2) Assume 3.9 deg C as upper acceptable temperature range and at alpha = 0.1
##### ACCORDING TO THE ASSUMPTIONS, NULL AND ALTERNATE HYPOTHESES ARE:
## HO: MU <= 3.9 deg C
## HA: MU > 3.9 deg C
alpha=0.1
mu=3.9
n=35
xbar=mean(data$Temperature)
s=sd(data$Temperature)
tstat=(xbar-3.9)/(s/sqrt(35))
tstat
pvalue=pt(tstat,34) ##for cumulative
pvalue
p=1-pt(tstat,34)## for single tail
p

R STUDIO O/P:
> xbar=mean(data$Temperature)
> s=sd(data$Temperature)
> tstat=(xbar-3.9)/(s/sqrt(35))
> pvalue=pt(tstat,34)
> pvalue
[1] 0.9952888
> p=1-pt(tstat,34)
> p
[1] 0.004711198

p = 0.004711198 is less than alpha = 0.1, so we reject the null hypothesis in favour of the alternate hypothesis.
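The manual calculation can be cross-checked with R's built-in t.test(), which performs the same one-sample, upper-tailed test. The temperatures below are an illustrative stand-in so the snippet runs on its own; run on the actual Mar-2018 sample, it should reproduce tstat = 2.752359 and p = 0.004711198:

```r
# Illustrative stand-in for data$Temperature (the real 35-day sample is in
# Cold_Storage_Mar2018.csv):
Temperature <- c(3.8, 4.0, 3.9, 4.1, 4.2)

# One-sample t-test of H0: mu <= 3.9 vs HA: mu > 3.9
t.test(Temperature, mu = 3.9, alternative = "greater")
```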

Conclusion of the T-test:

- Firstly, comparing the p-value with alpha, we see that p is less than alpha, so we reject the null hypothesis in favour of the alternate hypothesis: the temperature is greater than 3.9 deg C.
- Secondly, the sample mean is greater than the population mean.
- Thirdly, we can conclude that the problem is with the cold storage unit.

Q3) Give your inference after doing both the tests

- The supervisor of the cold storage company insists that the cold storage unit maintains a temperature below 3.9 deg C.
- To substantiate this to the clients, he shows that the average temperature during the 35-day period is well below the average temperature from 2016-17.
- However, after stating the hypotheses and doing both the Z-test and the T-test, the T-test is more appropriate for this situation, since we are comparing a short-term sample with a much longer period (2016-17).
- According to the T-test:

Fig 37: output of T test

- If we compare the parameters of the t-test with the population parameters, it is seen that the average temperature of the sample is greater than that of the population.
- Moreover, the T-test is preferred over the Z-test as the sample size, n = 35, is comparatively small.

5. CONCLUSION:
The dataset is analyzed to find out the exact problem with the cold storage unit faced by the customers over the years. It is found through hypothesis testing that the problem lies with the cold storage unit, in spite of the supervisor stating that an optimum temperature is maintained. The data analysis helps to bring out the insights of the dataset, making it possible to conclude the hypothesis test. Moreover, it is concluded that the T-test is more appropriate in this case, based on comparing the statistical parameters of the sample and population datasets.

APPENDIX A – SOURCE CODE
####Setting up the environment
setwd("C:\\Users\\user\\Desktop\\pgp-babi")
getwd()

####Getting the dataset


data=read.csv('cold_storage.csv')
data

####Attaching dataset to R path


attach(data)
data

####Dimensions of dataset
dim(data)

####Getting top 5 rows


head(data,5)

####Getting bottom 5 rows


tail(data,5)

####Getting the structure of the Dataset


str(data)

####Getting Summary of dataset
summary(data)

####Checking for missing values and total no. of missing values


is.na(data)
sum(is.na(data))

####Univariate data analysis


###Analysis of numerical variables
par(mfrow=c(2,2))
hist(Temperature)
boxplot(Temperature,horizontal =TRUE ,main='Boxplot of temperature ')
hist(Date)
boxplot(Date,horizontal =TRUE,main='Boxplot of Date' )

####Checking for outlier and printing them


OutVals=boxplot(Temperature,horizontal =TRUE ,main='Boxplot of temperature')
OutVals

###Analysis of categorical variables


plot(Season)
plot(Month)

####Bi-variate analysis
library(ggplot2)

ggplot(data,aes(x=Temperature,fill=Season)) +
  geom_histogram(col='black',bins=15) +
  facet_wrap(~Season)

####PROBLEM 1
###Q1) Find mean cold storage temperature for Summer, Winter and Rainy Season
#mean of temperature in summer:
library(tidyverse)
d2=data %>% filter(Season=="Summer")
d2
summer_mean=mean(d2$Temperature)
summer_mean
#mean of temperature in winter:
d3=data[1:31,c('Season','Month','Date','Temperature')]
d3
winter_mean=mean(d3$Temperature)
winter_mean
#mean of temperature in rainy:
d4=data %>% filter(Season=="Rainy")
d4
rainy_mean=mean(d4$Temperature)
rainy_mean

###Q2)Find overall mean for the full year


overall_mean=mean(data$Temperature)

overall_mean

###Q3)Find Standard Deviation for the full year


overall_sd=sd(data$Temperature)
overall_sd

###Q4) Assume Normal distribution, what is the probability of temperature having fallen below 2 deg C
mean=overall_mean
sd=overall_sd
pnorm(q=2,mean,sd,lower.tail = T)

###Q5) Assume Normal distribution, what is the probability of temperature having gone above 4 deg C
pnorm(q=4,mean,sd,lower.tail=F)

##Q6)What will be the penalty for the AMC Company


##The penalty for the AMC Company:
##1) For below 2 deg C: 10% of AMC (as probability greater than 2.5%)
##2) For above 4 deg C: 0% of AMC (as probability less than 2.5%)
##Total penalty = 10% of AMC

####PROBLEM 2
###Q1)State the Hypothesis, do the calculation using z test

#Assumptions:
#1) As a safety measure, the Supervisor has been vigilant to maintain the temperature below 3.9 deg C.
#2) Assume 3.9 deg C as upper acceptable temperature range and at alpha = 0.1
#According to the above assumptions the hypotheses are:
#NULL HO: mu <= 3.9 deg C (the Supervisor has been vigilant to maintain the temperature below 3.9 deg C)
#ALTERNATE HA: mu > 3.9 deg C (3.9 deg C is the upper acceptable temperature)
##population data
data=read.csv('cold_storage.csv')
data
##sampled data
data2=read.csv('Cold_Storage_Mar2018.csv')
data2
##standard deviation of population
standard_deviation=sd(data$Temperature)
standard_deviation
p=sqrt(35)
p
sd_n=standard_deviation/p
sd_n
##mean of sample
mean_sample=mean(data2$Temperature)
mean_sample
alpha=0.1

X_bar=mean_sample #mean of sample
N=35 #sample size
MU=3.9 #as per assumption
SD=sd_n #standard deviation of the population divided by sample
##CALCULATED tstat:
Z=(X_bar-MU)/SD
Z
q=1-alpha ## unrejected region
Z_c=qnorm(q)
Z_c
### from the values of Z and Z_c we see that Z<Z_c so we fail to reject HO (null hypothesis)
### Moreover the problem is from the procurement side
### The assumption holds, i.e. the temperature is maintained

###P-value method:
alpha=0.1
Z=0.8641166
p_value=pnorm(Z,lower.tail=FALSE) ## right-tail p-value for HA: mu > 3.9
p_value
### from the values of Z and p_value we see that p_value is greater than alpha
### we don't reject HO

###Q2)State the Hypothesis, do the calculation using t-test


##Assumptions:
#1) As a safety measure, the Supervisor has been vigilant to maintain the temperature below 3.9 deg C.
#2) Assume 3.9 deg C as upper acceptable temperature range and at alpha = 0.1
#According to the above assumptions the hypotheses are:
#NULL HO: mu <= 3.9 deg C
#ALTERNATE HA: mu > 3.9 deg C
alpha=0.1
mu=3.9
n=35
xbar=mean(data$Temperature)
s=sd(data$Temperature)
tstat=(xbar-3.9)/(s/sqrt(35))
tstat
pvalue=pt(tstat,34) ##for cumulative
pvalue
p=1-pt(tstat,34)## for single tail
p

OUTPUT:
Q1)

Q2)

Q3)

Q4)

Q5)

PROBLEM 2
Q1)

Q2)
> xbar=mean(data$Temperature)
> s=sd(data$Temperature)
> tstat=(xbar-3.9)/(s/sqrt(35))
> pvalue=pt(tstat,34)
> pvalue
[1] 0.9952888
> p=1-pt(tstat,34)
> p
[1] 0.004711198

