Sravanthi.M
Table of Contents
1. Project Objective
2. Assumptions
3. Exploratory Data Analysis – Step by step approach
3.1. Environment Set up and Data Import
3.1.1. Install necessary Packages and Invoke Libraries
3.1.2. Set up working Directory
3.1.3. Import and Read the Dataset
3.2. Variable Identification
4. Conclusion
5. Detailed Explanation of Findings
5.1Q Formulate and present the rationale for a hypothesis test that Par could use to compare
the driving distances of the current and new golf balls.
5.2Q Analyze the data to provide the hypothesis testing conclusion. What is the p-value for your
test? What is your recommendation for Par Inc.?
5.3Q Provide descriptive statistical summaries of the data for each model.
5.4Q What is the 95% confidence interval for the population mean of each model, and what is
the 95% confidence interval for the difference between the means of the two population?
5.5Q Do you see a need for larger sample sizes and more testing with the golf balls? Discuss.
2 Assumptions
The Independent Samples t-Test compares the means of two independent groups in
order to determine whether there is statistical evidence that the associated population
means are significantly different.
The Independent Samples t-Test is a parametric test.
3.1.3 Import and Read the Dataset
The given dataset is in Excel (.xls) format. Hence, the command ‘read_excel’ is used for importing the
file.
dim: returns the dimensions (i.e. the number of rows and columns)
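summary: a generic function used to produce result summaries of the results of
various model fitting functions. The function invokes particular methods which
depend on the class of the first argument.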
4 Conclusion
Based on the given data set, Par, Inc. can go ahead and launch the new (cut-resistant) ball. To draw a
final conclusion, however, larger samples are needed, and the balls should be tested under different
weather conditions and in different places. A detailed explanation of these observations follows.
5 Detailed Explanation of Findings
5.1Q Formulate and present the rationale for a hypothesis test that Par could use to compare the driving
distances of the current and new golf balls.
Ans: Par, Inc. needs a hypothesis test to compare the driving distances of the current and new golf
balls. Following the tests on the durability of the improved product, another issue has been raised: the
effect of the new coating on driving distance. Forty balls of each model, new and current, were
subjected to distance tests. The two samples are independent, and with 40 observations each the test
follows the large-sample case. The null hypothesis is that the mean driving distances of the two models
are equal; the alternative hypothesis is that they are not equal.
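In symbols, the two-tailed hypotheses can be written as below. The notation is an assumption made here, not taken from the original: μ1 and μ2 denote the population mean driving distances of the current and new balls.

```latex
H_0:\ \mu_1 - \mu_2 = 0
\qquad \text{versus} \qquad
H_a:\ \mu_1 - \mu_2 \neq 0
```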
We now perform a two-tailed t-test using the syntax below; the full code is attached in Section 5.6.
t.test(Golf$Current, Golf$New, paired = FALSE, conf.level = 0.95, alternative = "two.sided")
Observations                      Values
Degrees of Freedom (df)           76.852
t-value                           1.32
p-value                           0.188
95% Lower Confidence Interval     -1.384
95% Upper Confidence Interval     6.934
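Results: The p-value (0.188) is greater than 0.05 and the 95% confidence interval contains zero, so we
fail to reject the Null Hypothesis. The data do not show a statistically significant difference between
the mean driving distances of the Current and New balls.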
5.2Q Analyze the data to provide the hypothesis testing conclusion. What is the p-value for your test?
What is your recommendation for Par Inc.?
Ans: The conclusion from the hypothesis test above is as follows:
A two-tailed t-test gives a p-value of 0.188.
Recommendation: Based on the given sample data, there is no significant difference between the mean
driving distances, so Par, Inc. can go ahead and launch the new (cut-resistant) golf ball.
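The decision rule behind this conclusion can be sketched in R from the reported t-value and degrees of freedom alone. This is a minimal sketch: the significance level of 0.05 is an assumption chosen to match the 95% confidence level, and the variable names are illustrative.

```r
# Values reported in the t-test output table
t_value <- 1.32
df      <- 76.852
alpha   <- 0.05   # assumed significance level (matches the 95% confidence level)

# Two-tailed p-value: probability of a t statistic at least this extreme
p_value <- 2 * pt(-abs(t_value), df)

# Decision rule: reject the null hypothesis only when the p-value is below alpha
reject_null <- p_value < alpha

cat("p-value:", round(p_value, 3), "\n")
cat("Reject null hypothesis:", reject_null, "\n")
```

Because the t-value is rounded to two decimals, the recomputed p-value may differ slightly from the reported 0.188.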
5.3Q Provide descriptive statistical summaries of the data for each model.
Ans: For the descriptive statistical summaries we use histograms and boxplots. For the Golf data set,
the histogram and boxplot are plotted with the syntax below.
Histogram for Current Ball:
Syntax: hist(Golf$Current, main = "Current Balls", xlab = "Driving distance", border = "pink",
col = "Blue")
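A numerical summary and a boxplot for both models could be produced along the following lines. This is only a sketch: the data frame here is filled with simulated placeholder values (the means and standard deviations passed to rnorm are arbitrary), so that the block runs on its own. With the real data, the Golf data frame imported in Section 3.1.3 would be used instead.

```r
# PLACEHOLDER DATA: simulated values standing in for the real data set.
# These are NOT the Par, Inc. measurements.
set.seed(1)
Golf <- data.frame(
  Current = rnorm(40, mean = 270, sd = 9),
  New     = rnorm(40, mean = 268, sd = 10)
)

# Numerical summaries (min, quartiles, mean, max) and standard deviations
print(summary(Golf))
print(sapply(Golf, sd))

# Side-by-side boxplots of the two models
boxplot(Golf$Current, Golf$New,
        names = c("Current", "New"),
        ylab = "Driving distance",
        main = "Driving distance by model")

# Histogram for the new ball, mirroring the call used for the current ball
hist(Golf$New, main = "New Balls", xlab = "Driving distance",
     border = "pink", col = "blue")
```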
5.4Q What is the 95% confidence interval for the population mean of each model, and what is the 95%
confidence interval for the difference between the means of the two population?
Ans: The problem has two parts. The first part is to find the 95% confidence interval for the
population mean of each model, Current and New. It can be calculated with the formula
below:
$$\bar{x} \pm Z_{\alpha/2} \, \frac{\sigma}{\sqrt{n}}$$
Note: The values stated in the table are derived from the formula above; the code is attached
in Section 5.6.
n = sample size
95% Confidence Interval for Current Ball
Confidence Interval        95%
Mean                       267.5
Standard Deviation         9.89
n                          40
Z                          1.960
Upper Confidence Limit     270.56
Lower Confidence Limit     264.43
Note: The values stated in the table are derived from the formula above; the code is attached
in Section 5.6.
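As a check on the arithmetic, the limits in the table can be reproduced from the reported summary statistics alone. This is a minimal sketch: it uses the sample standard deviation in place of σ, as the table does, and the variable names are illustrative.

```r
# Summary statistics as reported in the table
xbar <- 267.5          # sample mean
s    <- 9.89           # sample standard deviation (used in place of sigma)
n    <- 40             # sample size
z    <- qnorm(0.975)   # two-sided 95% critical value, approximately 1.960

# Margin of error and the two confidence limits
margin <- z * s / sqrt(n)
lower  <- xbar - margin
upper  <- xbar + margin

cat("Lower limit:", round(lower, 2), "\n")
cat("Upper limit:", round(upper, 2), "\n")
```

The results agree with the tabulated limits up to rounding in the last digit.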
The second part of the problem is to find the 95% confidence interval for the difference between the
means of the two populations. It can be calculated with the formula below or obtained from the
output of the t-test:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2,\,\nu} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Note: The values stated in the table are derived using the code attached in
Section 5.6.
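The text does not spell out the degrees of freedom ν used for the t critical value. The reported value (76.852) is not a whole number, and R's t.test does not assume equal variances unless var.equal = TRUE is set, so it is presumably the Welch–Satterthwaite approximation:

```latex
\nu = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^{2}}
           {\dfrac{\left( s_1^2 / n_1 \right)^{2}}{n_1 - 1} + \dfrac{\left( s_2^2 / n_2 \right)^{2}}{n_2 - 1}}
```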
p-value 0.188
95% Lower Confidence Interval -1.384
95% Upper Confidence Interval 6.934
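Result: The 95% Confidence Interval for the difference between the means of the two populations is
(-1.384, 6.934). Since this interval contains zero, the difference is not statistically significant at
the 5% level.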
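The interval for the difference can also be computed by hand from summary statistics. The helper below, welch_ci, is a hypothetical function written for illustration, not part of the original code. It is checked against t.test on simulated placeholder data, since only one model's summary statistics are listed in this report.

```r
# Hypothetical helper: CI for a difference of means from summary statistics,
# using the unequal-variance (Welch) standard error and degrees of freedom.
welch_ci <- function(x1bar, s1, n1, x2bar, s2, n2, conf_level = 0.95) {
  v1 <- s1^2 / n1                      # squared standard error, sample 1
  v2 <- s2^2 / n2                      # squared standard error, sample 2
  se <- sqrt(v1 + v2)                  # standard error of the difference
  nu <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Welch df
  t_crit <- qt(1 - (1 - conf_level) / 2, nu)
  d <- x1bar - x2bar
  c(lower = d - t_crit * se, upper = d + t_crit * se, df = nu)
}

# PLACEHOLDER DATA: simulated values, NOT the Par, Inc. measurements
set.seed(1)
current  <- rnorm(40, mean = 270, sd = 9)
new_ball <- rnorm(40, mean = 268, sd = 10)

ci  <- welch_ci(mean(current), sd(current), 40,
                mean(new_ball), sd(new_ball), 40)
ref <- t.test(current, new_ball, paired = FALSE, conf.level = 0.95)

print(ci)            # hand-computed limits and degrees of freedom
print(ref$conf.int)  # limits reported by t.test for comparison
```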
5.5Q Do you see a need for larger sample sizes and more testing with the golf balls? Discuss.
Ans: Yes. The statistical summaries of the given data suggest that larger samples are needed, so that
the balls can be tested under different circumstances and weather conditions and the assumptions can
be checked more reliably. The observed cost and driving distance of the balls may also differ with a
larger sample. Hence, we recommend testing with a larger sample size.
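One way to put a rough number on "larger" is a power calculation with power.t.test from base R. This is a sketch under stated assumptions, not a result from the original analysis: the difference to detect is taken as the midpoint of the reported 95% interval, the standard deviation reported in Section 5.4 is assumed for both models, and a target power of 0.80 is assumed.

```r
# Assumed inputs (not all stated in the report):
delta      <- (6.934 + (-1.384)) / 2  # midpoint of the reported 95% CI = observed mean difference
sd_assumed <- 9.89                    # SD reported in Section 5.4, assumed for both models
target_pow <- 0.80                    # assumed target power

# Sample size per group needed to detect `delta` with a two-sided test
res <- power.t.test(delta = delta, sd = sd_assumed,
                    sig.level = 0.05, power = target_pow,
                    type = "two.sample", alternative = "two.sided")

n_per_group <- ceiling(res$n)
cat("Balls needed per model:", n_per_group, "\n")
```

Under these assumptions the required number of balls per model is well above the 40 tested, which supports the recommendation for further testing.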
5.6 Source Code
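getwd()
install.packages("readr")
library(readr)
install.packages("readxl")
library(readxl)
## import data
## NOTE: "Golf.xls" is a placeholder file name; the original path is not shown
Golf <- read_excel("Golf.xls")
str(Golf)
head(Golf, 10)
tail(Golf, 10)
summary(Golf)
## Apply t test (two-tailed, independent samples)
t.test(Golf$Current, Golf$New, paired = FALSE, conf.level = 0.95, alternative = "two.sided")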
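install.packages("ggplot2")
library(ggplot2)
## 95% confidence interval for the population mean of each model, Current and New
n <- 40      # sample size for each model
z <- 1.960   # z value for a 95% confidence level
# Current
x1bar <- mean(Golf$Current)
s1 <- sd(Golf$Current)
ULC <- x1bar + z * s1 / sqrt(n)   # upper confidence limit, Current
LLC <- x1bar - z * s1 / sqrt(n)   # lower confidence limit, Current
# New
x2bar <- mean(Golf$New)
s2 <- sd(Golf$New)
ULN <- x2bar + z * s2 / sqrt(n)   # upper confidence limit, New
LLN <- x2bar - z * s2 / sqrt(n)   # lower confidence limit, New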