
BUSINESS STATISTICS REPORT

ON
ANALYZING DIFFERENCES IN GOLF DATA

SUBMITTED TO – Dr. Vipulesh Shardeo

SUBMITTED BY -
GOVIND LAKHOTIA (045014)
HRITIKA GUPTA (045020)
NANDINI ARYA (045034)
SAKSHI MITTAL (045049)
SIDDHARTH JAIN (045060)

TABLE OF CONTENTS

Sno. Content

1. INTRODUCTION
2. SAMPLING TOOLS
3. BASIC SUMMARY STATISTICS
4. ANALYZING DIFFERENCE IN TWO MEANS
5. FINDING AND INFERENCE
6. CONCLUSION
7. REFERENCES

INTRODUCTION

Analyzing differences between two population means is a fundamental concept in statistics and
data analysis. This topic is essential for making informed decisions, drawing conclusions, and
understanding variations in various aspects of research and real-world scenarios. Whether you
are comparing the average test scores of two groups, the production rates of two factories, or the
effectiveness of two medical treatments, the analysis of population means helps you determine if
there are significant differences between these two groups. In statistical terms, the process of
comparing two population means typically involves hypothesis testing and confidence intervals.
Hypothesis Testing: Hypothesis testing is a statistical method used to make inferences about a
population based on sample data. It is commonly used to determine if there is a significant
difference, relationship, or effect in a study. When comparing two population means, hypothesis
testing helps researchers answer questions such as "Is there a significant difference between the
average heights of two groups of people?" or "Does a new drug lead to a statistically significant
improvement in patient outcomes compared to an existing drug?"
The basic steps in hypothesis testing include:
Formulating Hypotheses: In the context of comparing two population means, researchers
formulate two hypotheses:
• Null Hypothesis (H0): This hypothesis represents the status quo or the default
assumption. It assumes that there is no significant difference between the two population
means. Any observed difference between the sample means is considered to be due to
random chance or sampling variability.
• Alternative Hypothesis (Ha): This hypothesis represents the effect or difference that
researchers are trying to detect. It suggests that there is a significant difference between
the two population means. In other words, it posits that the sample means differ because of
a genuine underlying difference.

About the Golf Dataset:

Par Inc. is a major manufacturer of golf equipment. Management believes that Par’s market share
could be increased with the introduction of a cut-resistant, longer-lasting golf ball. Therefore, the
research group at Par has been investigating a new golf ball coating designed to resist cuts and
provide a more durable ball. The tests with the coating have been promising. One of the
researchers voiced concern about the effect of the new coating on driving distances. Par would
like the new cut-resistant ball to offer driving distances comparable to those of the current model
golf ball. To compare the driving distances for the two balls, 40 balls of both the new and current
models were subjected to distance tests. The testing was performed with a mechanical hitting
machine so that any difference between the mean distances for the two models could be
attributed to a difference in the design. The results of the tests, with distances measured to the
nearest yard, are contained in the data set “Golf”.

SAMPLING TOOLS

We collected the Golf data set from Kaggle. It consists of two variables:

Current (Driving distance of golf balls without coating)
New (Driving distance of golf balls with coating)

Each of the two samples in the data set contains 40 observations (> 30), one sample per model of
golf ball. The Central Limit Theorem states that, irrespective of the shape of the original
population, the sampling distribution of the mean approaches a normal distribution as the sample
size increases; a sample size above 30 is the usual rule of thumb for "large". We also assume that
the sample estimates reflect reality and that the sample size is sufficient for analysis.
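
The Central Limit Theorem argument above can be illustrated with a small simulation. The sketch below is not part of the original analysis: it draws repeated samples of size 40 from a deliberately skewed (exponential) population, which is an assumption chosen only for illustration, and checks that the sample means cluster around the population mean with a spread close to sigma / sqrt(n).

```python
import random
import statistics

def sample_means(n, replications, seed=0):
    """Draw `replications` samples of size n from a skewed (exponential)
    population with mean 1 and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replications)
    ]

# n = 40 mirrors the sample size in the Golf data; the population is hypothetical.
means = sample_means(n=40, replications=2000)
center = statistics.fmean(means)   # expected to sit near the population mean (1.0)
spread = statistics.stdev(means)   # expected to sit near 1 / sqrt(40)
print(f"mean of sample means: {center:.3f}, std of sample means: {spread:.3f}")
```

Even though individual exponential draws are strongly right-skewed, the means of 40 draws are far more symmetric, which is the property the report relies on.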
We have chosen the level of significance (alpha) = 0.05 for hypothesis testing because it
provides a standard and widely accepted threshold for statistical significance. This level strikes a
balance between being cautious about making false-positive claims (Type I errors) while still
having the statistical power to detect real effects. Researchers commonly use alpha 0.05 for
consistency, comparability across studies, and practicality in statistical analysis. While
researchers can adjust alpha based on specific research goals, 0.05 serves as a reasonable default
choice in many scientific investigations.

Analyzing Null and Alternate Hypothesis:

We aim to test these hypotheses rigorously using statistical methods, with the null hypothesis
representing the status quo, and the alternative hypothesis representing the possibility of a
meaningful difference in driving distances between the two groups of golf balls.

In the context of the dataset comparing the driving distances of golf balls with and without
coating, the null hypothesis (H0) posits that there is no significant difference in driving distances
between the two types of golf balls. It assumes that any observed variation in the sample means,
such as the difference between 270.275 yards for the uncoated balls and 267.475 yards for the
coated balls, can be attributed to random chance or sampling variability.
On the other hand, the alternative hypothesis (Ha) suggests that there is indeed a significant
difference in driving distances between the two types of golf balls. It implies that the observed
difference in sample means is not merely due to chance but rather reflects a genuine underlying
distinction. This distinction might be caused by the coating on the golf balls, which potentially
affects their performance.

BASIC SUMMARY STATISTICS

• The range of the Current ball distance test is 255 to 289 yards.
• The range of the New ball distance test is 250 to 289 yards.
• There are no outliers in the Current data, as all values from minimum to maximum lie
within the range 244.65 to 294.
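
An outlier check of this kind can be sketched as follows. The report does not say how the range 244.65 to 294 was derived; the example assumes the common 1.5 × IQR rule, and the function names and demo values are hypothetical (the demo list is not the Golf data).

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical distances in yards, NOT the actual Golf data.
demo = [255, 260, 265, 270, 275, 280, 289]
print(iqr_fences(demo))      # (240.0, 300.0)
print(find_outliers(demo))   # []
```

Different quartile conventions (for example `method="exclusive"`) give slightly different fences, so the exact limits depend on the tool used.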

• Both samples appear to be approximately normally distributed.
• The mean and median values differ only slightly.

The New (coated) design data looks closer to a normal distribution but slightly left-skewed,
whereas the Current design data looks slightly right-skewed. A scatter plot of the two samples
shows very weak correlation between the values, which is consistent with the two sets of test
results being independent of each other.
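
The comparison of mean, median, and skew direction can be reproduced with a few lines of code. This is a minimal sketch using the adjusted Fisher-Pearson sample skewness; the report does not state which skewness measure (if any) was computed, and the demo values are hypothetical, not the Golf data.

```python
import statistics

def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (needs at least 3 values)."""
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    cubed = sum(((v - mean) / s) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed

def describe(values):
    """Collect the statistics discussed in the text for one sample."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "skewness": sample_skewness(values),
    }

# Hypothetical right-tailed values, NOT the Golf data.
right_tailed = [250, 252, 253, 255, 256, 258, 290]
print(describe(right_tailed))
```

A positive skewness with the mean above the median indicates a right skew; a negative value with the mean below the median indicates a left skew.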

Degrees of Freedom: Since the sample size is the same (N = 40) for both samples, we have
(N - 1) * 2 = 78 degrees of freedom.
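
For reference, the degrees of freedom above correspond to the pooled (equal-variance) two-sample t-test. The block below is a sketch of that standard formulation; the symbols n_1, n_2, s_1^2, s_2^2, and s_p are notation introduced here, not taken from the report.

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad
\mathrm{df} = n_1 + n_2 - 2 = 40 + 40 - 2 = 78
```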

ANALYZING DIFFERENCE IN TWO MEANS

Based on the Golf data, we can assume the following:

• There are two populations
• No other influencing factors are considered
• The samples are chosen independently

Based on data description, it appears that we have two groups of measurements: one for the
driving distances of golf balls without coating (Current) and another for the driving distances of
golf balls with coating (New). From this description, it seems that data points are independent.

In an independent sample or unpaired design like this one, each observation in one group (e.g.,
driving distances without coating) is not directly related or paired with a specific observation in
the other group (e.g., driving distances with coating). Golf balls in one group are not matched or
connected to specific golf balls in the other group. Instead, you have two separate sets of
measurements from different conditions.
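Since the two samples are independent, an independent two-sample t-test applies. Because the
alternative hypothesis is non-directional (the means may differ in either direction), the test is
two-tailed.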

NULL HYPOTHESIS: There is no difference between the mean distances for the two models due to
the change in design.
H0: mu1 - mu2 = 0
ALTERNATE HYPOTHESIS: There is a difference between the mean distances for the two
models due to the change in design.
Ha: mu1 - mu2 ≠ 0
Here mu1 and mu2 denote the population mean driving distances of the Current and New models.

Assuming either unequal or equal variances for the two samples gives similar p-values.
We fail to reject the null hypothesis because:
P(T<=t) = 0.184 > 0.05 (significance level)
When the p-value is greater than the significance level, the null hypothesis is not rejected.
There is no statistically significant difference in driving distances between golf balls with
coating and without coating.
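
The test above can be re-computed from summary statistics alone. The sketch below is not the authors' workflow: it assumes a pooled two-sample t-test, uses the sample means quoted earlier in this report (270.275 and 267.475) and the variances quoted in the findings (76.61 and 97.94), and obtains the p-value by numerically integrating the t density with only the standard library. All function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * integral of the pdf over [0, |t|].
    Uses Simpson's rule, so `steps` must be even."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

def pooled_t_test(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, degrees of freedom, two-tailed p-value)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    return t, df, two_tailed_p(t, df)

# Summary figures as quoted in this report (Current first, then New).
t_stat, df, p_value = pooled_t_test(270.275, 76.61, 40, 267.475, 97.94, 40)
print(f"t = {t_stat:.3f}, df = {df}, two-tailed p = {p_value:.3f}")
```

With these inputs the two-tailed p-value should come out close to the value reported above and well above 0.05, which supports the same decision not to reject H0.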
It's important to note that not finding a significant difference does not necessarily mean that there
is no difference in reality; it just means that we didn't detect it with the sample size and data we
have. Additionally, it is important to consider the practical significance or effect size, which can
help to assess the magnitude of the difference between the groups, even if it's not statistically
significant.
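
One way to quantify the practical significance mentioned above is a standardized effect size. The sketch below computes Cohen's d with the pooled standard deviation; this measure is a suggestion added here, not something the report calculated, and the inputs are the summary figures quoted elsewhere in this report.

```python
import math

def cohens_d(mean1, var1, n1, mean2, var2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Current vs. New, using means and variances quoted in this report.
d = cohens_d(270.275, 76.61, 40, 267.475, 97.94, 40)
print(f"Cohen's d = {d:.2f}")
```

By Cohen's conventional benchmarks (roughly 0.2 small, 0.5 medium, 0.8 large), a value in this neighbourhood would count as a small effect.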

FINDING AND INFERENCE
According to the test, there is no significant change in driving distance due to the change in
design. However, the New ball design has a lower mean (267.5) and a higher variance (97.94)
than the Current ball (mean: 270.3, variance: 76.61), which points to a difference in
outcome. Lastly, the sampling error is still unknown, which leaves the case ambiguous. Our
suggestion is therefore to collect more samples and to identify more variables that
may directly or indirectly affect the outcome.
What is the 95% confidence interval for the population mean of each model, and what is the 95%
confidence interval for the difference between the means of the two populations? Do you see a
need for larger sample sizes and more testing with the golf balls?
95% confidence interval for the mean driving distance of the Current balls: [267.4757, 273.0743]
95% confidence interval for the mean driving distance of the New balls: [264.3348, 270.6652]
95% confidence interval for the difference in means: [-1.384937, 6.934937]
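
Intervals of this form can be reproduced from the summary statistics. The sketch below assumes t-based intervals, with two-sided 95% critical values read from a standard t table (2.0227 for 39 degrees of freedom, 1.9908 for 78) and a pooled variance for the difference; the report does not state which method or software produced its figures, so small discrepancies in the last digits are possible.

```python
import math

# Two-sided 95% critical values of Student's t, from a standard t table.
T_CRIT_DF39 = 2.0227   # df = 40 - 1, for each single-sample interval
T_CRIT_DF78 = 1.9908   # df = 40 + 40 - 2, for the pooled difference

def mean_ci(mean, var, n, t_crit):
    """Confidence interval for one population mean."""
    half_width = t_crit * math.sqrt(var / n)
    return mean - half_width, mean + half_width

def diff_ci(mean1, var1, n1, mean2, var2, n2, t_crit):
    """Confidence interval for mean1 - mean2 using the pooled variance."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    half_width = t_crit * math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - half_width, diff + half_width

# Means and variances as quoted in this report (Current: 270.275, New: 267.5).
current = mean_ci(270.275, 76.61, 40, T_CRIT_DF39)
new = mean_ci(267.5, 97.94, 40, T_CRIT_DF39)
difference = diff_ci(270.275, 76.61, 40, 267.5, 97.94, 40, T_CRIT_DF78)
for label, (lo, hi) in [("Current", current), ("New", new), ("Difference", difference)]:
    print(f"{label}: [{lo:.3f}, {hi:.3f}]")
```

The key check is whether the interval for the difference contains 0; if it does, the data are consistent with no difference in mean driving distance at the 95% level.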

• The data suggests that, on average, golf balls with coating (New) tend to have slightly
shorter driving distances than golf balls without coating (Current).
• However, the difference in driving distances between the two groups is not substantial,
with the median values being close.
• The variability in driving distances is higher for golf balls with coating, as indicated by
the higher variance and standard deviation.
• The t-statistic of 1.2882 indicates that the difference in means between the "Current" and
"New" groups is small relative to its standard error.
• The one-tailed p-value of 0.1026 is greater than the significance level of 0.05 (alpha =
0.05), so we do not have enough evidence to reject the null hypothesis
(which assumes no difference in means) at the 0.05 significance level.
• The two-tailed p-value of 0.2053 is also greater than 0.05, indicating that we do not have
strong evidence to conclude that the means are significantly different in either direction.
• For the "Current" group, the 95% confidence interval for the mean driving distance
ranges from 267.562 to 272.988 yards. This means that we are 95% confident that the
true mean driving distance for the "Current" group falls within this range.
• For the "New" group, the 95% confidence interval for the mean driving distance ranges
from 264.404 to 270.546 yards. Similarly, we are 95% confident that the true mean
driving distance for the "New" group falls within this range.
• The confidence intervals for the two groups overlap, and the 95% confidence interval for
the difference in means includes 0, which is consistent with no significant difference
between the mean driving distances of the "Current" and "New" groups at the 95%
confidence level.
• This means that, based on the confidence intervals, we cannot conclude that the
introduction of the coating on the golf balls has resulted in a statistically significant
change in driving distance.

CONCLUSION
From the given data, it may be concluded that there is no statistically significant change in
driving distance due to the new coating on the golf balls. However, our recommendation is that
the test be carried out with a larger sample size covering a number of golf courses (at least five
different ones) to improve the accuracy of the test results and to negate any effect of one type
of ground. Also, the results need to be interpreted, and future actions planned, with an
understanding of other characteristics such as size, shape, and weight.

REFERENCES:
1. Kaggle.com
2. Golf 1.xlsx
