

GRADED PROJECT

ADVANCE STATISTICS: BUSINESS REPORT

SUBMITTED BY

LT COL SAURABH S BISHT


13 AUG 2023

ADVANCE STATISTICS PROJECT


Table of Contents
Problem 1 ........................................................................................................................... 5
1.1 What is the probability that a randomly chosen player would suffer an injury?............... 5
1.2 What is the probability that a player is a forward or a winger?........................................ 5
1.3 What is the probability that a randomly chosen player plays in a striker position and has
a foot injury? ............................................................................................................................... 6
1.4 What is the probability that a randomly chosen injured player is a striker? ..................... 6
1.5 What is the probability that a randomly chosen injured player is either a forward or an
attacking midfielder?................................................................................................................... 6
Problem 2 ........................................................................................................................... 7
2.1 What are the probabilities of a fire, a mechanical failure, and a human error
respectively? ............................................................................................................................... 7
2.2 What is the probability of a radiation leak? ..................................................................... 8
2.3 Suppose there has been a radiation leak in the reactor for which the definite cause is not
known. What is the probability that it has been caused by: ......................................................... 8
• A Fire. ................................................................................................................................. 8
• A Mechanical Failure. .......................................................................................................... 8
• A Human Error. ................................................................................................................... 9
Problem 3: ........................................................................................................................ 10
3.1 What proportion of the gunny bags have a breaking strength less than 3.17 kg per sq
cm? 10
3.2 What proportion of the gunny bags have a breaking strength at least 3.6 kg per sq cm.?
11
3.3 What proportion of the gunny bags have a breaking strength between 5 and 5.5 kg per
sq cm.? 11
3.4 What proportion of the gunny bags have a breaking strength NOT between 3 and 7.5 kg
per sq cm.? ................................................................................................................................ 12
Problem 4: ........................................................................................................................ 13
4.1 What is the probability that a randomly chosen student gets a grade below 85 on this
exam? 13
4.2 What is the probability that a randomly selected student scores between 65 and 87? .. 14
4.3 What should be the passing cut-off so that 75% of the students clear the exam? .......... 14
Problem 5: ........................................................................................................................ 16
5.1 Earlier experience of Zingaro with this particular client is favorable as the stone surface
was found to be of adequate hardness. However, Zingaro has reason to believe now that the
unpolished stones may not be suitable for printing. Do you think Zingaro is justified in thinking
so? 16
5.2 Is the mean hardness of the polished and unpolished stones the same? ....................... 17
Problem 6: ........................................................................................................................ 18
Problem 7: ........................................................................................................................ 19
7.1 Test whether there is any difference among the dentists on the implant hardness. State
the null and alternative hypotheses. Note that both types of alloys cannot be considered
together. You must state the null and alternative hypotheses separately for the two types of
alloys.? 19
7.2 Before the hypotheses may be tested, state the required assumptions. Are the
assumptions fulfilled? Comment separately on both alloy types.? ............................................. 19
7.3 Irrespective of your conclusion in 2, we will continue with the testing procedure. What
do you conclude regarding whether implant hardness depends on dentists? Clearly state your
conclusion. If the null hypothesis is rejected, is it possible to identify which pairs of dentists
differ? 24
7.4 Now test whether there is any difference among the methods on the hardness of dental
implant, separately for the two types of alloys. What are your conclusions? If the null hypothesis
is rejected, is it possible to identify which pairs of methods differ? ........................................... 25
7.5 Now test whether there is any difference among the temperature levels on the hardness
of dental implant, separately for the two types of alloys. What are your conclusions? If the null
hypothesis is rejected, is it possible to identify which levels of temperatures differ? ................. 28
7.6 Consider the interaction effect of dentist and method and comment on the ................. 30
7.7 Now consider the effect of both factors, dentist, and method, separately on each alloy.
What do you conclude? Is it possible to identify which dentists are different, which methods are
different, and which interaction levels are different? ................................................................ 33

Table of Figures
Figure 1: CDF ........................................................................................................................... 10
Figure 2 : CDF .......................................................................................................................... 11
Figure 3 : CDF .......................................................................................................................... 11
Figure 4 : CDF .......................................................................................................................... 12
Figure 5 : CDF .......................................................................................................................... 13
Figure 6 : CDF .......................................................................................................................... 14
Figure 7 : CDF .......................................................................................................................... 15
Figure 8 : HISTPLOT ............................................................................................................... 16
Figure 9 : HISTPLOT ............................................................................................................... 17
Figure 10 : HISTPLOT ............................................................................................................. 18
Figure 11 : QQ PLOT ............................................................................................................... 20
Figure 12 : BOXPLOT ............................................................................................................. 21
Figure 13 : BOXPLOT ............................................................................................................. 23
Figure 14 : POINT PLOT ......................................................................................................... 24
Figure 15 : POINT PLOT ......................................................................................................... 25
Figure 16 : POINT PLOT ......................................................................................................... 26
Figure 17 : POINT PLOT ......................................................................................................... 27
Figure 18 : POINT PLOT ......................................................................................................... 29
Figure 19 : POINT PLOT ......................................................................................................... 30
Figure 20 : POINT PLOT ......................................................................................................... 31
Figure 21 : POINT PLOT ......................................................................................................... 32
Figure 22 : POINT PLOT ......................................................................................................... 34
Figure 23 : INTERACTION PLOT .......................................................................................... 35
Figure 24 : POINT PLOT ......................................................................................................... 37
Figure 25 : INTERACTION PLOT .......................................................................................... 39

BUSINESS REPORT

Problem 1
A physiotherapist with a male football team is interested in studying the relationship
between foot injuries and the positions at which the players play, based on the data
collected:

                      Striker   Forward   Attacking Midfielder   Winger   Total
Players Injured            45        56                     24       20     145
Players Not Injured        32        38                     11        9      90
Total                      77        94                     35       29     235

1.1 What is the probability that a randomly chosen player would suffer an
injury?

Ans. The data collected from the male football team indicates that out of 235 players,
145 players have suffered injuries, while the rest have not. To determine the probability
that a randomly chosen player would suffer an injury, we use the following formula:

Probability = (Number of Injured Players / Total Number of Players) * 100

By substituting the values from the given data into the formula, we find that the
probability of a player suffering an injury is approximately 61.70%.

1.2 What is the probability that a player is a forward or a winger?

Ans. The data collected from the male football team shows that there are 94 players
in the forward position and 29 players in the winger position. To find the probability of
a player being a forward or a winger, we add the number of players in these positions
and divide it by the total number of players:

Probability = ((Number of Forward Players + Number of Winger Players) / Total Number of Players) * 100

By substituting the values from the given data into the formula, we find that the
probability of a player being a forward or a winger is approximately 52.34%.

1.3 What is the probability that a randomly chosen player plays in a striker
position and has a foot injury?

Ans. Based on the data collected, there are 45 players who are both strikers and
injured. To calculate the probability of a randomly chosen player being a striker and
injured, we divide the number of such players by the total number of players.

Probability = (Number of Strikers who are Injured / Total Number of Players) * 100

After performing the calculation with the given data, we find that the probability
of a randomly chosen player being a striker and injured is approximately 19.15%.

1.4 What is the probability that a randomly chosen injured player is a striker?

Ans. Based on the data collected, there are 45 injured players who are strikers. To
calculate the probability of a randomly chosen injured player being a striker, we divide
the number of injured players who are strikers by the total number of injured players.

Probability = (Number of Injured Players who are Strikers / Total Number of Injured Players) * 100

After performing the calculation with the given data, we find that the probability
of a randomly chosen injured player being a striker is approximately 31.03%.

1.5 What is the probability that a randomly chosen injured player is either a
forward or an attacking midfielder?

Ans. Based on the data collected, there are 56 injured players who are forwards and
24 injured players who are attacking midfielders. To calculate the probability of a
randomly chosen injured player being either a forward or an attacking midfielder, we
add the number of injured players who are forwards to the number of injured players
who are attacking midfielders, and then divide it by the total number of injured players.

Probability = ((Injured Forwards + Injured Attacking Midfielders) / Total Number of Injured Players) * 100

After performing the calculation with the given data, we find that the probability
of a randomly chosen injured player being either a forward or an attacking
midfielder is approximately 55.17%.
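
The five calculations in Problem 1 can be reproduced directly from the table. The sketch below is illustrative only: the dictionary layout and the helper name `players_at` are assumptions made for this example, not part of the original analysis.

```python
# Counts from the Problem 1 table, keyed by playing position.
injured = {"Striker": 45, "Forward": 56, "Attacking Midfielder": 24, "Winger": 20}
not_injured = {"Striker": 32, "Forward": 38, "Attacking Midfielder": 11, "Winger": 9}

total_injured = sum(injured.values())
total_players = total_injured + sum(not_injured.values())


def players_at(position):
    """Total players (injured or not) at the given position (hypothetical helper)."""
    return injured[position] + not_injured[position]


probabilities = {
    "1.1 P(injured)": total_injured / total_players,
    "1.2 P(forward or winger)": (players_at("Forward") + players_at("Winger")) / total_players,
    "1.3 P(striker and injured)": injured["Striker"] / total_players,
    "1.4 P(striker | injured)": injured["Striker"] / total_injured,
    "1.5 P(forward or attacking midfielder | injured)":
        (injured["Forward"] + injured["Attacking Midfielder"]) / total_injured,
}

for label, value in probabilities.items():
    print(f"{label}: {value:.2%}")
```

Questions 1.1 to 1.3 divide by all 235 players, while 1.4 and 1.5 are conditional on injury and therefore divide by the 145 injured players.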

Problem 2
An independent research organization is trying to estimate the probability that an
accident at a nuclear power plant will result in radiation leakage. The types of accidents
possible at the plant are fire hazards, mechanical failure, and human error. The research
organization also knows that two or more types of accidents cannot occur
simultaneously.

According to the studies carried out by the organization, the probability of a
radiation leak in case of a fire is 20%, the probability of a radiation leak in case of a
mechanical failure is 50%, and the probability of a radiation leak in case of a human
error is 10%. The studies also showed the following:

• The probability of a radiation leak occurring simultaneously with a fire is 0.1%.

• The probability of a radiation leak occurring simultaneously with a mechanical
failure is 0.15%.

• The probability of a radiation leak occurring simultaneously with a human error is
0.12%.

On the basis of the information available, answer the questions below:

2.1 What are the probabilities of a fire, a mechanical failure, and a human
error respectively?

Ans. The problem gives the probability of a radiation leak for each type of accident
(Fire – 20%, Mechanical Failure – 50%, Human Error – 10%) and the probability of a
radiation leak occurring together with each type of accident (Fire – 0.1%, Mechanical
Failure – 0.15%, Human Error – 0.12%). The probability of each type of accident follows
from the conditional probability formula:

P(A | B) = P(A ∩ B) / P(B), so that P(B) = P(A ∩ B) / P(A | B)

Here A is a radiation leak and B is the type of accident. The probability of a fire is
0.1% / 20% = 0.50%, the probability of a mechanical failure is 0.15% / 50% = 0.30%,
and the probability of a human error is 0.12% / 10% = 1.20%. These are the
probabilities of each type of accident occurring, regardless of whether a radiation leak
follows.

Insight. It is important for the research organization to understand these
probabilities as they estimate the probability of radiation leakage during accidents at
the nuclear power plant. These probabilities can be used to assess and prioritize
safety measures to prevent and mitigate radiation leaks in the event of an
accident.
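
The rearranged conditional probability formula can be checked with a few lines of Python. This is a minimal sketch; the dictionary names are assumptions introduced for illustration, and the inputs are the percentages from the problem statement written as fractions.

```python
# Given in the problem statement (as fractions, not percentages).
p_leak_given_accident = {"fire": 0.20, "mechanical failure": 0.50, "human error": 0.10}
p_leak_and_accident = {"fire": 0.001, "mechanical failure": 0.0015, "human error": 0.0012}

# P(accident) = P(leak and accident) / P(leak | accident)
p_accident = {
    kind: p_leak_and_accident[kind] / p_leak_given_accident[kind]
    for kind in p_leak_given_accident
}

for kind, p in p_accident.items():
    print(f"P({kind}) = {p:.2%}")
```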

2.2 What is the probability of a radiation leak?

Ans. Because the three types of accidents cannot occur simultaneously, the probability
of a radiation leak, P(R), is the sum of the joint probabilities:

P(R) = P(R ∩ F) + P(R ∩ M) + P(R ∩ H)

where F, M and H denote a fire, a mechanical failure and a human error respectively.

(a) Based on the provided data, P(R) = 0.1% + 0.15% + 0.12% = 0.37%. This
probability represents the likelihood of a radiation leak occurring under any of
the accident scenarios.

(b) Understanding this probability is crucial for assessing the potential risks
associated with different types of accidents and taking appropriate preventive
measures to ensure the safety of the nuclear power plant and its surroundings.
By quantifying the probability of radiation leakage, the organization can make
informed decisions and implement necessary protocols to minimize the impact
of such incidents.

2.3 Suppose there has been a radiation leak in the reactor for which the
definite cause is not known. What is the probability that it has been caused by:

• A Fire.

Ans. The probability that the radiation leak was caused by a fire can be calculated
using the formula

P(F | R) = P(R ∩ F) / P(R)

(a) By applying the formula, where P(F | R) represents the probability that a fire
caused the radiation leak and P(R) is the overall probability of a radiation leak,
the organization determined that the probability of the radiation leak being
caused by a fire is 0.1% / 0.37% = 27.03%.

(b) Understanding the likelihood of a fire as the underlying cause of the
radiation leak can significantly aid in identifying potential factors contributing to
such incidents and formulating strategies for prevention and mitigation. This
probability can guide the organization's response and investigation efforts to
ensure the safety and reliability of the nuclear power plant's operations.

• A Mechanical Failure.

Ans. (a) By employing the equation P(M | R) = P(R ∩ M) / P(R), where
P(M | R) signifies the probability that a mechanical failure caused the
radiation leak and P(R) denotes the overall probability of a radiation
leak, the research organization determined that the likelihood of the
radiation leak being attributed to a mechanical failure is
0.15% / 0.37% = 40.54%.

(b) Gaining insights into the potential contribution of mechanical
failure to the radiation leak can serve as a vital factor in identifying
potential vulnerabilities within the operational framework of the nuclear
power plant. This probability can play a pivotal role in guiding the
organization's subsequent steps, including investigations and
countermeasures, aimed at safeguarding the plant's operational integrity
and overall safety.

• A Human Error.

Ans. (a) Utilizing the equation P(H | R) = P(R ∩ H) / P(R), where P(H | R)
denotes the probability that human error caused the radiation leak and
P(R) represents the overall probability of a radiation leak, the research
organization has determined that the probability of the radiation leak
being a result of human error is 0.12% / 0.37% = 32.43%.

(b) Understanding the potential role of human error in causing the
radiation leak carries critical implications for the organization's
subsequent actions. This probability serves as a pivotal guidepost in
steering investigations, enacting corrective measures, and reinforcing
safety protocols to mitigate the risk of future incidents.
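
The total probability of a leak (2.2) and the probability of each cause given a leak (2.3) can be sketched as follows. The variable names are assumptions for illustration; the only inputs are the three joint probabilities stated in the problem, and the three accident types are treated as the only sources of a leak.

```python
# Joint probabilities of a leak occurring together with each accident type.
p_leak_and_accident = {"fire": 0.001, "mechanical failure": 0.0015, "human error": 0.0012}

# The accident types are mutually exclusive, so the joint probabilities add up.
p_leak = sum(p_leak_and_accident.values())

# Bayes' rule: P(cause | leak) = P(leak and cause) / P(leak)
p_cause_given_leak = {
    kind: joint / p_leak for kind, joint in p_leak_and_accident.items()
}

print(f"P(leak) = {p_leak:.2%}")
for kind, p in p_cause_given_leak.items():
    print(f"P({kind} | leak) = {p:.2%}")
```

Because the three causes are exhaustive under this assumption, the three conditional probabilities must sum to 100%, which is a quick check on the arithmetic.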

Problem 3:
The breaking strength of gunny bags used for packaging cement is normally distributed
with a mean of 5 kg per sq. centimeter and a standard deviation of 1.5 kg per sq.
centimeter. The quality team of the cement company wants to know the following about
the packaging material to better understand wastage or pilferage within the supply
chain. Answer the questions below based on the given information. (Provide an
appropriate visual representation of your answers, without which marks will be
deducted.)

3.1 What proportion of the gunny bags have a breaking strength less than
3.17 kg per sq cm?

Ans. (a) To calculate the proportion directly, we can use the cumulative distribution
function (CDF) of the normal distribution. The CDF gives us the probability that a
random variable is less than or equal to a certain value. For 3.17 kg per sq cm the
z-score is (3.17 - 5) / 1.5 = -1.22.

Figure 1: CDF

(b) The curve in the image shows the Cumulative Distribution Function (CDF)
of the breaking strength of gunny bags. The x-axis of the graph shows the
breaking strength, and the y-axis shows the proportion of gunny bags whose
breaking strength is at or below that value.

(c) The CDF rises steadily from 0 to 1 as the breaking strength increases; it
has no peak. At a breaking strength of 3.17 kg per sq cm the CDF equals
0.1112, so 11.12% of the gunny bags have a breaking strength of 3.17 kg per
sq cm or less. These weakest bags are the ones most at risk of breaking and
causing wastage or pilferage.
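
The z-score and CDF lookup for 3.1 can be sketched with Python's standard-library `statistics.NormalDist`. Using the standard library here is a choice made for this example; `scipy.stats.norm.cdf(3.17, loc=5, scale=1.5)` is an equivalent call. The variable names are illustrative.

```python
from statistics import NormalDist

# Breaking strength ~ Normal(mean = 5, sd = 1.5), in kg per sq cm.
strength = NormalDist(mu=5.0, sigma=1.5)

threshold = 3.17
z_score = (threshold - strength.mean) / strength.stdev

# CDF at the threshold = proportion of bags with strength below 3.17.
p_below = strength.cdf(threshold)

print(f"z = {z_score:.2f}, P(X < {threshold}) = {p_below:.2%}")
```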

3.2 What proportion of the gunny bags have a breaking strength at least 3.6
kg per sq cm.?

Ans. The code generated a probability distribution plot with the shaded area
representing the proportion of gunny bags with a breaking strength of at least 3.6 kg per
sq cm, that is, 1 minus the CDF at 3.6. The vertical line indicates the breaking strength
value of interest, and the percentage of bags with strength of at least 3.6 kg per sq cm is
displayed on the plot. As per the plot, the proportion of gunny bags having a breaking
strength of at least 3.6 kg per sq cm is approximately 82.4%.

Figure 2 : CDF

Insights.

(b) In the CDF above, the red line intersects the y-axis at 0.8238, which
means that 82.38% of the gunny bags will have a breaking strength of at
least 3.6 kg per sq cm.

3.3 What proportion of the gunny bags have a breaking strength between 5
and 5.5 kg per sq cm.?

Ans. To find the proportion of gunny bags with a breaking strength between 5 and
5.5 kg per sq cm, we need to calculate the area under the probability distribution curve
between these two values. We have used the cumulative distribution function (CDF) to
do this. The CDF gives us the probability that a random variable (in this case, breaking
strength) is less than or equal to a given value, so the required proportion is the CDF
at 5.5 minus the CDF at 5, which works out to be 13.06%.

Figure 3 : CDF

(a) The plot shows the cumulative distribution function (CDF) of the breaking
strength of the gunny bags. The yellow line shows the probability that a
randomly selected gunny bag will have a breaking strength between 5 and
5.5 kg per sq cm which is 13.06%.

3.4 What proportion of the gunny bags have a breaking strength NOT
between 3 and 7.5 kg per sq cm.?

Ans. By calculating the probability of a breaking strength within the specified bounds
(the CDF at 7.5 minus the CDF at 3) and subtracting it from 1, the organization has
determined that approximately 13.90% of gunny bags possess breaking strengths either
below 3 kg per sq. centimeter or above 7.5 kg per sq. centimeter. This insight enables
the organization to assess the extent of variation in breaking strengths and make
informed decisions pertaining to wastage, quality control, and supply chain efficiency.

Figure 4 : CDF

(b) The plot shows the probability distribution of the breaking strength of
gunny bags. The mean breaking strength is 5 kg per sq. centimeter and the
standard deviation is 1.5 kg per sq. centimeter. The shaded area in the plot
represents the proportion of gunny bags that have a breaking strength NOT
between 3 and 7.5 kg per sq cm which is equal to 13.90%.

(c) The quality team of the cement company should be concerned about the
proportion of bags falling outside the range of 3 to 7.5 kg per sq cm. It is
possible that these gunny bags are not strong enough to withstand the weight of
the cement, or that they are too strong and could cause damage to the cement.
The quality team should investigate the cause of this variation and take steps
to reduce the number of such bags.
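
The remaining proportions in Problem 3 follow the same pattern: an upper tail (3.2), an interval (3.3), and the complement of an interval (3.4). The sketch below again uses the standard-library `NormalDist` as a stand-in for `scipy.stats.norm`; the variable names are illustrative. Values computed this way may differ in the second decimal place from figures read off a printed z-table.

```python
from statistics import NormalDist

# Breaking strength ~ Normal(mean = 5, sd = 1.5), in kg per sq cm.
strength = NormalDist(mu=5.0, sigma=1.5)

# 3.2: at least 3.6 -> upper tail = 1 - CDF(3.6)
p_at_least_3_6 = 1 - strength.cdf(3.6)

# 3.3: between 5 and 5.5 -> CDF(5.5) - CDF(5)
p_between_5_and_5_5 = strength.cdf(5.5) - strength.cdf(5.0)

# 3.4: NOT between 3 and 7.5 -> 1 - (CDF(7.5) - CDF(3))
p_outside_3_and_7_5 = 1 - (strength.cdf(7.5) - strength.cdf(3.0))

print(f"P(X >= 3.6)            = {p_at_least_3_6:.2%}")
print(f"P(5 < X < 5.5)         = {p_between_5_and_5_5:.2%}")
print(f"P(X < 3 or X > 7.5)    = {p_outside_3_and_7_5:.2%}")
```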

Problem 4:
Grades of the final examination in a training course are found to be normally
distributed, with a mean of 77 and a standard deviation of 8.5. Based on the given
information answer the questions below.

4.1 What is the probability that a randomly chosen student gets a grade
below 85 on this exam?

Ans. (a) To determine the probability that a randomly chosen student gets a grade
below 85, we use the z-score, which measures how many standard deviations a
particular grade lies from the mean.

(b) We calculated a z-score of approximately 0.9412 for the grade 85 using
the formula

z = (X - μ) / σ

where X is the grade, μ is the mean, and σ is the standard deviation, so that
z = (85 - 77) / 8.5 ≈ 0.9412.

(c) Next, we use the cumulative distribution function (CDF) of the standard
normal distribution to find the probability associated with this z-score. Since the
CDF provides the probability of a value being less than or equal to a given
z-score, the CDF value itself, about 0.8267, is the probability of a grade below
85. Subtracting it from 1 gives the probability of a grade above 85, about 0.1733.

Figure 5 : CDF

(d) The shaded area under the curve to the left of the vertical line represents
the probability that a randomly chosen student will get a grade below 85. This
area is equal to 82.67%. This means that there is an 82.67% chance that a
randomly chosen student will get a grade below 85 on this exam.

4.2 What is the probability that a randomly selected student scores between
65 and 87?

Ans. (a) We calculated the z-scores for the lower and upper bounds of the score
range (65 and 87) using the formula

z = (X - μ) / σ
where X is the value, μ is the mean, and σ is the standard deviation. This gives
z ≈ -1.41 for 65 and z ≈ 1.18 for 87.

(b) We then used the cumulative distribution function (CDF) of the normal
distribution to calculate the probabilities corresponding to the z-scores. Each
gives the probability that a randomly selected student's score is below the
corresponding bound.

(c) The probability that a student's score falls within the specified range
(between 65 and 87) is the difference between these two probabilities.

Figure 6 : CDF

(d) The shaded area between the vertical lines at 65 and 87 represents the
probability that a randomly selected student will get a grade between 65 and 87.
This area is equal to 80.13%. This means that there is an 80.13% chance that
a randomly selected student will get a grade between 65 and 87 on this
exam.
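
The z-score and CDF steps for 4.1 and 4.2 can be sketched as below. The standard-library `NormalDist` is used as a stand-in for `scipy.stats.norm`, and the variable names are assumptions made for this example.

```python
from statistics import NormalDist

# Exam grades ~ Normal(mean = 77, sd = 8.5).
grades = NormalDist(mu=77, sigma=8.5)


def z_score(x):
    """Number of standard deviations that grade x lies from the mean."""
    return (x - grades.mean) / grades.stdev


# 4.1: probability of a grade below 85 is the CDF at 85.
p_below_85 = grades.cdf(85)
p_above_85 = 1 - p_below_85  # complement, for comparison

# 4.2: probability of a grade between 65 and 87 is a difference of CDF values.
p_65_to_87 = grades.cdf(87) - grades.cdf(65)

print(f"z(85) = {z_score(85):.4f}")
print(f"P(X < 85) = {p_below_85:.2%}, P(X > 85) = {p_above_85:.2%}")
print(f"z(65) = {z_score(65):.2f}, z(87) = {z_score(87):.2f}")
print(f"P(65 < X < 87) = {p_65_to_87:.2%}")
```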

4.3 What should be the passing cut-off so that 75% of the students clear the
exam?

Ans. (a) To find the passing cut-off score that ensures 75% of the students clear
the exam, we used the Percent-Point Function (PPF), also known as the
quantile function.

(b) The stats.norm.ppf() function returns the score below which a given
fraction of students fall, given the loc (location, here the mean) and scale
(here the standard deviation) parameters of the distribution. At the 75th
percentile it returns 82.73, the score below which 75% of the students fall. For
75% of the students to score at or above the cut-off, the cut-off must instead
be the 25th percentile, which is approximately 71.27.

Figure 7 : CDF

(c) The plot shows the probability distribution of grades for the final
examination in a training course. The mean grade is 77 and the standard
deviation is 8.5. The vertical line at 82.73 marks the 75th percentile; a cut-off
there would let only 25% of the students clear the exam, whereas a cut-off of
about 71.27 lets 75% of the students clear it.
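
The relationship between the percentile chosen and the share of students who clear the exam can be checked numerically. This sketch uses the standard-library `NormalDist.inv_cdf`, which plays the same role as `stats.norm.ppf(q, loc=77, scale=8.5)`; the variable names are illustrative.

```python
from statistics import NormalDist

# Exam grades ~ Normal(mean = 77, sd = 8.5).
grades = NormalDist(mu=77, sigma=8.5)

# A cut-off that 75% of students clear leaves 25% of the distribution below it.
cutoff_25th = grades.inv_cdf(0.25)
share_clearing_25th = 1 - grades.cdf(cutoff_25th)

# For comparison: the 75th percentile and the share of students above it.
cutoff_75th = grades.inv_cdf(0.75)
share_clearing_75th = 1 - grades.cdf(cutoff_75th)

print(f"25th percentile = {cutoff_25th:.2f}, share clearing = {share_clearing_25th:.0%}")
print(f"75th percentile = {cutoff_75th:.2f}, share clearing = {share_clearing_75th:.0%}")
```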

Problem 5:
Zingaro stone printing is a company that specializes in printing images or patterns on
polished or unpolished stones. However, for the optimum level of printing of the image,
the stone surface has to have a Brinell's hardness index of at least 150. Recently,
Zingaro has received a batch of polished and unpolished stones from its clients. Use
the data provided to answer the following (assuming a 5% significance level):

5.1 Earlier experience of Zingaro with this particular client is favorable as the
stone surface was found to be of adequate hardness. However, Zingaro has
reason to believe now that the unpolished stones may not be suitable for
printing. Do you think Zingaro is justified in thinking so?

Ans. To determine whether Zingaro is justified in thinking that unpolished stones may
not be suitable for printing, we need to perform a hypothesis test.

(a) Setting Up Hypotheses.

(i) Null Hypothesis (H0). The mean hardness of unpolished
stones is equal to or greater than 150.

(ii) Alternative Hypothesis (Ha). The mean hardness of
unpolished stones is less than 150.

(b) Thereafter, we set the significance level at 0.05 (5%). As we want to test
whether unpolished stones have a mean hardness below the required threshold
(150), we are interested in only one direction of effect, which is a lower
hardness, so a one-tailed, one-sample t-test was performed.

Figure 8 : HISTPLOT

(c) As the p-value is less than the significance level, we reject H0 and
hence, it appears that Zingaro is justified in thinking that the unpolished
stones may not have adequate hardness for printing.
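
A one-tailed, one-sample t-test against the threshold of 150 can be sketched with `scipy.stats.ttest_1samp`. The readings below are hypothetical values invented for illustration, not the report's data, and the `alternative` argument assumes SciPy 1.6 or later.

```python
from scipy import stats

# HYPOTHETICAL hardness readings for unpolished stones (not the report's data).
unpolished = [132.5, 141.0, 128.4, 150.2, 137.9, 145.3, 126.8, 139.6, 143.1, 134.7]

alpha = 0.05       # significance level used in the report
threshold = 150    # minimum Brinell hardness required for printing

# H0: mean >= 150, Ha: mean < 150 (lower-tailed test).
result = stats.ttest_1samp(unpolished, popmean=threshold, alternative="less")

reject_h0 = result.pvalue < alpha
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}, reject H0: {reject_h0}")
```

The decision rule is the point of the example: H0 is rejected when the p-value falls below the significance level, and not rejected otherwise.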

5.2 Is the mean hardness of the polished and unpolished stones the same?

Ans. (a) To test whether the mean hardness of the polished and unpolished
stones is the same, we need to perform a hypothesis test to compare the means
of two independent samples. In this case, we can use a two-sample t-test with
significance level = 0.05 (5%).

(b) Setting Up Hypotheses.

(i) Null Hypothesis (H0). The mean hardness of polished stones
is equal to the mean hardness of unpolished stones.

(ii) Alternative Hypothesis (Ha). The mean hardness of polished
stones is not equal to the mean hardness of unpolished stones.

Figure 9 : HISTPLOT

(c) As the p-value is less than the significance level, we reject H0 and
hence, it appears that the mean hardness of polished stones is not equal
to the mean hardness of unpolished stones.
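
A two-sided, two-sample t-test of this kind can be sketched with `scipy.stats.ttest_ind`. Both samples below are hypothetical and serve only to show the call. Setting `equal_var=False` (Welch's variant) is an assumption of this sketch; the report does not state whether equal variances were assumed.

```python
from scipy import stats

# HYPOTHETICAL hardness readings (not the report's data).
polished = [151.2, 148.7, 155.0, 146.3, 152.8, 149.9, 157.4, 150.6]
unpolished = [132.5, 141.0, 128.4, 138.2, 137.9, 145.3, 126.8, 139.6]

alpha = 0.05

# H0: the two means are equal, Ha: the two means differ (two-sided by default).
result = stats.ttest_ind(polished, unpolished, equal_var=False)

reject_h0 = result.pvalue < alpha
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}, reject H0: {reject_h0}")
```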

Problem 6:
Aquarius health club, one of the largest and most popular cross-fit gyms in the country
has been advertising a rigorous program for body conditioning. The program is
considered successful if the candidate is able to do more than 5 push-ups, as compared
to when he/she enrolled in the program. Using the sample data provided can you
conclude whether the program is successful? (Consider the level of Significance as 5%)
Note that this is a problem of the paired-t-test. Since the claim is that the training will
make a difference of more than 5, the null and alternative hypotheses must be formed
accordingly.

Ans. (a) In the context of Aquarius health club's program for body conditioning,
we are interested in determining whether the program has led to a statistically
significant improvement in the ability of participants to do push-ups. The program is
considered successful if participants can do more than 5 push-ups compared to when
they enrolled.

(b) Setting Up Hypotheses.

(i) Null Hypothesis (H0). The mean difference in push-up
counts is 5 or less (the program is not successful).

(ii) Alternative Hypothesis (Ha). The mean difference in push-up
counts is greater than 5 (improvement of more than 5 push-ups).

Figure 10 : HISTPLOT

(c) The histogram of differences between the before and after push-up
counts is plotted using sns.histplot. The red dashed line represents the
mean difference. This visualization shows how the differences are
distributed and whether they are centered around a positive value
(indicating improvement). As the p-value is less than the significance
level, we reject the null hypothesis and conclude that the program is
successful: on average, participants can do more than 5 additional
push-ups compared to when they enrolled.
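
A paired t-test with a hypothesized difference of 5 is equivalent to a one-sample, upper-tailed t-test on the per-participant differences. The sketch below shows that approach with hypothetical before and after counts (not the report's data); the `alternative` argument assumes SciPy 1.6 or later.

```python
from scipy import stats

# HYPOTHETICAL push-up counts for the same eight participants (not the report's data).
before = [20, 25, 18, 30, 22, 27, 19, 24]
after = [28, 32, 27, 37, 31, 33, 28, 30]

alpha = 0.05

# Paired design: work with the per-participant improvement.
differences = [a - b for a, b in zip(after, before)]
mean_difference = sum(differences) / len(differences)

# H0: mean difference <= 5, Ha: mean difference > 5 (upper-tailed test).
result = stats.ttest_1samp(differences, popmean=5, alternative="greater")

reject_h0 = result.pvalue < alpha
print(f"mean difference = {mean_difference:.3f}")
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}, reject H0: {reject_h0}")
```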

Problem 7:
Dental implant data: The hardness of metal implant in dental cavities depends on
multiple factors, such as the method of implant, the temperature at which the metal is
treated, the alloy used as well as on the dentists who may favour one method above
another and may work better in his/her favourite method. The response is the variable
of interest.

7.1 Test whether there is any difference among the dentists on the implant
hardness. State the null and alternative hypotheses. Note that both types of
alloys cannot be considered together. You must state the null and alternative
hypotheses separately for the two types of alloys.?

Ans. To test whether there is any difference among the dentists on the implant
hardness, we can use a one-way ANOVA (Analysis of Variance) test, which compares
the mean of a continuous dependent variable (in this case, "Response") across the
levels of a single categorical independent variable (in this case, "Dentist"). Since
the two types of alloys cannot be considered together, we perform two separate
one-way ANOVA tests, one for each alloy.

(b) Alloy 1.

(i) Null Hypothesis (H0). There is no significant difference among the
dentists on the implant hardness for alloy type 1.

(ii) Alternative Hypothesis (Ha). There is a significant difference among
the dentists on the implant hardness for alloy type 1.

(c) Alloy 2.

(i) Null Hypothesis (H0). There is no significant difference among the
dentists on the implant hardness for alloy type 2.

(ii) Alternative Hypothesis (Ha). There is a significant difference among
the dentists on the implant hardness for alloy type 2.

7.2 Before the hypotheses may be tested, state the required assumptions. Are
the assumptions fulfilled? Comment separately on both alloy types.

Ans. (a) Before performing a two-way ANOVA test, it's important to consider and
assess the assumptions associated with the test. The key assumptions for a two-way
ANOVA include:

(i) Normality. The residuals (differences between observed values and
predicted values) are normally distributed for each combination of
factors. The Shapiro-Wilk test is used to check this. Its statistic is
based on the correlation between the ordered data and the values expected
under a normal distribution. Its null hypothesis is that the data are
normally distributed, so a p-value above the significance level means the
normality assumption is not rejected.

(ii) Homogeneity of Variances. The variability of residuals is roughly
constant across different levels of the factors. Levene's test is used to
assess whether the variances of two or more groups are equal. It tests the
assumption of homogeneity of variances, which is important for analyses
such as analysis of variance (ANOVA).

(iii) Independence. The observations are independent within and among
groups.
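To make the homogeneity check concrete, the sketch below computes the median-centred form of Levene's statistic (often called Brown-Forsythe) in plain Python. It is an illustration under assumptions: the function name and the two small groups are invented for the example, and the report does not state which centring its software used.

```python
from statistics import mean, median

def levene_statistic(groups):
    """Median-centred Levene statistic W for a list of groups of observations."""
    # Absolute deviation of every observation from its own group median.
    z = []
    for g in groups:
        centre = median(g)
        z.append([abs(x - centre) for x in g])
    n_total = sum(len(g) for g in z)
    k = len(z)
    grand_mean = sum(sum(g) for g in z) / n_total
    group_means = [mean(g) for g in z]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    within = sum((v - m) ** 2 for g, m in zip(z, group_means) for v in g)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical groups: the second has twice the spread of the first.
w_unequal = levene_statistic([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]])
# Same spread, different location: the statistic is zero.
w_equal = levene_statistic([[1, 2, 3], [11, 12, 13]])
print(w_unequal, w_equal)
```

W is compared against an F distribution with (k - 1, N - k) degrees of freedom. A small W, and hence a large p-value, is consistent with equal variances.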

(b) Checking Assumptions for Alloy 1.

(i) Normality. The majority of combinations follow a normal distribution,
with two exceptions that require further investigation. The results
suggest that, for most of the combinations, the assumption of normality
is met.

Figure 11 : QQ PLOT
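The points behind a normal Q-Q plot can be sketched with the standard library alone. This is an illustrative assumption about how such a plot is built, not the plotting code used for the figure. The sample values are hypothetical, and the plotting positions (i - 0.5) / n are one common convention among several.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    n = len(sample)
    ordered = sorted(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical hardness readings for one Dentist/Method combination.
points = qq_points([700, 720, 735, 750, 780])
for theoretical, observed in points:
    print(f"{theoretical:8.2f}  {observed}")
```

If the observed values fall close to the theoretical quantiles (points near the 45-degree line), the normality assumption looks reasonable for that combination.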

(ii) Homogeneity of Variance.

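• p = 0.257. The p-value is the probability of observing a Levene's
statistic at least as extreme as the one we obtained, assuming that the
variances are equal across all groups. Since 0.257 is greater than 0.05,
there is not enough evidence to conclude that the variances differ, so
the assumption of homogeneity of variance is met for Alloy 1.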

Figure 12 : BOXPLOT

(iii) Independence.

• p-value: 1.0. This is the highest possible p-value, indicating that the
observed frequencies of 'Dentist' and 'Method' combinations are exactly
what would be expected if the two variables were completely independent
of each other.

• "Dentist and Method are independent". Since the p-value is greater than
the typical significance level (e.g., 0.05), we fail to reject the null
hypothesis of the Chi-Squared test. In this context, the null hypothesis
is that 'Dentist' and 'Method' are independent, and the test result is
consistent with it.
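A p-value of exactly 1.0 arises when every observed count equals its expected count, so the chi-square statistic is zero. The sketch below shows this for a balanced table. The layout of 5 dentists by 3 methods with 3 observations per cell is an assumption for illustration, and the function name is hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Assumed balanced layout: 5 dentists x 3 methods, 3 observations per cell.
balanced = [[3, 3, 3] for _ in range(5)]
stat, dof = chi_square_statistic(balanced)
print(stat, dof)
```

Note that this test looks at how often each Dentist/Method combination occurs, so a zero statistic mainly reflects a balanced design.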

(c) Checking Assumptions for Alloy 2.

(i) Normality. Every single combination of 'Dentist' and 'Method' has a
p-value greater than 0.05, indicating that the data for these
combinations are consistent with a normal distribution. The results
suggest that the data for Alloy 2, across all the combinations of
'Dentist' and 'Method', meet the assumption of normality.

(ii) Homogeneity of Variance.

• p-value. The p-value is 0.237, which is greater than the standard
significance level of 0.05. This means there is not enough evidence to
conclude that the variances are different across the groups for Alloy 2.
In other words, the assumption of homogeneity of variance is met.

Figure 13 : BOXPLOT

(iii) Independence.

• p-value: 1.0. This is the highest possible p-value, indicating that the
observed frequencies of 'Dentist' and 'Method' combinations are exactly
what would be expected if the two variables were completely independent
of each other.

• "Dentist and Method are independent". Since the p-value is greater than
the typical significance level (e.g., 0.05), we fail to reject the null
hypothesis of the Chi-Squared test. In this context, the null hypothesis
is that 'Dentist' and 'Method' are independent, and the test result is
consistent with it.

7.3 Irrespective of your conclusion in 2, we will continue with the testing
procedure. What do you conclude regarding whether implant hardness depends
on dentists? Clearly state your conclusion. If the null hypothesis is rejected, is
it possible to identify which pairs of dentists differ?

Ans. (a) As we wanted to conclude specifically whether the implant hardness
depends on dentists, and did not consider the 'Method' variable in this
particular analysis, a one-way ANOVA was considered appropriate for each
alloy.

(b) We performed separate one-way ANOVAs for each alloy type to determine
whether there is a significant difference in implant hardness among the
different dentists. This directly tests the effect of the 'Dentist' variable
on implant hardness, without considering any other factors.

(c) Alloy 1.

Figure 14 : POINT PLOT

• p-value. The p-value of 0.1166 is greater than the conventional
significance level of 0.05. In statistical testing, if the p-value is less
than 0.05, we reject the null hypothesis (i.e., we conclude that there is a
significant difference among the groups). However, since the p-value is
greater than 0.05 in this case, we fail to reject the null hypothesis.

• Conclusion. The result of the one-way ANOVA test for Alloy 1 suggests
that there is no significant difference in implant hardness among the
different dentists. In other words, the implant hardness does not seem to
depend on which dentist is considered for Alloy 1.

(d) Alloy 2.

Figure 15 : POINT PLOT

• p-value. The p-value of 0.7180 is much greater than the conventional
significance level of 0.05. In hypothesis testing, if the p-value is less
than 0.05, we would reject the null hypothesis (i.e., conclude that there is
a significant difference among the groups). Since the p-value is greater
than 0.05, we fail to reject the null hypothesis.

• Conclusion. The one-way ANOVA result for Alloy 2 suggests that there is
no significant difference in implant hardness among the different dentists.
This means that for Alloy 2, the implant hardness does not seem to depend on
the dentist. The variations observed among the dentists are consistent with
random fluctuation rather than an actual difference in implant hardness
among the dentists.
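The F-statistic that a one-way ANOVA compares against the F distribution can be sketched in plain Python as below. This is a minimal illustration, not the report's code: the function name and the three small groups of hardness values are assumptions invented for the example.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical hardness readings grouped by dentist (not the report's data).
groups = [[700, 710, 720], [710, 720, 730], [740, 750, 760]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```

The p-value is the upper-tail probability of this F value with (df_between, df_within) degrees of freedom. When the group means are close relative to the within-group spread, F is small and the p-value is large, as in the dentist comparisons above.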

7.4 Now test whether there is any difference among the methods on the
hardness of dental implant, separately for the two types of alloys. What are your
conclusions? If the null hypothesis is rejected, is it possible to identify which
pairs of methods differ?

Ans. (a) To test whether there is any difference among the methods on the
hardness of dental implant for the two types of alloys, we used a one-way
ANOVA.

(b) Alloy 1.

Null Hypothesis. H0: Mean Hardness is same across all methods for
Alloy 1.

Alternate Hypothesis. Ha: Mean Hardness is not same for at least one
pair of methods for Alloy 1.

Figure 16 : POINT PLOT

(i) P-value (PR(>F)). The p-value of 0.004163 is the probability of
observing an F-statistic at least as extreme as the one computed under the
null hypothesis (that there is no difference among the methods). Since this
p-value is less than 0.05, we can reject the null hypothesis.

(ii) Conclusion. There is a statistically significant difference among the
methods regarding the hardness of dental implants for Alloy 1. The p-value
below 0.05 indicates that at least one method has a significantly different
mean hardness value from the others for Alloy 1.

(iii) Identifying pairs of methods differing for Alloy 1. A common approach
for this analysis is Tukey's HSD (Honestly Significant Difference) test,
which provides a pairwise comparison between the methods for Alloy 1.

• Methods 1 and 2. The mean difference is -6.1333, and
the p-value is 0.987, which is greater than 0.05, meaning there is
no significant difference in mean hardness between Methods 1
and 2.

• Methods 1 and 3. The mean difference is -124.8, and the p-value is
0.0085, which is less than 0.05, meaning there is a significant difference
in mean hardness between Methods 1 and 3.

• Methods 2 and 3. The mean difference is -118.6667, and the p-value is
0.0128, which is less than 0.05. Therefore, we reject the null hypothesis
for this comparison, meaning there is a significant difference in mean
hardness between Methods 2 and 3.

• Summary. For Alloy 1, the test has identified that Methods 1 and 3 and
Methods 2 and 3 differ significantly, while there is no significant
difference between Methods 1 and 2.
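The quantities behind these pairwise comparisons can be sketched as follows. The code computes, for each pair, the mean difference and the studentized range statistic q that Tukey's HSD refers to its reference distribution. It is a plain-Python illustration for equal group sizes. The function name, the data, and the sign convention (second group minus first) are assumptions, not taken from the report's code.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def pairwise_mean_differences(groups):
    """groups: dict mapping a label to a list of observations (equal sizes assumed)."""
    n = len(next(iter(groups.values())))
    n_total = sum(len(v) for v in groups.values())
    # Pooled within-group variance (the ANOVA mean square error).
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    mse = ss_within / (n_total - len(groups))
    results = {}
    for a, b in combinations(sorted(groups), 2):
        diff = mean(groups[b]) - mean(groups[a])  # second group minus first
        q = abs(diff) / sqrt(mse / n)             # studentized range statistic
        results[(a, b)] = (diff, q)
    return results

# Hypothetical hardness readings by method (not the report's data).
by_method = {1: [700, 710, 720], 2: [710, 720, 730], 3: [740, 750, 760]}
results = pairwise_mean_differences(by_method)
for pair, (diff, q) in results.items():
    print(pair, diff, round(q, 4))
```

Under this convention the differences add up across pairs. The reported values behave the same way: -6.1333 + (-118.6667) = -124.8, the difference between Methods 1 and 3.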

(c) Alloy 2.

Null Hypothesis. H0: Mean Hardness is same across all methods for
Alloy 2.

Alternate Hypothesis. Ha: Mean Hardness is not same for at least one
pair of methods for Alloy 2.

Figure 17 : POINT PLOT

(i) P-value (PR(>F)). The p-value of 0.00005 is very small. It is the
probability of observing an F-statistic at least as extreme as this one if
the null hypothesis were true. Since the p-value is less than 0.05, we can
reject the null hypothesis.

(ii) Conclusion. There is a statistically significant difference among
the methods regarding the hardness of dental implants for Alloy 2. The
very low p-value (below 0.05) means that at least one method has a
significantly different mean hardness value from the others for Alloy 2,
just as it was with Alloy 1. This indicates that the method used does
have a significant effect on the hardness of the implant for both
alloys.

(iii) Identifying pairs of methods differing for Alloy 2. A common approach
for this analysis is Tukey's HSD (Honestly Significant Difference) test,
which provides a pairwise comparison between the methods for Alloy 2.

• Methods 1 and 2. There is no significant difference in mean hardness
between Methods 1 and 2 (p-value = 0.8212).

• Methods 1 and 3. There is a significant difference in mean hardness
between Methods 1 and 3 (p-value = 0.0001).

• Methods 2 and 3. There is a significant difference in mean hardness
between Methods 2 and 3 (p-value reported as 0.0).

• Summary. The test has identified that Methods 1 and 3 and Methods 2 and
3 differ significantly. There is no significant difference between Methods
1 and 2. The same pairs differ for both alloys, which may indicate a common
effect of the methods on implant hardness, irrespective of the alloy type.

7.5 Now test whether there is any difference among the temperature levels on
the hardness of dental implant, separately for the two types of alloys. What are
your conclusions? If the null hypothesis is rejected, is it possible to identify
which levels of temperatures differ?

Ans. (a) To test whether there is any difference among the temperature
levels on the hardness of dental implants, we perform a one-way analysis of
variance (ANOVA) for the temperature factor, separately for the two types
of alloys.

(b) Alloy 1.

Null Hypothesis (H0). The mean hardness is the same across all
temperature levels for Alloy 1.

Alternative Hypothesis (Ha). The mean hardness is not the same for at
least one pair of temperature levels for Alloy 1.

Figure 18 : POINT PLOT

(i) The output of the one-way ANOVA test for Alloy 1 shows that the
F-statistic is 0.335224, and the p-value is 0.717074.

(ii) Since the p-value is greater than the typical significance level of
0.05, we fail to reject the null hypothesis. Therefore, we conclude that
there is no statistically significant difference among the temperature
levels regarding the hardness of dental implants for Alloy 1.

(c) Alloy 2.

Null Hypothesis (H0). The mean hardness is the same across all
temperature levels for Alloy 2.

Alternative Hypothesis (Ha). The mean hardness is not the same for at
least one pair of temperature levels for Alloy 2.

Figure 19 : POINT PLOT

(i) The output of the one-way ANOVA test for Alloy 2 shows that the
F-statistic is 1.883492, and the p-value is 0.164678.

(ii) Since the p-value is greater than the typical significance level of
0.05, we fail to reject the null hypothesis. This means that there is no
statistically significant evidence to conclude that temperature levels have
an effect on the hardness of dental implants for Alloy 2. The difference in
hardness among the temperature levels could be due to random chance.

7.6 Consider the interaction effect of dentist and method and comment on the
interaction plot, separately for the two types of alloys.

Ans. (a) To understand the interaction effect between dentists and methods,
we look at a plot showing this interaction. The interaction plot was
generated by plotting the mean response at each combination of dentist and
method levels, with lines connecting the means for each level of one factor,
typically with different line styles or colors for each level of the other
factor. This allows us to visually assess how the effect of one factor
depends on the level of the other factor.
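The numbers an interaction plot displays are simply the cell means. The sketch below computes them for a tiny made-up data set and compares how the mean response changes between two methods for each dentist. The records and the function name cell_means are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

def cell_means(records):
    """records: iterable of (dentist, method, response) tuples."""
    cells = defaultdict(list)
    for dentist, method, response in records:
        cells[(dentist, method)].append(response)
    return {key: mean(values) for key, values in cells.items()}

# Hypothetical observations: two dentists, two methods, two readings each.
records = [(1, 1, 800), (1, 1, 820), (1, 2, 760), (1, 2, 780),
           (2, 1, 790), (2, 1, 810), (2, 2, 795), (2, 2, 805)]
means = cell_means(records)
# Change in mean response from method 1 to method 2, per dentist.
slopes = {d: means[(d, 2)] - means[(d, 1)] for d in (1, 2)}
print(means)
print(slopes)
```

If the slopes were equal, the lines in the plot would be parallel (no interaction). Here they differ, which is the visual signature of an interaction. Whether it is statistically significant is a question for the ANOVA interaction term.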

(b) Alloy 1.

(i) Dentist Effect. With a p-value of 0.011484, there is evidence at the
0.05 significance level to reject the null hypothesis that all the dentists
have the same effect on the response. There is something different about
the dentists that affects the response.

(ii) Method Effect. Similarly, with a p-value of 0.000284, the
method also significantly affects the response. The choice of method
matters in determining the response.

(iii) Interaction Effect. The most interesting part of this output is the
interaction term (p-value = 0.006793). The interaction effect between the
dentist and the method is significant. This means that the effect of one of
these variables on the response is not the same at all levels of the other
variable.

(iv) Conclusion. The results strongly suggest that not only do the
dentist and method matter individually, but the combination in which they
are used together also matters. This might imply that training or
guidelines should consider the specific combination of dentist and
method, rather than treating them independently.

Figure 20 : POINT PLOT

(i) The points for each dentist are not all clustered together. This
means that there is variation in the hardness of the implants produced by
each dentist, even when using the same method. This variation may be due
to factors such as the experience of the dentist, the technique used, and
the quality of the materials used.

(ii) The lines on the plot also show that there is an interaction effect
between dentist and method. This means that the effect of the dentist on
implant hardness depends on the method used. For example, dentist 1
produces implants with the highest hardness when using method 1, but
the lowest hardness when using method 2. One possible explanation is that
dentist 1 is more experienced or skilled in using method 1 than method 2.

(iii) Overall, the interaction plot shows that the hardness of the
implant is affected by both the dentist and the method used. There
is also an interaction effect between dentist and method, meaning that
the effect of the dentist on implant hardness depends on the method
used. This suggests that it is important to consider both the dentist and
the method when choosing a treatment for a dental implant.

(c) Alloy 2.

(i) Dentist Effect. The p-value for the dentist effect is 0.371833,
which is greater than the typical significance level of 0.05. This means
that there's insufficient evidence to suggest that the different dentists
have a different effect on the response for Alloy 2. Hence, we fail to reject
the null hypothesis that all dentists have the same effect.

(ii) Method Effect. On the other hand, the p-value for the method
effect is very low (0.000004), providing strong evidence that the method
does significantly affect the response. The choice of method matters for
Alloy 2, and different methods will lead to different responses.

(iii) Interaction Effect. The interaction term has a p-value of 0.093234,
which is greater than 0.05. This suggests that, for Alloy 2, there is no
significant interaction between the dentist and the method. There is no
evidence that the effect of one of these variables on the response depends
on the level of the other variable. This is quite different from Alloy 1,
where a significant interaction was found.

(iv) Conclusion. For Alloy 2, the choice of dentist does not seem to
matter, but the method used does have a significant effect on the response.
However, unlike Alloy 1, there is no evidence to suggest that the
combination of dentist and method together affects the response. This could
imply that Alloy 2 is more robust to the choice of dentist, and that
focusing on the method might be the most effective way to control the
response. Training or guidelines for working with Alloy 2 might be more
straightforward, as they would not need to consider the specific
combination of dentist and method. Instead, attention could be focused on
optimizing the method alone.

Figure 21 : POINT PLOT


(i) The lines for the different methods are more spread out for alloy 2
than for alloy 1. This suggests that there is more variation in the hardness
of the implants produced with alloy 2, regardless of the method used.
This variation could be due to factors such as the quality of the alloy or
the manufacturing process.

(ii) The interaction effect between dentist and method is not as strong
for alloy 2 as it is for alloy 1. This suggests that the effect of the dentist
on implant hardness is not as dependent on the method used for alloy 2.
For example, dentist 1 produces implants with the highest hardness for
both methods for alloy 2, while for alloy 1, dentist 1 produced implants
with the highest hardness only when using method 1.

(iii) Overall, the interaction plot for alloy 2 shows that the hardness of
the implant varies with the method used and, to a lesser extent, with the
dentist, although the dentist effect is not statistically significant.
The interaction effect between dentist and method is not as strong for alloy
2 as it is for alloy 1. This suggests that it is less important to consider the
dentist when choosing a treatment for a dental implant with alloy 2.

7.7 Now consider the effect of both factors, dentist, and method, separately
on each alloy. What do you conclude? Is it possible to identify which dentists
are different, which methods are different, and which interaction levels are
different?

Ans. (a) Alloy 1. The two-way ANOVA tests the hypotheses below by
considering the main effects of the dentists and methods and their interaction. If any of
the p-values associated with these effects is less than the significance level (e.g.,
0.05), the corresponding null hypothesis is rejected.

Null Hypotheses (H0).

Dentists. The mean hardness is the same across all dentists for
Alloy 1.
Methods. The mean hardness is the same across all methods for
Alloy 1.
Interaction (Dentist*Method). There is no interaction effect between
dentists and methods on the hardness for Alloy 1.

Alternative Hypotheses (Ha).

Dentists. The mean hardness is not the same for at least one pair of
dentists for Alloy 1.
Methods. The mean hardness is not the same for at least one pair of
methods for Alloy 1.
Interaction (Dentist*Method). There is an interaction effect between
dentists and methods on the hardness for Alloy 1.
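The three F-statistics of a balanced two-way ANOVA with interaction can be sketched from first principles as below. This is a plain-Python illustration under assumptions: it requires the same number of replicates in every cell, and the function name and the 2 x 2 data set are made up for the example and are not the report's data.

```python
from itertools import product
from statistics import mean

def two_way_anova(cells):
    """Balanced two-way ANOVA. cells maps (dentist, method) -> list of replicates."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    r = len(next(iter(cells.values())))  # replicates per cell (assumed equal)
    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    ss_a = r * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = r * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    # Interaction: what the cell mean adds beyond the two main effects.
    ss_ab = r * sum((mean(cells[(a, b)]) - a_mean[a] - b_mean[b] + grand) ** 2
                    for a, b in product(a_levels, b_levels))
    ss_err = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)
    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(a_levels) * len(b_levels) * (r - 1)
    ms_err = ss_err / df_err
    return {"Dentist": ss_a / df_a / ms_err,
            "Method": ss_b / df_b / ms_err,
            "Dentist:Method": ss_ab / df_ab / ms_err}

# Hypothetical data: 2 dentists x 2 methods, 2 replicates per cell.
cells = {(1, 1): [800, 820], (1, 2): [760, 780],
         (2, 1): [790, 810], (2, 2): [795, 805]}
f_stats = two_way_anova(cells)
print(f_stats)
```

Each F value is compared with an F distribution whose numerator degrees of freedom belong to that effect and whose denominator degrees of freedom belong to the error term, giving one p-value per hypothesis.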

Figure 22 : POINT PLOT

(i) The analysis examines the interaction between dentists and methods
for Alloy 1, both statistically (through the two-way ANOVA) and visually
(through the interaction plot).

(ii) Dentist Effect. The p-value is 0.011484, which is less than 0.05.
Hence, we reject the null hypothesis for the dentist effect. This indicates
that the dentist does have a significant effect on the hardness of the
dental implant for Alloy 1.

(iii) Method Effect. The p-value is 0.000284, which is also less than
0.05, so we reject the null hypothesis for this effect as well. This reveals
a statistically significant difference in the mean hardness among the
methods for Alloy 1.

(iv) Interaction Effect (Dentist:Method). The p-value for the interaction
effect is 0.006793, which is again less than 0.05. This suggests that there
is a statistically significant interaction between the dentists and methods
on the hardness for Alloy 1. It means that the effect of one factor (e.g.,
method) on the response (hardness) is not the same at all levels of the
other factor (e.g., dentist).

(v) Conclusion.

• Both the dentist and the method significantly affect the hardness of the
dental implant, and there is also a significant interaction between these
two factors. This means that the effect of a specific dentist on the
hardness may vary depending on the method used, and vice versa.

• This suggests that the choice of both the dentist and the method needs
to be carefully considered when working with Alloy 1. Specific combinations
of dentist and method may yield optimal results, and understanding this
interaction could be key to maximizing implant hardness. It highlights the
importance of comprehensive training and clear guidelines for best
practices with this alloy.

(vi) Identify differences.

• Dentists. All the p-values are above 0.05, and the "reject" column is
marked as "False" for all pairs. This means that there is no statistically
significant difference between the means of any pair of dentists, even
though the overall dentist effect in the two-way ANOVA is significant.

• Methods. The pairs (1, 3) and (2, 3) have p-values below 0.05, and the
"reject" column is marked as "True" for these pairs. This indicates that
there is a statistically significant difference between these methods.
Specifically, method 3 is significantly different from methods 1 and 2.
Hence, we can conclude that methods differ, particularly method 3 versus
methods 1 and 2, while no individual pair of dentists differs.

• Interaction Levels. From the output, the following interaction levels
(labelled dentist_method) show significant differences:

Figure 23 : INTERACTION PLOT


o 1_1 and 4_3: Rejected (p = 0.007).

o 1_1 and 5_3: Rejected (p = 0.0007).

o 1_2 and 5_3: Rejected (p = 0.0173).

o 1_3 and 5_3: Rejected (p = 0.0079).

o 2_1 and 4_3: Rejected (p = 0.016).

o 2_1 and 5_3: Rejected (p = 0.0016).

o 2_2 and 4_3: Rejected (p = 0.0243).

o 2_2 and 5_3: Rejected (p = 0.0025).

o 2_3 and 5_3: Rejected (p = 0.0065).

o 3_1 and 5_3: Rejected (p = 0.0229).
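Labels such as 1_1 and 5_3 appear to combine a dentist number with a method number. Assuming 5 dentists and 3 methods, as the rest of the report indicates, the sketch below builds these labels and counts how many pairwise comparisons a post-hoc test on them involves. The variable names are illustrative.

```python
from itertools import combinations, product

dentists = [1, 2, 3, 4, 5]
methods = [1, 2, 3]

# One label per Dentist/Method combination, e.g. "1_1", "1_2", ..., "5_3".
labels = [f"{d}_{m}" for d, m in product(dentists, methods)]
# Every unordered pair of combinations is one pairwise comparison.
pairs = list(combinations(labels, 2))
print(len(labels), len(pairs))
```

With this many comparisons, a procedure that adjusts for multiple testing (such as Tukey's HSD) matters, because unadjusted tests would produce some small p-values by chance alone.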

(b) Alloy 2. The two-way ANOVA tests the hypotheses below by
considering the main effects of the dentists and methods and their interaction.
If any of the p-values associated with these effects is less than the significance
level (e.g., 0.05), the corresponding null hypothesis is rejected.

Null Hypotheses (H0).

Dentists. The mean hardness is the same across all dentists for
Alloy 2.
Methods. The mean hardness is the same across all methods for
Alloy 2.
Interaction (Dentist*Method). There is no interaction effect between
dentists and methods on the hardness for Alloy 2.

Alternative Hypotheses (Ha).

Dentists. The mean hardness is not the same for at least one pair of
dentists for Alloy 2.
Methods. The mean hardness is not the same for at least one pair of
methods for Alloy 2.
Interaction (Dentist*Method). There is an interaction effect between
dentists and methods on the hardness for Alloy 2.

Figure 24 : POINT PLOT

(i) The analysis examines the interaction between dentists and methods
for Alloy 2, both statistically (through the two-way ANOVA) and visually
(through the interaction plot).

(ii) Dentist Effect. The p-value is 0.371833. Since the p-value is greater
than the typical alpha level of 0.05, we fail to reject the null hypothesis.
There is not enough evidence to conclude that the mean hardness is
different across dentists for Alloy 2.

(iii) Method Effect. The p-value is 0.000004, which is less than 0.05, so
we reject the null hypothesis. There is enough evidence to conclude that
the mean hardness is not the same for at least one pair of methods for
Alloy 2.

(iv) Interaction Effect (Dentist:Method). The p-value for the interaction
effect is 0.093234, which is greater than 0.05, so we fail to reject the
null hypothesis. There is not enough evidence to conclude that there is an
interaction effect between dentists and methods on the hardness for
Alloy 2.

(v) Conclusion.

• For Alloy 2, only the method used has a significant effect on the
hardness of the dental implant, while the dentist effect and the
interaction between dentist and method are not significant.

• This suggests that while the choice of method is crucial for achieving
the desired hardness in Alloy 2, the choice of dentist is not statistically
significant. Unlike Alloy 1, there is no need to consider specific
combinations of dentist and method for Alloy 2, as their interaction does
not play a significant role.

• These findings might imply a more standardized approach
for working with Alloy 2, where the focus should be on the method
used rather than the individual differences between dentists.

(vi) Identify differences.

• Dentists. The first output shows the results of comparing means across
different dentist groups (indicated by the numbers 1 to 5). These
comparisons help us identify whether the choice of a dentist has an impact
on the response variable. Here is what we can infer:

o All pairwise comparisons between the dentist groups have p-values
greater than 0.05, which indicates that the differences in means are not
statistically significant.

o The "reject" column states "False" for all comparisons, meaning that we
fail to reject the null hypothesis that the means are the same across
different dentist groups.

o In conclusion, for the given data, there is no evidence to suggest that
the choice of dentist makes a difference in the response variable.

• Methods. The second output focuses on the comparisons between different
methods, which we can analyze as follows:

o Comparing group 1 and 2 (Method 1 vs Method
2). The difference is not significant (p-value = 0.8212),
so we can't conclude that the methods differ.

o Comparing group 1 and 3 (Method 1 vs Method 3). The difference is
significant (p-value = 0.0001), and we reject the null hypothesis. It means
that Method 1 and Method 3 do have a significantly different impact on the
response.

o Comparing group 2 and 3 (Method 2 vs Method 3). Similar to the above,
the difference is significant (p-value reported as 0.0, i.e. below the
displayed precision), indicating that Method 2 and Method 3 differ
significantly.

o Conclusion. There is no significant difference in the response variable
for different dentists. There are significant differences in the response
variable for some method pairs (specifically Method 1 vs Method 3 and
Method 2 vs Method 3), suggesting that the method used can affect the
outcome.

• Interaction Levels. From the output, the following interaction levels
show significant differences:

Figure 25 : INTERACTION PLOT

o 1_1 and 5_3 (p-adj = 0.0049).

o 1_2 and 5_3 (p-adj = 0.0003).

o 2_1 and 5_3 (p-adj = 0.0017).

o 2_2 and 5_3 (p-adj = 0.0043).

o 3_1 and 5_3 (p-adj = 0.0279).
