Advanced Statistics
PROJECT BUSINESS REPORT
_______________________
DSBA
ROHINI ROKDE
ADVANCED STATISTICS BUSINESS REPORT
Contents
Problem 1
1.1. For this data, construct the following contingency tables (Keep Gender
as row variable)
1.2.1. What is the probability that a randomly selected CMSU student will be
male?
1.2.2. What is the probability that a randomly selected CMSU student will be
female?
1.3.1. Find the conditional probability of different majors among the male
students in CMSU.
1.3.2 Find the conditional probability of different majors among the female
students of CMSU.
1.4.1. Find the probability that a randomly chosen student is a male and
intends to graduate.
1.4.2 Find the probability that a randomly selected student is a female and
does NOT have a laptop.
1.5.1. Find the probability that a randomly chosen student is a male or has
full-time employment?
1.7. Note that there are four numerical (continuous) variables in the data
set, GPA, Salary, Spending, and Text Messages.
1.7.2. Find the conditional probability that a randomly selected male earns
50 or more. Find the conditional probability that a randomly selected female
earns 50 or more.
1.8. Note that there are four numerical (continuous) variables in the data
set, GPA, Salary, Spending, and Text Messages. For each of them comment
whether they follow a normal distribution. Write a note summarizing your
conclusions.
Problem 2
2.1 Do you think there is evidence that the mean moisture contents in both
types of shingles are within the permissible limits? State your conclusions
clearly showing all steps.
2.2 Do you think that the population mean for shingles A and B are equal?
Form the hypothesis and conduct the test of the hypothesis. What
assumption do you need to check before the test for equality of means is
performed?
Problem 3A:
1. State the null and the alternate hypothesis for conducting one-way
ANOVA for both Education and Occupation individually.
2. Perform a one-way ANOVA on Salary with respect to Education. State
whether the null hypothesis is accepted or rejected based on the ANOVA
results.
3. Perform a one-way ANOVA on Salary with respect to Occupation. State
whether the null hypothesis is accepted or rejected based on the ANOVA
results.
4. If the null hypothesis is rejected in either (2) or in (3), find out which
class means are significantly different. Interpret the result. (Non-
Graded)
5. Perform a two-way ANOVA based on Salary with respect to both
Education and Occupation (along with their interaction
Education*Occupation). State the null and alternative hypotheses and
state your results. How will you interpret this result?
6. Explain the business implications of performing ANOVA for this
particular case study.
Problem 1
The Student News Service at Clear Mountain State University (CMSU) has decided to
gather data about the undergraduate students that attend CMSU. CMSU creates and
distributes a survey of 14 questions and receives responses from 62 undergraduates
(stored in the Survey data set).
1.1. For this data, construct the following contingency tables (Keep Gender
as row variable)
Gender and graduate intention:
Male: 9 (Undecided)
Male: 3 (No)
Total Male: 29
Female: 11 (Yes)
Female: 13 (Undecided)
Female: 9 (No)
Total: 62

Gender and employment:
Male: 3 (Unemployed)
Total Male: 29
Female: 6 (Unemployed)
Total Female: 33
Total: 62

Gender and computer:
Male: 3 (Desktop)
Total Male: 29
Female: 29 (Laptop)
Female: 2 (Desktop)
Female: 2 (Tablet)
Total Female: 33
Total: 62
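A table of this kind can be built by counting each (Gender, category) pair. The following is a minimal sketch using only the Python standard library; the `records` list is a small made-up sample, not the survey data, and `contingency_table` is a hypothetical helper name.

```python
from collections import Counter


def contingency_table(records, row_key, col_key):
    """Count records for every (row value, column value) pair."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    # Nested dict: table[row][col] -> count, with 0 for pairs never seen
    return {row: {col: counts.get((row, col), 0) for col in cols} for row in rows}


# Made-up illustrative records (NOT the CMSU survey data)
records = [
    {"Gender": "Female", "Computer": "Laptop"},
    {"Gender": "Female", "Computer": "Tablet"},
    {"Gender": "Male", "Computer": "Laptop"},
    {"Gender": "Male", "Computer": "Desktop"},
    {"Gender": "Male", "Computer": "Laptop"},
]

# Gender is kept as the row variable, as the question asks
table = contingency_table(records, "Gender", "Computer")
for gender, row in table.items():
    print(gender, row, "Total:", sum(row.values()))
```

The row totals printed at the end play the same role as the "Total Male" and "Total Female" entries above.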
1.2.1. What is the probability that a randomly selected CMSU student will be
male?
From the given table we can see that the number of male students is 29 and the total number of
students is 62. Hence the probability that a randomly selected student is male is: P(MALE) = 29/62 = 0.467742
1.2.2. What is the probability that a randomly selected CMSU student will be
female?
From the given table we can see that the number of female students is 33 and the total number of
students is 62. Hence the probability that a randomly selected student is female is: P(FEMALE) = 33/62 =
0.532258
1.3.1. Find the conditional probability of different majors among the male
students in CMSU.
Solution:
4/29 = 0.137931034 (13.79%)
1/29 = 0.034482759 (3.45%)
2/29 = 0.068965517 (6.90%)
6/29 = 0.206896552 (20.69%)
4/29 = 0.137931034 (13.79%)
5/29 = 0.172413793 (17.24%)
3/29 = 0.103448276 (10.34%)
1.3.2 Find the conditional probability of different majors among the female
students of CMSU.
Solution:
3/33 = 0.090909091 (9.09%)
7/33 = 0.212121212 (21.21%)
4/33 = 0.121212121 (12.12%)
4/33 = 0.121212121 (12.12%)
9/33 = 0.272727273 (27.27%)
3/33 = 0.090909091 (9.09%)
0/33 = 0 (0%)
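Each conditional probability is the count in a category divided by the total for that gender. The following is a minimal sketch of that calculation. Since the fractions above are not labelled by major, the example uses the female counts by graduate intention listed in section 1.1 (11 Yes, 13 Undecided, 9 No); `conditional_probabilities` is a hypothetical helper name.

```python
def conditional_probabilities(counts):
    """Return P(category | group) for each category, from counts within one group."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("group has no observations")
    return {category: n / total for category, n in counts.items()}


# Female counts by graduate intention, as listed in section 1.1
female_grad_intention = {"Yes": 11, "Undecided": 13, "No": 9}

probs = conditional_probabilities(female_grad_intention)
for category, p in probs.items():
    print(f"P({category} | Female) = {p:.4f}")
```

The conditional probabilities within one gender always sum to 1, which is a quick check on the arithmetic.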
1.4.1. Find the probability that a randomly chosen student is a male and
intends to graduate.
1.4.2 Find the probability that a randomly selected student is a female and
does NOT have a laptop.
The number of female students who do NOT have a laptop is 4 (2 with a desktop and 2 with a
tablet). The total number of students is 62. Hence the probability that a randomly selected
student is a female and does NOT have a laptop is 4/62 = 0.0645
1.5.1. Find the probability that a randomly chosen student is a male or has
full-time employment?
The probability that a randomly chosen student is either a male or has
full-time employment is 32/62 = 0.5161, i.e. about 51.61 %
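This result follows from the addition rule, P(A or B) = P(A) + P(B) - P(A and B). The following is a minimal sketch of that rule computed from counts. The 29 males and 62 students come from the report; the number of full-time students (10) and the number of males among them (7) are not shown in the report and are assumptions chosen only because they agree with the stated 51.61 %. `prob_union` is a hypothetical helper name.

```python
from fractions import Fraction


def prob_union(n_a, n_b, n_a_and_b, n_total):
    """P(A or B) = P(A) + P(B) - P(A and B), computed from counts."""
    return Fraction(n_a + n_b - n_a_and_b, n_total)


# 29 males and 62 students are taken from the report.
# ASSUMPTION: 10 full-time students, 7 of them male. These two counts are
# not shown in the report; they are chosen to agree with the stated 51.61 %.
p = prob_union(n_a=29, n_b=10, n_a_and_b=7, n_total=62)
print(p, float(p))
```

Subtracting the overlap avoids counting male full-time students twice.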
1.7. Note that there are four numerical (continuous) variables in the data
set, GPA, Salary, Spending, and Text Messages.
1.7.2. Find the conditional probability that a randomly selected male earns
50 or more. Find the conditional probability that a randomly selected female
earns 50 or more.
The file (A & B shingles.csv) includes 36 measurements (in pounds per 100 square
feet) for A shingles and 31 for B shingles.
Step 1:
The null hypothesis states that the mean moisture content of sample A is greater than or
equal to the permissible limit, μ ≥ 0.35
The alternative hypothesis states that the mean moisture content of sample A is less than
the permissible limit, μ < 0.35
H0: μ ≥ 0.35
HA: μ < 0.35
We have two samples (A and B) and we do not know the population standard
deviation. The two sample sizes are not the same, but each is n > 30.
So we use the t distribution and the tSTAT test statistic for a one-sample test on
sample A. Because the alternative hypothesis is μ < 0.35, this is a one-tailed test.
Step 1:
The null hypothesis states that the mean moisture content of sample B is greater than or
equal to the permissible limit, μ ≥ 0.35
The alternative hypothesis states that the mean moisture content of sample B is less than
the permissible limit, μ < 0.35
H0: μ ≥ 0.35
HA: μ < 0.35
We have two samples (A and B) and we do not know the population standard
deviation. The two sample sizes are not the same, but each is n > 30.
So we use the t distribution and the tSTAT test statistic for a one-sample test on
sample B. Because the alternative hypothesis is μ < 0.35, this is a one-tailed test.
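The one-sample test statistic used for each shingle type is tSTAT = (sample mean - 0.35) / (s / sqrt(n)), with n - 1 degrees of freedom. The following is a minimal sketch of that calculation using only the standard library. The `readings` list is made up for illustration and is not the data in the shingles file; `one_sample_t` is a hypothetical helper name.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mu >= mu0 vs HA: mu < mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    t_stat = (mean - mu0) / (sd / math.sqrt(n))
    return t_stat, n - 1


# Made-up moisture readings (NOT the values in the shingles file)
readings = [0.30, 0.33, 0.28, 0.36, 0.31, 0.29, 0.34, 0.32]

t_stat, df = one_sample_t(readings, mu0=0.35)
print(f"t = {t_stat:.3f}, df = {df}")
```

For the lower-tail alternative, the p-value is the area to the left of tSTAT under the t distribution with df degrees of freedom. If SciPy is available, `scipy.stats.ttest_1samp(readings, 0.35, alternative="less")` should return both the statistic and that p-value.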
2.2 Do you think that the population mean for shingles A and B are equal?
Form the hypothesis and conduct the test of the hypothesis. What
assumption do you need to check before the test for equality of means is
performed?
Step 1:
In testing whether the means for shingles A and shingles B are the same, the null
hypothesis states that the mean of shingle A and the mean of shingle B are equal, and
the alternative hypothesis states that they differ:
H0: μA - μB = 0 i.e. μA = μB
HA: μA - μB ≠ 0 i.e. μA ≠ μB
Here we select α = 0.05.
We have two samples and we do not know the population standard deviation.
The two sample sizes are not the same, but each is n > 30. So we
use the t distribution and the tSTAT test statistic for a two-sample test.
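The two-sample statistic can be sketched as follows. This version uses the pooled-variance form, which assumes that the two populations have equal variances, so that assumption (along with approximate normality) would need checking before relying on it. The data are made up for illustration and are not the shingles measurements; `pooled_two_sample_t` is a hypothetical helper name.

```python
import math
import statistics


def pooled_two_sample_t(a, b):
    """Return (t statistic, degrees of freedom) for H0: mu_A = mu_B.

    Assumes equal population variances (pooled-variance t-test).
    """
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t_stat, n_a + n_b - 2


# Made-up readings (NOT the shingles data); unequal sample sizes are fine
sample_a = [0.30, 0.33, 0.28, 0.36, 0.31, 0.29]
sample_b = [0.27, 0.25, 0.30, 0.26, 0.29]

t_stat, df = pooled_two_sample_t(sample_a, sample_b)
print(f"t = {t_stat:.3f}, df = {df}")
```

Because the alternative hypothesis is two-sided, the p-value is twice the tail area beyond |tSTAT|, and H0 is rejected when that p-value is below 0.05.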
Problem 3A:
[Assume that the data follows a normal distribution. In reality, the normality
assumption may not always hold if the sample size is small.]
1. State the null and the alternate hypothesis for conducting one-way
ANOVA for both Education and Occupation individually.
Since the p-value is less than the significance level of 0.05, we reject the null hypothesis.
2. Perform a one-way ANOVA on Salary with respect to Education. State whether the
null hypothesis is accepted or rejected based on the ANOVA results.
Since the p-value is less than the significance level of 0.05, we reject the null hypothesis.
4. If the null hypothesis is rejected in either (2) or in (3), find out which class means
are significantly different. Interpret the result. (Non-Graded)
5. Perform a two-way ANOVA based on Salary with respect to both Education and
Occupation (along with their interaction Education*Occupation). State the null and
alternative hypotheses and state your results. How will you interpret this result?
As seen from the above interaction plot, there appears to be an interaction between the two
variables, Education and Occupation.
Occupation p-value: 4.993238e-03
Interpretation: the model summary first lists the independent variables being tested,
Education and Occupation. Next are the residuals, which are the variance in the
dependent variable that is not explained by the independent variables.
The following columns provide all the information needed to interpret the model:
sum sq is the sum of squares (the variation between the group means created by
the levels of the independent variable and the overall mean).
mean sq shows the mean sum of squares (the sum of squares divided by the degrees
of freedom).
F value is the test statistic from the F-test (the mean square of the variable divided by
the mean square of the residuals).
PR(>F) is the p-value of the F statistic, and shows how likely it is that the F-value
calculated from the F-test would have occurred if the null hypothesis of no
difference was true.
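The sum sq, mean sq and F value columns can be reproduced by hand. The following is a minimal sketch for the simpler one-way case with a single factor; the two-way table adds further rows but each row is read the same way. The salary figures are made up in arbitrary units, the level names are hypothetical labels, and `one_way_anova` is a hypothetical helper name.

```python
import statistics


def one_way_anova(groups):
    """Return sum_sq, df and mean_sq for the factor and the residuals, plus F."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = statistics.mean(all_values)
    # Factor sum of squares: group means compared with the overall mean
    ss_factor = sum(
        len(values) * (statistics.mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Residual sum of squares: each observation compared with its own group mean
    ss_resid = sum(
        (x - statistics.mean(values)) ** 2
        for values in groups.values()
        for x in values
    )
    df_factor = len(groups) - 1
    df_resid = len(all_values) - len(groups)
    ms_factor = ss_factor / df_factor  # mean sq = sum sq / degrees of freedom
    ms_resid = ss_resid / df_resid
    return {
        "sum_sq": (ss_factor, ss_resid),
        "df": (df_factor, df_resid),
        "mean_sq": (ms_factor, ms_resid),
        "F": ms_factor / ms_resid,  # factor mean square over residual mean square
    }


# Made-up salaries in arbitrary units (NOT the case-study data)
salaries = {
    "Level1": [1, 2, 3],
    "Level2": [2, 3, 4],
    "Level3": [5, 6, 7],
}

result = one_way_anova(salaries)
print(result)
```

A large F value means the variation between group means is large relative to the variation within groups, which is what leads to a small PR(>F).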
6. Explain the business implications of performing ANOVA for this particular case
study.
H1: The means of Salary with respect to each Education and each Occupation are
not the same.