_______________________

Advanced Statistics
PROJECT BUSINESS REPORT
_______________________
DSBA

ROHINI ROKDE
ADVANCED STATISTICS BUSINESS REPORT
Contents
Problem 1
1.1. For this data, construct the following contingency tables (Keep Gender
as row variable)

1.1.1. Gender and Major

1.1.2. Gender and Grad Intention

1.1.3. Gender and Employment

1.1.4. Gender and Computer

1.2. Assume that the sample is representative of the population of CMSU.


Based on the data, answer the following question:

1.2.1. What is the probability that a randomly selected CMSU student will be
male?

1.2.2. What is the probability that a randomly selected CMSU student will be
female?

1.3. Assume that the sample is representative of the population of CMSU.


Based on the data, answer the following question:

1.3.1. Find the conditional probability of different majors among the male
students in CMSU.

1.3.2 Find the conditional probability of different majors among the female
students of CMSU.

1.4. Assume that the sample is a representative of the population of CMSU.


Based on the data, answer the following question:

1.4.1. Find the probability That a randomly chosen student is a male and
intends to graduate.

1.4.2 Find the probability that a randomly selected student is a female and
does NOT have a laptop. 

1.5. Assume that the sample is representative of the population of CMSU.


Based on the data, answer the following question:

1.5.1. Find the probability that a randomly chosen student is a male or has
full-time employment?

1.5.2. Find the conditional probability that given a female student is


randomly chosen, she is majoring in international business or management.

1.6.  Construct a contingency table of Gender and Intent to Graduate at 2


levels (Yes/No). The Undecided students are not considered now and the
table is a 2x2 table. Do you think the graduate intention and being female
are independent events?

1.7. Note that there are four numerical (continuous) variables in the data
set, GPA, Salary, Spending, and Text Messages.

Answer the following questions based on the data

1.7.1. If a student is chosen randomly, what is the probability that his/her
GPA is less than 3?

1.7.2. Find the conditional probability that a randomly selected male earns
50 or more. Find the conditional probability that a randomly selected female
earns 50 or more.

1.8. Note that there are four numerical (continuous) variables in the data
set, GPA, Salary, Spending, and Text Messages. For each of them comment
whether they follow a normal distribution. Write a note summarizing your
conclusions.

Problem 2

2.1 Do you think there is evidence that means moisture contents in both
types of shingles are within the permissible limits? State your conclusions
clearly showing all steps.

2.2 Do you think that the population mean for shingles A and B are equal?
Form the hypothesis and conduct the test of the hypothesis. What
assumption do you need to check before the test for equality of means is
performed?

Problem 3A:

1. State the null and the alternate hypothesis for conducting one-way
ANOVA for both Education and Occupation individually.
2. Perform a one-way ANOVA on Salary with respect to Education. State
whether the null hypothesis is accepted or rejected based on the ANOVA
results.
3. Perform a one-way ANOVA on Salary with respect to Occupation. State
whether the null hypothesis is accepted or rejected based on the ANOVA
results.
4. If the null hypothesis is rejected in either (2) or in (3), find out which
class means are significantly different. Interpret the result. (Non-
Graded)
5. Perform a two-way ANOVA based on Salary with respect to both
Education and Occupation (along with their interaction
Education*Occupation). State the null and alternative hypotheses and
state your results. How will you interpret this result?
6. Explain the business implications of performing ANOVA for this
particular case study.

Table of Figures:

Figure 1 Normal Distribution

Figure 2 Interaction Plot

Figure 3 Interaction Plot

Problem 1 - (Download Data)

The Student News Service at Clear Mountain State University (CMSU) has decided to
gather data about the undergraduate students that attend CMSU. CMSU creates and
distributes a survey of 14 questions and receives responses from 62 undergraduates
(stored in the Survey data set).

1.1. For this data, construct the following contingency tables (Keep Gender
as row variable)

1.1.1. Gender and Major

1.1.2. Gender and Grad Intention

Gender    No   Undecided   Yes   Total
Female     9          13    11      33
Male       3           9    17      29
Total     12          22    28      62

1.1.3. Gender and Employment

Gender    Full-Time   Part-Time   Unemployed   Total
Female            3          24            6      33
Male              7          19            3      29
Total            10          43            9      62

1.1.4. Gender and Computer

Gender    Desktop   Laptop   Tablet   Total
Female          2       29        2      33
Male            3       26        0      29
Total           5       55        2      62
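
A contingency table of this kind can be produced with pandas. The sketch below is only an illustration: it rebuilds one row per student from the Gender and Employment counts reported in this section, and the column names `Gender` and `Employment` are assumptions, not necessarily the names used in the Survey data set.

```python
import pandas as pd

# Counts copied from the Gender and Employment results in this report.
counts = {
    ("Female", "Full-Time"): 3,
    ("Female", "Part-Time"): 24,
    ("Female", "Unemployed"): 6,
    ("Male", "Full-Time"): 7,
    ("Male", "Part-Time"): 19,
    ("Male", "Unemployed"): 3,
}

# Expand the counts into one row per student, the shape a survey file would have.
# The column names are assumed for illustration.
rows = [
    {"Gender": gender, "Employment": status}
    for (gender, status), n in counts.items()
    for _ in range(n)
]
survey = pd.DataFrame(rows)

# Gender is the row variable; margins=True appends row and column totals.
table = pd.crosstab(
    survey["Gender"], survey["Employment"],
    margins=True, margins_name="Total",
)
print(table)
```

The same call with a different second column would give the Major, Grad Intention, and Computer tables.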

1.2. Assume that the sample is representative of the population of CMSU.
Based on the data, answer the following question:

1.2.1. What is the probability that a randomly selected CMSU student will be
male?

From the contingency tables, the number of male students is 29 and the total number of students is
62. Hence the probability that a randomly selected student is male is P(Male) = 29/62 = 0.467742

1.2.2. What is the probability that a randomly selected CMSU student will be
female?

From the contingency tables, the number of female students is 33 and the total number of students
is 62. Hence the probability that a randomly selected student is female is P(Female) = 33/62 =
0.532258

1.3. Assume that the sample is representative of the population of CMSU.


Based on the data, answer the following question:

1.3.1. Find the conditional probability of different majors among the male
students in CMSU.

Solution:

1) Proportion of males who listed Accounting as their major:
P(Accounting | Male) = 4/29 = 0.137931 (13.79%)

2) Proportion of males who listed CIS as their major:
P(CIS | Male) = 1/29 = 0.034483 (3.45%)

3) Proportion of males who listed Economics/Finance as their major:
P(Economics/Finance | Male) = 4/29 = 0.137931 (13.79%)

4) Proportion of males who listed International Business as their major:
P(International Business | Male) = 2/29 = 0.068966 (6.90%)

5) Proportion of males who listed Management as their major:
P(Management | Male) = 6/29 = 0.206897 (20.69%)

6) Proportion of males who listed Other as their major:
P(Other | Male) = 4/29 = 0.137931 (13.79%)

7) Proportion of males who listed Retailing/Marketing as their major:
P(Retailing/Marketing | Male) = 5/29 = 0.172414 (17.24%)

8) Proportion of males who listed Undecided as their major:
P(Undecided | Male) = 3/29 = 0.103448 (10.34%)

1.3.2 Find the conditional probability of different majors among the female
students of CMSU.

Solution:

1) Proportion of females who listed CIS as their major:
P(CIS | Female) = 3/33 = 0.090909 (9.09%)

2) Proportion of females who listed Accounting as their major:
P(Accounting | Female) = 3/33 = 0.090909 (9.09%)

3) Proportion of females who listed Economics/Finance as their major:
P(Economics/Finance | Female) = 7/33 = 0.212121 (21.21%)

4) Proportion of females who listed International Business as their major:
P(International Business | Female) = 4/33 = 0.121212 (12.12%)

5) Proportion of females who listed Management as their major:
P(Management | Female) = 4/33 = 0.121212 (12.12%)

6) Proportion of females who listed Retailing/Marketing as their major:
P(Retailing/Marketing | Female) = 9/33 = 0.272727 (27.27%)

7) Proportion of females who listed Other as their major:
P(Other | Female) = 3/33 = 0.090909 (9.09%)

8) Proportion of females who listed Undecided as their major:
P(Undecided | Female) = 0/33 = 0 (0%)
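
The conditional probabilities in 1.3 all follow the same rule: divide the count for a major within one gender by that gender's total. A minimal sketch, using the major counts listed in the solutions above and exact fractions so that no rounding is involved (the function name `conditional_major_probs` is made up for illustration):

```python
from fractions import Fraction

# Major counts by gender, copied from the solutions to 1.3.1 and 1.3.2.
major_counts = {
    "Male": {"Accounting": 4, "CIS": 1, "Economics/Finance": 4,
             "International Business": 2, "Management": 6, "Other": 4,
             "Retailing/Marketing": 5, "Undecided": 3},
    "Female": {"Accounting": 3, "CIS": 3, "Economics/Finance": 7,
               "International Business": 4, "Management": 4, "Other": 3,
               "Retailing/Marketing": 9, "Undecided": 0},
}

def conditional_major_probs(gender):
    """Return P(Major | gender) for every major as exact fractions."""
    counts = major_counts[gender]
    total = sum(counts.values())  # 29 for Male, 33 for Female
    return {major: Fraction(n, total) for major, n in counts.items()}

for gender in major_counts:
    probs = conditional_major_probs(gender)
    for major, p in probs.items():
        print(f"P({major} | {gender}) = {float(p):.6f}")
```

Within each gender the eight conditional probabilities sum to 1, which is a quick check that no major was missed.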

1.4. Assume that the sample is a representative of the population of CMSU.
Based on the data, answer the following question:

1.4.1. Find the probability That a randomly chosen student is a male and
intends to graduate.

From the Gender and Grad Intention results, 17 male students intend to graduate and the total
number of students is 62. Hence:

P (Male and intends to graduate) = 17/62 = 0.274194

1.4.2 Find the probability that a randomly selected student is a female and
does NOT have a laptop. 

Among the female students, 2 use a desktop and 2 use a tablet, so 4 female students do NOT
have a laptop. The total number of students is 62. Hence:

P (Female and does NOT have a laptop) = 4/62 = 0.064516

1.5. Assume that the sample is representative of the population of CMSU.


Based on the data, answer the following question:

1.5.1. Find the probability that a randomly chosen student is a male or has
full-time employment?
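There are 29 male students, 10 students with full-time employment, and 7 students who are
both male and employed full-time. By the addition rule:

P (Male or Full-Time) = (29 + 10 - 7)/62 = 32/62 = 0.516129, i.e. about 51.61 %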

1.5.2. Find the conditional probability that given a female student is
randomly chosen, she is majoring in international business or management.

The number of female students majoring in International Business or Management is
8 (4 + 4), and the total number of female students is 33. Hence the conditional
probability is:

P (International Business or Management | Female) = (4 + 4)/33 = 0.2424, i.e. 24.24 %
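
The joint, union, and conditional probabilities in 1.4 and 1.5 differ only in which counts go in the numerator and denominator. The sketch below works them out from the counts reported in the contingency results of this report; the variable names are chosen for illustration.

```python
from fractions import Fraction

TOTAL = 62                   # students in the survey
MALE, FEMALE = 29, 33        # gender totals
FULL_TIME = 7 + 3            # full-time employed: 7 male + 3 female
MALE_AND_FULL_TIME = 7
MALE_AND_GRAD_YES = 17
FEMALE_NO_LAPTOP = 2 + 2     # female desktop users + female tablet users
FEMALE_IB_OR_MGMT = 4 + 4    # International Business + Management

# Joint probability: count in the cell divided by all students.
p_male_and_grad = Fraction(MALE_AND_GRAD_YES, TOTAL)
p_female_no_laptop = Fraction(FEMALE_NO_LAPTOP, TOTAL)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_male_or_full_time = (
    Fraction(MALE, TOTAL)
    + Fraction(FULL_TIME, TOTAL)
    - Fraction(MALE_AND_FULL_TIME, TOTAL)
)

# Conditional probability: the denominator is restricted to female students.
p_ib_or_mgmt_given_female = Fraction(FEMALE_IB_OR_MGMT, FEMALE)

for name, p in [
    ("P(Male and intends to graduate)", p_male_and_grad),
    ("P(Female and no laptop)", p_female_no_laptop),
    ("P(Male or Full-Time)", p_male_or_full_time),
    ("P(IB or Management | Female)", p_ib_or_mgmt_given_female),
]:
    print(f"{name} = {float(p):.6f}")
```

Note that the conditional probability divides by 33 (female students only), while the joint and union probabilities divide by all 62 students.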

1.6.  Construct a contingency table of Gender and Intent to Graduate at 2
levels (Yes/No). The Undecided students are not considered now and the
table is a 2x2 table. Do you think the graduate intention and being female
are independent events?

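After dropping the Undecided students, 40 students remain: 20 male (17 Yes, 3 No) and
20 female (11 Yes, 9 No).

The probability that a randomly selected student is female is 20/40 = 50.0 %

The probability that a student intends to graduate, given that the student is female,
is 11/20 = 55.0 %

The probability that a randomly selected student intends to graduate is 28/40 = 70.0 %

Since P (intends to graduate | Female) is not equal to P (intends to graduate), they
are not independent events.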
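
Two events A and B are independent when P(A and B) = P(A) x P(B). The sketch below applies that definition to the 2x2 table, using the Yes/No counts from the Gender and Grad Intention results with Undecided students dropped. It is a descriptive check on the sample counts only; a formal test such as chi-square would be a separate step that this report does not perform.

```python
from fractions import Fraction

# 2x2 table after dropping Undecided students (counts from section 1.1.2).
table = {
    "Male": {"Yes": 17, "No": 3},
    "Female": {"Yes": 11, "No": 9},
}

total = sum(sum(row.values()) for row in table.values())
p_female = Fraction(sum(table["Female"].values()), total)
p_yes = Fraction(table["Male"]["Yes"] + table["Female"]["Yes"], total)
p_female_and_yes = Fraction(table["Female"]["Yes"], total)

# Independence would require P(Female and Yes) == P(Female) * P(Yes).
independent = p_female_and_yes == p_female * p_yes

print("P(Female)           =", float(p_female))
print("P(Yes)              =", float(p_yes))
print("P(Female and Yes)   =", float(p_female_and_yes))
print("P(Female) * P(Yes)  =", float(p_female * p_yes))
print("Independent in this sample:", independent)
```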

1.7. Note that there are four numerical (continuous) variables in the data
set, GPA, Salary, Spending, and Text Messages.

Answer the following questions based on the data

1.7.1. If a student is chosen randomly, what is the probability that his/her
GPA is less than 3?

The probability that a randomly chosen student's GPA is less than 3 is 17/62 = 0.2742,
i.e. about 27.42 %

1.7.2. Find the conditional probability that a randomly selected male earns
50 or more. Find the conditional probability that a randomly selected female
earns 50 or more.

1.8. Note that there are four numerical (continuous) variables in the data
set, GPA, Salary, Spending, and Text Messages. For each of them comment
whether they follow a normal distribution. Write a note summarizing your
conclusions.

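Skew value of GPA is -0.3146 (slightly left-skewed, close to symmetric)
Skew value of Salary is 0.5347 (moderately right-skewed)
Skew value of Spending is 1.5859 (strongly right-skewed)
Skew value of Text Messages is 1.2958 (strongly right-skewed)

Based on skewness alone, GPA is the closest to a normal distribution, Salary departs from
it moderately, and Spending and Text Messages do not appear to follow a normal distribution.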
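
For reference, a skew value can be computed by hand as follows. This is a sketch of the sample-size-adjusted Fisher-Pearson skewness, which is assumed here to be the form behind the values above (it is the form pandas' `skew()` is understood to report). The two small data sets are hypothetical and are not taken from the Survey data.

```python
import math

def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness (G1); needs at least 3 values."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                            # unadjusted skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

symmetric = [1, 2, 3, 4, 5]        # hypothetical: mirror image around 3
right_tailed = [1, 1, 1, 2, 10]    # hypothetical: one large value in the tail

print("symmetric   :", sample_skewness(symmetric))
print("right-tailed:", sample_skewness(right_tailed))
```

A value near 0 indicates a roughly symmetric distribution, a positive value a longer right tail, and a negative value a longer left tail.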

Problem 2 (Download Data)

An important quality characteristic used by the manufacturers of ABC asphalt
shingles is the amount of moisture the shingles contain when they are packaged.
Customers may feel that they have purchased a product lacking in quality if they
find moisture and wet shingles inside the packaging. In some cases, excessive
moisture can cause the granules attached to the shingles for texture and coloring
purposes to fall off the shingles resulting in appearance problems. To monitor the
amount of moisture present, the company conducts moisture tests. A shingle is
weighed and then dried. The shingle is then reweighed, and based on the amount of
moisture taken out of the product, the pounds of moisture per 100 square feet is
calculated. The company would like to show that the mean moisture content is less
than 0.35 pounds per 100 square feet.

The file (A & B shingles.csv) includes 36 measurements (in pounds per 100 square
feet) for A shingles and 31 for B shingles.

2.1 Do you think there is evidence that means moisture contents in both
types of shingles are within the permissible limits? State your conclusions
clearly showing all steps.

Define Null and alternate hypothesis for sample A

step 1:

Testing whether the moisture content is less than the permissible limit

The null hypothesis states that the mean moisture content of sample A is greater than
or equal to the permissible limit, 𝜇 ≥ 0.35

The alternative hypothesis states that the moisture content of sample A is less than
permissible limit, 𝜇 < 0.35

𝐻0 : 𝜇 ≥ 0.35

𝐻𝐴 : 𝜇 < 0.35

Step 2: Decide the significance level

Here we select 𝛼 = 0.05 as given in the question.

Step 3: Identify the test statistic

We have two samples (A and B) and we do not know the population standard
deviation. The sample sizes are not the same (36 for A and 31 for B), and each is
larger than 30. So we use the t distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic for a
one-sample, one-tailed test on sample A.

Step 4: Calculate the p - value and test statistic


tstat = -1.4735046253382782
p-value = 0.07477633144907513

Step 5: Decide to reject or fail to reject the null hypothesis


One-sample t-test p-value = 0.07477633144907513
Since the p-value is greater than 𝛼 = 0.05, we do not have enough evidence to reject
the null hypothesis in favour of the alternative hypothesis.
We conclude that there is not sufficient evidence that the mean moisture content of
sample A is less than the permissible limit.

Define Null and alternate hypothesis for sample B

step 1:

Testing whether the moisture content is less than the permissible limit

The null hypothesis states that the mean moisture content of sample B is greater than
or equal to the permissible limit, 𝜇 ≥ 0.35

The alternative hypothesis states that the mean moisture content of sample B is less
than the permissible limit, 𝜇 < 0.35

𝐻0 : 𝜇 ≥ 0.35

𝐻𝐴 : 𝜇 < 0.35

Step 2: Decide the significance level

Here we select 𝛼 = 0.05 as given in the question.

Step 3: Identify the test statistic

We have two samples (A and B) and we do not know the population standard
deviation. The sample sizes are not the same (36 for A and 31 for B), and each is
larger than 30. So we use the t distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic for a
one-sample, one-tailed test on sample B.

Step 4: Calculate the p - value and test statistic


tstat = -3.1003313069986995
p-value = 0.0020904774003191826

Step 5: Decide to reject or fail to reject the null hypothesis


One-sample t-test p-value = 0.0020904774003191826
Since the p-value is less than 𝛼 = 0.05, we have enough evidence to reject the null
hypothesis in favour of the alternative hypothesis.
We conclude that the mean moisture content of sample B is less than the permissible
limit.
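
The five steps used for samples A and B can be sketched in Python with SciPy. This is an illustration under stated assumptions, not the exact code behind the numbers above: the helper name `one_sample_lower_tail` is made up, and the readings are hypothetical values, not the measurements in the A & B shingles file.

```python
import numpy as np
from scipy import stats

def one_sample_lower_tail(sample, limit, alpha=0.05):
    """Test H0: mu >= limit against HA: mu < limit; return (t, p, reject)."""
    t_stat, p_two_sided = stats.ttest_1samp(sample, popmean=limit)
    # ttest_1samp is two-sided by default; convert to the lower-tail p-value.
    if t_stat < 0:
        p_lower = p_two_sided / 2
    else:
        p_lower = 1 - p_two_sided / 2
    return t_stat, p_lower, p_lower < alpha

# Hypothetical moisture readings (pounds per 100 square feet).
readings = np.array([0.30, 0.28, 0.35, 0.27, 0.31, 0.29, 0.33, 0.26])

t_stat, p_value, reject = one_sample_lower_tail(readings, limit=0.35)
print(f"tstat = {t_stat:.4f}, one-tailed p-value = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

Failing to reject H0 only means the sample does not show that the mean is below the limit; it does not show that the mean is above it.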

2.2 Do you think that the population mean for shingles A and B are equal?
Form the hypothesis and conduct the test of the hypothesis. What
assumption do you need to check before the test for equality of means is
performed?

Step 1: Define the null and alternate hypotheses

In testing whether the means for shingles A and shingles B are the same, the null
hypothesis states that the mean of shingles A equals the mean of shingles B, 𝜇A
equals 𝜇B. The alternative hypothesis states that the means are different, 𝜇A is not
equal to 𝜇B.

𝐻0 : 𝜇A − 𝜇B = 0, i.e. 𝜇A = 𝜇B

𝐻𝐴 : 𝜇A − 𝜇B ≠ 0, i.e. 𝜇A ≠ 𝜇B

Step 2: Decide the significance level

Here we select 𝛼 = 0.05 and the population standard deviation is not known.

Step 3: Identify the test statistic

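We have two samples and we do not know the population standard deviation.
The sample sizes are not the same (36 for A and 31 for B), and each is larger than 30.
So we use the t distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic for a two-sample test.
A two-sample t-test assumes that the two samples are independent and drawn from
approximately normal populations; the pooled-variance form also assumes equal
population variances, which should be checked before the test is performed.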

Step 4: Calculate the p - value and test statistic


tstat = 1.2896282719661123
p-value = 0.2017496571835306

Step 5: Decide to reject or fail to reject the null hypothesis


Two-sample t-test p-value = 0.2017496571835306
Since the p-value is greater than 𝛼 = 0.05, we do not have enough evidence to reject
the null hypothesis in favour of the alternative hypothesis.
We conclude that there is no significant evidence that the means for shingles A and
shingles B are different.
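
A two-sample comparison of this kind, including a check of the equal-variance assumption, might look like the sketch below. Using Levene's test to choose between the pooled and Welch forms is one common practice rather than the only option, and it is not confirmed to be what this report did. The helper name `compare_means` and the readings are hypothetical.

```python
import numpy as np
from scipy import stats

def compare_means(sample_a, sample_b, alpha=0.05):
    """Two-sided test of H0: mu_A == mu_B; returns (t, p, equal_var_used)."""
    # Levene's test: H0 is that both populations have the same variance.
    _, p_levene = stats.levene(sample_a, sample_b)
    equal_var = bool(p_levene >= alpha)
    # equal_var=True gives the pooled t-test, False gives Welch's t-test.
    t_stat, p_value = stats.ttest_ind(sample_a, sample_b, equal_var=equal_var)
    return t_stat, p_value, equal_var

# Hypothetical moisture readings, not the values in the shingles file.
a = np.array([0.31, 0.29, 0.33, 0.30, 0.32, 0.28, 0.34, 0.30])
b = np.array([0.27, 0.26, 0.30, 0.25, 0.28, 0.24, 0.29, 0.27])

t_stat, p_value, equal_var = compare_means(a, b)
print(f"tstat = {t_stat:.4f}, p-value = {p_value:.4f}, pooled = {equal_var}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```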

Problem 3A:

Salary is hypothesized to depend on educational qualification and occupation. To
understand the dependency, the salaries of 40 individuals [SalaryData.csv] are
collected and each person’s educational qualification and occupation are noted.
Educational qualification is at three levels, High school graduate, Bachelor's, and
Doctorate. Occupation is at four levels, Administrative and clerical, Sales,
Professional or specialty, and Executive or managerial. A different number of
observations are in each level of education – occupation combination.

 [Assume that the data follows a normal distribution. In reality, the normality
assumption may not always hold if the sample size is small.]

1. State the null and the alternate hypothesis for conducting one-way
ANOVA for both Education and Occupation individually.

The data has 40 instances with 3 attributes: 1 of integer type and 2 of object type.

No null values in the dataset.

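Formulate the null and alternate hypotheses.

Education:

H0: The mean Salary is the same across all three Education levels.

H1: The mean Salary is not the same for at least one Education level.

Occupation:

H0: The mean Salary is the same across all four Occupation levels.

H1: The mean Salary is not the same for at least one Occupation level.

One-way ANOVA on Salary with respect to the Education variable:

Since the p-value is less than the significance level 0.05, we reject the null hypothesis.

One-way ANOVA on Salary with respect to the Occupation variable:

Since the p-value is greater than the significance level 0.05, we fail to reject the null
hypothesis.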

2. Perform a one-way ANOVA on Salary with respect to Education. State whether the
null hypothesis is accepted or rejected based on the ANOVA results.

Formulate the null and alternate hypotheses.

H0: The mean Salary is the same across all Education levels.

H1: The mean Salary is not the same for at least one Education level.

One-way ANOVA on Salary with respect to the Education variable.

Since the p-value is less than the significance level 0.05, we reject the null hypothesis.

3. Perform a one-way ANOVA on Salary with respect to Occupation. State whether
the null hypothesis is accepted or rejected based on the ANOVA results.

Since the p-value is greater than the significance level 0.05, we fail to reject the null
hypothesis.
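
A one-way ANOVA of this kind can be run with SciPy's `f_oneway`, which takes one sequence of observations per group. The salaries below are hypothetical numbers chosen only to show the call; they are not the SalaryData.csv values, and the report's own results may have been produced with a different library.

```python
from scipy import stats

# Hypothetical salaries grouped by Education level (illustration only).
salary_by_education = {
    "High school graduate": [50, 55, 52, 58, 54],
    "Bachelor's": [80, 85, 78, 90, 84],
    "Doctorate": [120, 130, 125, 118, 127],
}

# H0: all group means are equal. H1: at least one group mean differs.
f_stat, p_value = stats.f_oneway(*salary_by_education.values())
reject_null = p_value < 0.05

print(f"F = {f_stat:.2f}, p-value = {p_value:.3e}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

The same call with salaries grouped by Occupation gives the test in item 3.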

4. If the null hypothesis is rejected in either (2) or in (3), find out which class means
are significantly different. Interpret the result. (Non-Graded)

5. Perform a two-way ANOVA based on Salary with respect to both Education and
Occupation (along with their interaction Education*Occupation). State the null and
alternative hypotheses and state your results. How will you interpret this result?

As seen from the interaction plot, there appears to be an interaction between the two
variables.

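We can see the following p-values for each of the factors in the table:

Occupation p-value is 4.993238e-03

Education p-value is 1.090908e-11

Occupation:Education p-value is 2.913740e-05

All three p-values are below 0.05, so we reject the null hypotheses of no Education
effect, no Occupation effect, and no interaction between the two.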

Interpretation: the model summary first lists the independent variables being tested,
Education and Occupation. Next are the residuals, which are the variance in the
dependent variable that is not explained by the independent variables.

The following columns provide all the information needed to interpret the model:

df shows the degrees of freedom for each variable.

sum_sq is the sum of squares (the variation between the group means created by
the levels of the independent variable and the overall mean).

mean_sq shows the mean sum of squares (the sum of squares divided by the degrees
of freedom).

F is the test statistic from the F-test (the mean square of the variable divided by
the mean square of the residuals).

PR(>F) is the p-value of the F statistic, and shows how likely it is that the F value
calculated from the F-test would have occurred if the null hypothesis of no
difference was true.
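
To make the column definitions concrete, the sketch below computes df, sum_sq, mean_sq, and F by hand for a one-way layout. It is a simplified illustration: the report's table is a two-way ANOVA with an interaction term, but each row is built from the same ingredients. The function name and the three small groups are hypothetical.

```python
def one_way_anova_table(groups):
    """Return df, sum_sq, mean_sq and F for a one-way layout."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group means versus the overall mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Residual sum of squares: each value versus its own group mean.
    ss_resid = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_resid = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_resid = ss_resid / df_resid
    return {
        "df": (df_between, df_resid),
        "sum_sq": (ss_between, ss_resid),
        "mean_sq": (ms_between, ms_resid),
        "F": ms_between / ms_resid,  # mean square of factor / residual mean square
    }

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical data
table = one_way_anova_table(groups)
for column, value in table.items():
    print(column, value)
```

The p-value in the PR(>F) column is then the upper-tail probability of this F value under an F distribution with the two degrees of freedom shown in the df column.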

6. Explain the business implications of performing ANOVA for this particular case
study.

ANOVA is a tool to compare the means of three or more groups. The test statistic is
evaluated at an appropriate significance level (generally 0.05). If the p-value is less
than the significance level, we reject the hypothesis that there is no difference in
group means; that is, there is a significant difference among the group means. When
performing the two-way ANOVA, we see the following p-values for each of the factors
in the table:

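H0: The mean Salary is the same across each Education level and each Occupation level.

H1: The mean Salary is not the same across each Education level and each Occupation
level.

Occupation: p-value = 4.993238e-03

Education: p-value = 1.090908e-11

Education*Occupation: p-value = 2.913740e-05

For the business, this means that salary differences are associated with education, with
occupation, and with the specific combination of the two, so neither factor should be
assessed in isolation.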
