
PROJECT REPORT

Great Learning | PGDDSBA | October 25, 2022


Contents:

Problem 1
Clear Mountain State University

1.1 For this data, construct the following contingency tables (Keep Gender as row variable)
    1.1.1. Gender and Major
    1.1.2. Gender and Grad Intention
    1.1.3. Gender and Employment
    1.1.4. Gender and Computer
1.2 Assume that the sample is representative of the population of CMSU. Based on the data, answer the following question:
    1.2.1. What is the probability that a randomly selected CMSU student will be male?
    1.2.2. What is the probability that a randomly selected CMSU student will be female?
1.3 Assume that the sample is representative of the population of CMSU. Based on the data, answer the following question:
    1.3.1. Find the conditional probability of different majors among the male students in CMSU.
    1.3.2. Find the conditional probability of different majors among the female students of CMSU.
1.4 Assume that the sample is representative of the population of CMSU. Based on the data, answer the following question:
    1.4.1. Find the probability that a randomly chosen student is a male and intends to graduate.
    1.4.2. Find the probability that a randomly selected student is a female and does NOT have a laptop.
1.5 Assume that the sample is representative of the population of CMSU. Based on the data, answer the following question:
    1.5.1. Find the probability that a randomly chosen student is a male or has full-time employment?
    1.5.2. Find the conditional probability that given a female student is randomly chosen, she is majoring in international business or management.
1.6 Construct a contingency table of Gender and Intent to Graduate at 2 levels (Yes/No). The Undecided students are not considered now and the table is a 2x2 table. Do you think the graduate intention and being female are independent events?
1.7 Note that there are four numerical (continuous) variables in the data set, GPA, Salary, Spending, and Text Messages. Answer the following questions based on the data
    1.7.1. If a student is chosen randomly, what is the probability that his/her GPA is less than 3?
    1.7.2. Find the conditional probability that a randomly selected male earns 50 or more. Find the conditional probability that a randomly selected female earns 50 or more.
1.8 Note that there are four numerical (continuous) variables in the data set, GPA, Salary, Spending, and Text Messages. For each of them comment whether they follow a normal distribution. Write a note summarizing your conclusions.

Problem 2
ABC Asphalt Shingles

2.1 Do you think there is evidence that means moisture contents in both types of shingles are within the permissible limits? State your conclusions clearly showing all steps.
2.2 Do you think that the population mean for shingles A and B are equal? Form the hypothesis and conduct the test of the hypothesis. What assumption do you need to check before the test for equality of means is performed?

Problem 3
Salary hypothesized to depend on Education and Occupation

3.1 State the null and the alternate hypothesis for conducting one-way ANOVA for both Education and Occupation individually.
3.2 Perform a one-way ANOVA on Salary with respect to Education. State whether the null hypothesis is accepted or rejected based on the ANOVA results.
3.3 Perform a one-way ANOVA on Salary with respect to Occupation. State whether the null hypothesis is accepted or rejected based on the ANOVA results.
3.4 If the null hypothesis is rejected in either (2) or in (3), find out which class means are significantly different. Interpret the result. (Non-Graded)
3.5 Perform a two-way ANOVA based on Salary with respect to both Education and Occupation (along with their interaction Education*Occupation). State the null and alternative hypotheses and state your results. How will you interpret this result?
3.6 Explain the business implications of performing ANOVA for this particular case study.

Problem 1

Clear Mountain State University

1.1 For this data, construct the following contingency tables (Keep Gender as row variable)
1.1.1. Gender and Major
1.1.2. Gender and Grad Intention
1.1.3. Gender and Employment
1.1.4. Gender and Computer

1.1.1. Gender and Major


Gender   Accounting   CIS   Economics/Finance   International Business   Management   Other   Retailing/Marketing   Undecided
Female   3            3     7                   4                        4            3       9                     0
Male     4            1     4                   2                        6            4       5                     3

1.1.2. Gender and Grad Intention

Grad Intention No Undecided Yes


Gender
Female 9 13 11
Male 3 9 17

1.1.3. Gender and Employment

Gender   Full-Time   Part-Time   Unemployed
Female   3           24          6
Male     7           19          3

1.1.4. Gender and Computer

Gender   Desktop   Laptop   Tablet
Female   2         29       2
Male     3         26       0
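As an illustration of how such a contingency table is built, the sketch below counts (gender, category) pairs with the standard library. The `records` list is hypothetical sample data, not the CMSU survey, and `contingency` is a helper name introduced only for this example; with pandas, `pd.crosstab` would be the usual equivalent.

```python
from collections import Counter

# Hypothetical survey rows (gender, employment); NOT the actual CMSU data.
records = [
    ("Female", "Part-Time"),
    ("Female", "Full-Time"),
    ("Male", "Part-Time"),
    ("Male", "Full-Time"),
    ("Male", "Part-Time"),
    ("Female", "Unemployed"),
]


def contingency(rows):
    """Count (row_label, column_label) pairs into a nested dict table."""
    counts = Counter(rows)
    row_labels = sorted({r for r, _ in rows})
    col_labels = sorted({c for _, c in rows})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}


table = contingency(records)
for gender, row in table.items():
    print(gender, row)
```

Keeping Gender as the outer key matches the requirement that Gender is the row variable.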

1.2. Assume that the sample is representative of the population of CMSU. Based on the data,
answer the following question:

1.2.1. What is the probability that a randomly selected CMSU student will be male?

Gender   Count
Female   33
Male     29
Total    62

Probability that a random CMSU student is Male = 29/62 = 0.4677

The probability that a randomly selected CMSU student will be male is 46.77%.

1.2.2. What is the probability that a randomly selected CMSU student will be female?

Probability that a random CMSU student is Female = 33/62 = 0.5323

The probability that a randomly selected CMSU student will be female is 53.23%.
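The two marginal probabilities can be reproduced from the gender counts alone. This is a minimal sketch; the variable names are chosen for the example.

```python
# Gender counts taken from the tables above.
gender_counts = {"Female": 33, "Male": 29}
total = sum(gender_counts.values())  # 62 students in the sample

# Marginal probability of each gender = count / total.
p_gender = {g: n / total for g, n in gender_counts.items()}

for g, p in p_gender.items():
    print(f"P({g}) = {p:.4f} ({p:.2%})")
```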

1.3. Assume that the sample is representative of the population of CMSU. Based on the data, answer
the following question:

1.3.1. Find the conditional probability of different majors among the male students in CMSU.

Gender   Accounting   CIS   Economics/Finance   International Business   Management   Other   Retailing/Marketing   Undecided
Female   3            3     7                   4                        4            3       9                     0
Male     4            1     4                   2                        6            4       5                     3

The conditional probabilities of the different majors among male students (29 in total) are as follows:

Accounting = 4/29 = 0.1379 = 13.79%

CIS = 1/29 = 0.0345 = 3.45%

Economics/Finance = 4/29 = 0.1379 = 13.79%

International Business = 2/29 = 0.0690 = 6.90%

Management = 6/29 = 0.2069 = 20.69%

Other = 4/29 = 0.1379 = 13.79%

Retailing/Marketing = 5/29 = 0.1724 = 17.24%

Undecided = 3/29 = 0.1034 = 10.34%

1.3.2 Find the conditional probability of different majors among the female students of CMSU.

The conditional probabilities of the different majors among female students (33 in total) are as follows:

Accounting = 3/33 = 0.0909 = 9.09%

CIS = 3/33 = 0.0909 = 9.09%

Economics/Finance = 7/33 = 0.2121 = 21.21%

International Business = 4/33 = 0.1212 = 12.12%

Management = 4/33 = 0.1212 = 12.12%

Other = 3/33 = 0.0909 = 9.09%

Retailing/Marketing = 9/33 = 0.2727 = 27.27%

Undecided = 0/33 = 0 = 0%
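The conditional probabilities in 1.3 amount to dividing each row of the Gender and Major table by its own row total. A short sketch of that row normalization, using the counts from the table (the helper name `conditional_given_row` is introduced only for this example):

```python
majors = ["Accounting", "CIS", "Economics/Finance", "International Business",
          "Management", "Other", "Retailing/Marketing", "Undecided"]

# Counts from the Gender and Major contingency table.
counts = {
    "Female": [3, 3, 7, 4, 4, 3, 9, 0],
    "Male":   [4, 1, 4, 2, 6, 4, 5, 3],
}


def conditional_given_row(row_counts):
    """P(column | row): each cell divided by the row total."""
    row_total = sum(row_counts)
    return [n / row_total for n in row_counts]


p_major_given = {
    gender: dict(zip(majors, conditional_given_row(row)))
    for gender, row in counts.items()
}

for gender, probs in p_major_given.items():
    for major, p in probs.items():
        print(f"P({major} | {gender}) = {p:.4f}")
```

Each row of conditional probabilities sums to 1, which is a quick check on the arithmetic.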

1.4. Assume that the sample is representative of the population of CMSU. Based on the data, answer the
following question:

1.4.1. Find the probability that a randomly chosen student is a male and intends to graduate.

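Gender   No   Undecided   Yes
Female   9    13          11
Male     3    9           17

This is a joint probability, so the denominator is all 62 students, not only the 29 males.

P(Male and Grad Intention = Yes) = 17/62 = 0.2742 = 27.42%

The probability that a randomly chosen student is male and intends to graduate is 27.42%. (The value 17/29 = 58.62% is the conditional probability that a student intends to graduate given that the student is male.)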

1.4.2 Find the probability that a randomly selected student is a female and does NOT have a laptop.

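Gender   Desktop   Laptop   Tablet
Female   2         29       2
Male     3         26       0

Females without a laptop = 2 (Desktop) + 2 (Tablet) = 4

P(Female and no laptop) = 4/62 = 0.0645 = 6.45%

This is a joint probability over all 62 students. Given that a student is female, the probability of not having a laptop is 4/33 = 12.12%.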
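For a joint probability such as "male and intends to graduate" or "female and no laptop", the matching cell counts are divided by the grand total of the table rather than by a row total. A minimal sketch using the two tables above (`grand_total` is a helper introduced for the example):

```python
# Counts from the Gender x Grad Intention and Gender x Computer tables.
grad = {
    "Female": {"No": 9, "Undecided": 13, "Yes": 11},
    "Male":   {"No": 3, "Undecided": 9, "Yes": 17},
}
computer = {
    "Female": {"Desktop": 2, "Laptop": 29, "Tablet": 2},
    "Male":   {"Desktop": 3, "Laptop": 26, "Tablet": 0},
}


def grand_total(table):
    """Total number of students counted in a contingency table."""
    return sum(sum(row.values()) for row in table.values())


# Joint probability: cell count divided by the grand total (62).
p_male_and_yes = grad["Male"]["Yes"] / grand_total(grad)

# "No laptop" means Desktop or Tablet.
female_no_laptop = sum(n for kind, n in computer["Female"].items() if kind != "Laptop")
p_female_and_no_laptop = female_no_laptop / grand_total(computer)

print(f"P(Male and Yes) = {p_male_and_yes:.4f}")
print(f"P(Female and no laptop) = {p_female_and_no_laptop:.4f}")
```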

1.5. Assume that the sample is representative of the population of CMSU. Based on the data, answer the
following question:

1.5.1. Find the probability that a randomly chosen student is a male or has full-time employment?

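Gender   Full-Time   Part-Time   Unemployed
Female   3           24          6
Male     7           19          3

P(Male or Full-Time) = P(Male) + P(Full-Time) - P(Male and Full-Time)

= 29/62 + 10/62 - 7/62 = 32/62 = 0.5161 = 51.61%

The males who work full-time (7) are counted in both P(Male) and P(Full-Time), so they are subtracted once.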

1.5.2. Find the conditional probability that given a female student is randomly chosen, she is
majoring in international business or management.

Probability that a female student is majoring in International Business or Management =
(4 + 4)/33 = 0.2424 = 24.24%
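The "or" probability in 1.5.1 follows the addition rule, while 1.5.2 is a conditional probability restricted to the female row. The sketch below works both from the counts in the tables; variable names are illustrative.

```python
# Counts from the Gender x Employment table.
employment = {
    "Female": {"Full-Time": 3, "Part-Time": 24, "Unemployed": 6},
    "Male":   {"Full-Time": 7, "Part-Time": 19, "Unemployed": 3},
}

n = sum(sum(row.values()) for row in employment.values())           # 62
n_male = sum(employment["Male"].values())                           # 29
n_full_time = sum(row["Full-Time"] for row in employment.values())  # 10
n_male_full_time = employment["Male"]["Full-Time"]                  # 7

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_male_or_full_time = (n_male + n_full_time - n_male_full_time) / n

# Conditional probability for 1.5.2: only the 33 female students are considered.
female_majors = {"International Business": 4, "Management": 4}
n_female = 33
p_ib_or_mgmt_given_female = sum(female_majors.values()) / n_female

print(f"P(Male or Full-Time) = {p_male_or_full_time:.4f}")
print(f"P(IB or Management | Female) = {p_ib_or_mgmt_given_female:.4f}")
```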

1.6. Construct a contingency table of Gender and Intent to Graduate at 2 levels (Yes/No). The Undecided
students are not considered now and the table is a 2x2 table. Do you think the graduate intention and being
female are independent events?

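Gender   No   Yes   All
Female   9    11    20
Male     3    17    20
All      12   28    40

Event A: the student is female. Event B: the student intends to graduate (Grad Intention = Yes).

P(A) = 20/40 = 0.50

P(B) = 28/40 = 0.70

P(A and B) = 11/40 = 0.275

P(A) × P(B) = 0.50 × 0.70 = 0.35

If the two events were independent, P(A and B) would equal P(A) × P(B). Here 0.275 is not equal to 0.35 (equivalently, P(B | A) = 11/20 = 0.55 differs from P(B) = 0.70), so graduate intention and being female are not independent events in this sample.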
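The independence check compares the joint probability with the product of the two marginals. A small sketch on the 2x2 table, with `math.isclose` used for the comparison:

```python
import math

# 2x2 table with the Undecided students removed.
table = {
    "Female": {"No": 9, "Yes": 11},
    "Male":   {"No": 3, "Yes": 17},
}

n = sum(sum(row.values()) for row in table.values())  # 40

p_female = sum(table["Female"].values()) / n          # 20/40
p_yes = sum(row["Yes"] for row in table.values()) / n  # 28/40
p_female_and_yes = table["Female"]["Yes"] / n         # 11/40

# Independence would require P(A and B) == P(A) * P(B).
product = p_female * p_yes
independent = math.isclose(p_female_and_yes, product)

print(f"P(Female and Yes) = {p_female_and_yes:.3f}")
print(f"P(Female) * P(Yes) = {product:.3f}")
print("Independent in this sample:", independent)
```

Exact equality is a descriptive check on the sample; a chi-square test of independence would be the usual way to judge whether the gap is statistically significant.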

1.7. Note that there are four numerical (continuous) variables in the data set, GPA, Salary, Spending, and
Text Messages.

Answer the following questions based on the data

1.7.1. If a student is chosen randomly, what is the probability that his/her GPA is less than 3?
Number of students with GPA < 3 = 17

Probability that a student has GPA < 3 = 17/62 = 0.2742 = 27.42%

1.7.2. Find the conditional probability that a randomly selected male earns 50 or more. Find the
conditional probability that a randomly selected female earns 50 or more.

Number of males with earnings of 50 or more = 14

Probability that a male earns 50 or more = 14/29 = 0.4828 = 48.28%

Number of females with earnings of 50 or more = 18

Probability that a female earns 50 or more = 18/33 = 0.5455 = 54.55%

1.8. Note that there are four numerical (continuous) variables in the data set, GPA, Salary, Spending, and
Text Messages. For each of them comment whether they follow a normal distribution. Write a note
summarizing your conclusions.

Salary

From the distribution plot, the curve is slightly right-skewed, so Salary does not follow a normal distribution.

GPA

From the distribution plot, the curve is slightly left-skewed and not perfectly bell-shaped, but GPA is approximately normally distributed.

Spending

From the distribution plot, the curve is slightly right-skewed, so Spending does not follow a normal distribution.

Text Messages

From the distribution plot, the curve is slightly right-skewed, so Text Messages does not follow a normal distribution.
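The visual judgement of skew can be backed by a number. The sketch below computes the moment-based sample skewness (third central moment divided by the 1.5 power of the second); the three lists are hypothetical values made up for the illustration, not the survey columns. Values near 0 suggest symmetry, positive values a right skew, negative values a left skew.

```python
def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data, NOT the CMSU columns.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]
left_skewed = [-10, -3, -2, -2, -1, -1]

for name, data in [("symmetric", symmetric),
                   ("right_skewed", right_skewed),
                   ("left_skewed", left_skewed)]:
    print(f"{name}: skewness = {sample_skewness(data):.3f}")
```

A formal test such as Shapiro-Wilk (for example `scipy.stats.shapiro`) could complement the plots when a decision at a stated significance level is needed.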

Problem 2

ABC Asphalt Shingles

An important quality characteristic used by the manufacturers of ABC asphalt shingles is the amount of
moisture the shingles contain when they are packaged. Customers may feel that they have purchased a
product lacking in quality if they find moisture and wet shingles inside the packaging. In some cases,
excessive moisture can cause the granules attached to the shingles for texture and coloring purposes to fall
off the shingles resulting in appearance problems. To monitor the amount of moisture present, the
company conducts moisture tests. A shingle is weighed and then dried. The shingle is then reweighed, and
based on the amount of moisture taken out of the product, the pounds of moisture per 100 square feet is
calculated. The company would like to show that the mean moisture content is less than 0.35 pounds per
100 square feet.

The file (A & B shingles.csv) includes 36 measurements (in pounds per 100 square feet) for A shingles and
31 for B shingles.

2.1 Do you think there is evidence that means moisture contents in both types of shingles are within the
permissible limits? State your conclusions clearly showing all steps.
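The company wants to show that the mean moisture content is less than 0.35 pounds per 100 square feet, so that claim is placed in the alternate hypothesis and each shingle type is tested separately with a one-sample, lower-tailed t-test.

Null Hypothesis H0: Mean moisture content is greater than or equal to the permissible limit (mean >= 0.35 pounds/100 sq ft)

Alternate Hypothesis H1: Mean moisture content is less than the permissible limit (mean < 0.35 pounds/100 sq ft)

Alpha = 0.05; hypothesized mean moisture content = 0.35 pounds/100 sq ft

X_bar A (shingles A) = 11.399/36 = 0.316638888

X_bar B (shingles B) = 8.48/31 = 0.27354838

Variance A = 0.0184

Variance B = 0.0188

N(A) = 36

N(B) = 31

Using the rounded variances above:

Ttest A = (X_bar A - 0.35)/np.sqrt(0.0184/36) = -1.476, with DOF = N(A) - 1 = 35

Ttest B = (X_bar B - 0.35)/np.sqrt(0.0188/31) = -3.104, with DOF = N(B) - 1 = 30

The lower-tail critical t values at alpha = 0.05 are about -1.690 (DOF = 35) and -1.697 (DOF = 30).

Shingles A: -1.476 is not below -1.690, so we fail to reject the null hypothesis. There is not enough evidence that the mean moisture content of shingles A is below 0.35.

Shingles B: -3.104 is below -1.697, so we reject the null hypothesis. There is evidence that the mean moisture content of shingles B is below 0.35.

The comparison of A against B (Ttest = 1.2889782, P_value = 0.098702819, DOF = N(A) + N(B) - 2 = 65) is the test for equality of means in 2.2.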
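As a sketch of how each shingle type can be tested against the 0.35 limit, the one-sample t statistic can be computed from the summary statistics listed above. The variances are the rounded values from the text, and the critical values are an assumption taken from a standard t-table (lower tail, alpha = 0.05, df = 35 and df = 30); with the raw data, `scipy.stats.ttest_1samp` would be the usual route.

```python
import math


def one_sample_t(sample_mean, sample_variance, n, mu0):
    """t = (x_bar - mu0) / sqrt(s^2 / n)."""
    return (sample_mean - mu0) / math.sqrt(sample_variance / n)


MU0 = 0.35  # permissible limit, pounds per 100 sq ft

# Summary statistics from the text (variances are rounded).
t_a = one_sample_t(11.399 / 36, 0.0184, 36, MU0)
t_b = one_sample_t(8.48 / 31, 0.0188, 31, MU0)

# Assumed lower-tail critical values at alpha = 0.05 from a standard t-table.
CRIT_A = -1.690  # df = 35
CRIT_B = -1.697  # df = 30

# Reject H0 (mean >= 0.35) only if t falls below the critical value.
reject_a = t_a < CRIT_A
reject_b = t_b < CRIT_B

print(f"Shingles A: t = {t_a:.3f}, reject H0: {reject_a}")
print(f"Shingles B: t = {t_b:.3f}, reject H0: {reject_b}")
```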

2.2 Do you think that the population mean for shingles A and B are equal? Form the hypothesis and
conduct the test of the hypothesis. What assumption do you need to check before the test for equality of
means is performed?

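Step 1 Null Hypothesis: Population means for shingles A & B are equal
Alternate Hypothesis: Population means for shingles A & B are not equal

Step 2 Level of significance or alpha = 0.05

Step 3 Calculation of Ttest = (X_bar A - X_bar B)/np.sqrt((0.0184/36) + (0.0188/31)) = 1.2889782752752876

Step 4 Calculation of p_value = 0.09870281966975125. This value corresponds to one tail; because the alternate hypothesis is two-sided, it is doubled to about 0.197.

Step 5 The p_value is greater than the level of significance, so we fail to reject the null hypothesis. There is not enough evidence to conclude that the population means for shingles A and shingles B differ.

Assumptions to check before the test: the two samples are independent, the moisture measurements in each group are approximately normally distributed, and the two population variances are equal. The sample variances (0.0184 and 0.0188) are close, which is consistent with the equal-variance assumption.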
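The two-sample statistic in Step 3 can be reproduced from the summary statistics. This sketch uses the rounded variances given in the text and takes the reported p-value as given rather than recomputing it; treating that p-value as one-tailed is an assumption based on its size relative to the t statistic.

```python
import math

# Summary statistics from the text (variances are rounded).
mean_a, var_a, n_a = 11.399 / 36, 0.0184, 36
mean_b, var_b, n_b = 8.48 / 31, 0.0188, 31

# Difference in sample means divided by its standard error.
t_stat = (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)

ALPHA = 0.05
reported_p_one_tailed = 0.09870281966975125  # value reported in the text
# Assumption: the reported value covers one tail, so a two-sided test doubles it.
p_two_tailed = 2 * reported_p_one_tailed

reject_equal_means = p_two_tailed < ALPHA

print(f"t = {t_stat:.4f}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject H0: {reject_equal_means}")
```

With the raw measurements, `scipy.stats.ttest_ind` would return the statistic and a two-sided p-value directly.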

Problem 3A:

Salary is hypothesized to depend on educational qualification and occupation. To understand the dependency,
the salaries of 40 individuals [SalaryData.csv] are collected and each person’s educational qualification and
occupation are noted. Educational qualification is at three levels, High school graduate, Bachelor's, and Doctorate.

Occupation is at four levels, Administrative and clerical, Sales, Professional or specialty, and Executive or
managerial. A different number of observations are in each level of education – occupation combination.

[Assume that the data follows a normal distribution. In reality, the normality assumption may not always hold if
the sample size is small.]

3A.1 State the null and the alternate hypothesis for conducting one-way ANOVA for both Education and
Occupation individually.

Null Hypothesis H0: The mean salary is the same across all three categories of Education.

Alternate Hypothesis H1: The mean salary is different in at least one category of Education.

3A.2 Perform a one-way ANOVA on Salary with respect to Education. State whether the null hypothesis
is accepted or rejected based on the ANOVA results.

            df     sum_sq         mean_sq        F          PR(>F)
Education   2.0    1.026955e+11   5.134773e+10   30.95628   1.257709e-08
Residual    37.0   6.137256e+10   1.658718e+09   NaN        NaN

Above is the ANOVA table for Education. From the table, the p value 1.257709e-08 is much smaller than our alpha, i.e. 0.05. So we reject the null hypothesis and conclude that the mean salary differs significantly in at least one category of Education.

3A.3 Perform a one-way ANOVA on Salary with respect to Occupation. State whether the null hypothesis
is accepted or rejected based on the ANOVA results.

Null Hypothesis H0: The mean salary is the same across all four categories of Occupation.

Alternate Hypothesis H1: The mean salary is different in at least one category of Occupation.

             df     sum_sq         mean_sq        F          PR(>F)
Occupation   3.0    1.125878e+10   3.752928e+09   0.884144   0.458508
Residual     36.0   1.528092e+11   4.244701e+09   NaN        NaN

Above is the ANOVA table for Occupation. From the table, the p value 0.4585 is greater
than the significance level (alpha) 0.05. So we fail to reject the null hypothesis: there is not enough
evidence that the mean salary differs in at least one category of Occupation.
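To show where the sum_sq, mean_sq and F columns of a one-way ANOVA table come from, the sketch below computes them from raw groups. The three groups are small hypothetical numbers, not the SalaryData.csv values, and `one_way_anova` is a helper name introduced for the example.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group size times squared distance of the
    # group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group (residual) sum of squares.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# Hypothetical groups, NOT the salary data.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(groups)
print(f"sum_sq between = {ss_b:.2f} (df {df_b}), within = {ss_w:.2f} (df {df_w}), F = {f_stat:.2f}")
```

The same ratio applied to the Education row above gives 5.134773e+10 / 1.658718e+09, which is the reported F of about 30.96.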

3A.5 Perform a two-way ANOVA based on Salary with respect to both Education and Occupation (along with their
interaction Education*Occupation). State the null and alternative hypotheses and state your results. How will you
interpret this result? Explain the business implications of performing ANOVA for this particular case study?

                       df     sum_sq         mean_sq        F           PR(>F)
Education              2.0    1.026955e+11   5.134773e+10   72.211958   5.466264e-12
Occupation             3.0    5.519946e+09   1.839982e+09   2.587626    7.211580e-02
Education:Occupation   6.0    3.634909e+10   6.058182e+09   8.519815    2.232500e-05
Residual               29.0   2.062102e+10   7.110697e+08   NaN         NaN

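From the interaction plot, we see that there is a significant amount of interaction between the variables
Education and Occupation.

Null Hypothesis H0: There is no interaction effect between Education and Occupation on the mean salary.

Alternate Hypothesis H1: There is an interaction effect between Education and Occupation on the mean salary.

From the ANOVA table, the p value for Education:Occupation = 2.232500e-05 is less than the significance level
(alpha = 0.05), so we reject the null hypothesis. Thus, there is an interaction effect between
education and occupation on the mean salary. The main effect of Education is also significant
(p value = 5.466264e-12), while the main effect of Occupation is not significant at alpha = 0.05
(p value = 7.211580e-02).

From the ANOVA results and the interaction plot, the effect of occupation on salary depends on the level
of education. People with a Doctorate draw the highest salaries and people with education HS-grad earn
the least. Thus, we can conclude that Salary depends on educational qualification and on the combination
of education and occupation, rather than on occupation alone.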
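In the two-way table, every F value is the term's mean square divided by the same residual mean square, and each term is judged against alpha separately. The sketch below recomputes the F column from the df and sum_sq values in the table and applies the alpha = 0.05 decision to the reported p-values; it does not recompute the p-values themselves.

```python
ALPHA = 0.05

# (df, sum_sq, reported p-value) copied from the two-way ANOVA table.
terms = {
    "Education":            (2, 1.026955e+11, 5.466264e-12),
    "Occupation":           (3, 5.519946e+09, 7.211580e-02),
    "Education:Occupation": (6, 3.634909e+10, 2.232500e-05),
}
resid_df, resid_ss = 29, 2.062102e+10

# Residual mean square is the common denominator of every F ratio.
ms_resid = resid_ss / resid_df

f_values = {name: (ss / df) / ms_resid for name, (df, ss, _) in terms.items()}
significant = {name: p < ALPHA for name, (_, _, p) in terms.items()}

for name in terms:
    print(f"{name}: F = {f_values[name]:.3f}, significant at 0.05: {significant[name]}")
```

The output layout of the tables (df, sum_sq, mean_sq, F, PR(>F)) appears to match what `statsmodels.stats.anova.anova_lm` prints for a fitted `ols` model, which would be the usual way to produce them from the raw data.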
