1.1 For this data, construct the following contingency tables (Keep Gender as row variable)
1.1.1. Gender and Major
1.1.2. Gender and Grad Intention
1.1.3. Gender and Employment
1.1.4. Gender and Computer
1.2 Assume that the sample is representative of the population of CMSU. Based on the data, answer the following questions:
1.2.1. What is the probability that a randomly selected CMSU student will be male?
1.2.2. What is the probability that a randomly selected CMSU student will be female?
1.3 Assume that the sample is representative of the population of CMSU. Based on the data, answer the following questions:
1.3.1. Find the conditional probability of different majors among the male students in CMSU.
1.3.2. Find the conditional probability of different majors among the female students of CMSU.
1.4 Assume that the sample is representative of the population of CMSU. Based on the data, answer the following questions:
1.4.1. Find the probability that a randomly chosen student is a male and intends to graduate.
1.4.2. Find the probability that a randomly selected student is a female and does NOT have a laptop.
1.5 Assume that the sample is representative of the population of CMSU. Based on the data, answer the following questions:
1.5.1. Find the probability that a randomly chosen student is a male or has full-time employment.
1.5.2. Find the conditional probability that, given a female student is randomly chosen, she is majoring in international business or management.
1.7 Note that there are four numerical (continuous) variables in the data set: GPA, Salary, Spending, and Text Messages. Answer the following questions based on the data:
1.7.1. If a student is chosen randomly, what is the probability that his/her GPA is less than 3?
1.7.2. Find the conditional probability that a randomly selected male earns 50 or more. Find the conditional probability that a randomly selected female earns 50 or more.
1.8 Note that there are four numerical (continuous) variables in the data set: GPA, Salary, Spending, and Text Messages. For each of them, comment on whether it follows a normal distribution. Write a note summarizing your conclusions.
Problem 1
1.1 For this data, construct the following contingency tables (Keep Gender as row variable)
1.1.1. Gender and Major
1.1.2. Gender and Grad Intention
1.1.3. Gender and Employment
1.1.4. Gender and Computer
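[Contingency table images not reproduced; only fragments survived extraction.]
Gender and Major, Male row: 4 1 4 2 6 4 5 3 (the full Gender and Major table appears under 1.3.1)
Second fragment (column headers lost): Female 3 24 6; Male 7 19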
1.2. Assume that the sample is representative of the population of CMSU. Based on the data,
answer the following questions:
1.2.1. What is the probability that a randomly selected CMSU student will be male?
Gender counts: Female 33, Male 29 (total 62)
P(Male) = 29/62 = 0.4677, so the probability that a randomly selected CMSU student will be male is 46.77%.
1.2.2. What is the probability that a randomly selected CMSU student will be female?
P(Female) = 33/62 = 0.5323, so the probability that a randomly selected CMSU student will be female is 53.23%.
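A minimal Python sketch of the marginal probabilities in 1.2, using only the gender counts given above (Female 33, Male 29). Exact fractions are used so the two probabilities sum to exactly 1 before any rounding.

```python
from fractions import Fraction

# Gender counts from the sample.
counts = {"Female": 33, "Male": 29}
n = sum(counts.values())  # 62 students in total

# Marginal probability of each gender = count / total.
marginal = {g: Fraction(c, n) for g, c in counts.items()}

for gender, p in marginal.items():
    print(f"P({gender}) = {p} = {float(p):.2%}")
# P(Female) = 33/62 = 53.23%
# P(Male) = 29/62 = 46.77%
```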
1.3. Assume that the sample is representative of the population of CMSU. Based on the data, answer
the following questions:
1.3.1. Find the conditional probability of different majors among the male students in CMSU.
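Gender and Major (counts; the label of the seventh major column was lost in extraction)
Gender | Accounting | CIS | Economics/Finance | International Business | Management | Other | Column 7 | Undecided | Total
Female | 3 | 3 | 7 | 4 | 4 | 3 | 9 | 0 | 33
Male | 4 | 1 | 4 | 2 | 6 | 4 | 5 | 3 | 29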
Accounting = 4/29 = 0.1379 = 13.79%
CIS = 1/29 = 0.0345 = 3.45%
Management = 6/29 = 0.2069 = 20.69%
Other = 4/29 = 0.1379 = 13.79%
Undecided = 3/29 = 0.1034 = 10.34%
1.3.2 Find the conditional probability of different majors among the female students of CMSU.
Accounting = 3/33 = 0.0909 = 9.09%
CIS = 3/33 = 0.0909 = 9.09%
Economics/Finance = 7/33 = 0.2121 = 21.21%
Management = 4/33 = 0.1212 = 12.12%
Other = 3/33 = 0.0909 = 9.09%
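The conditional probabilities in 1.3 are each cell count divided by its row total. The sketch below computes all eight majors for both genders from the Gender and Major table. It assumes the column order implied by the calculations above. The seventh column's label did not survive extraction, so the sketch calls it "Major 7", a hypothetical placeholder.

```python
from fractions import Fraction

# Column order of the Gender x Major table; "Major 7" is a placeholder name
# because that column's label was lost in extraction.
majors = ["Accounting", "CIS", "Economics/Finance", "International Business",
          "Management", "Other", "Major 7", "Undecided"]
table = {
    "Female": [3, 3, 7, 4, 4, 3, 9, 0],
    "Male":   [4, 1, 4, 2, 6, 4, 5, 3],
}

def conditional_major_probs(gender):
    """P(major | gender) = cell count / row total for that gender."""
    row = table[gender]
    total = sum(row)
    return {m: Fraction(c, total) for m, c in zip(majors, row)}

male = conditional_major_probs("Male")
female = conditional_major_probs("Female")
for m in majors:
    print(f"{m:<24} male {float(male[m]):6.2%}   female {float(female[m]):6.2%}")
```

Each row's conditional probabilities sum to 1, which is a quick check that no major was left out.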
1.4. Assume that the sample is representative of the population of CMSU. Based on the data, answer the
following questions:
1.4.1. Find the probability that a randomly chosen student is a male and intends to graduate.
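P(Male and Grad Intention = Yes) = 17/62 = 0.2742 = 27.42%
The probability that a randomly chosen student is male and intends to graduate is 27.42%. (The figure 17/29 = 58.62% is the conditional probability P(Grad Intention = Yes | Male), i.e., the share of male students who intend to graduate.)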
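Question 1.4.1 hinges on the difference between a joint probability (both events for one randomly chosen student, out of all 62) and a conditional probability (restricted to the 29 male students). The sketch below uses only counts stated in the report: 17 male students intend to graduate, there are 29 males, and there are 62 students in total. It computes both quantities and checks the multiplication rule that links them.

```python
from fractions import Fraction

n_students = 62        # total sample size (33 female + 29 male)
n_male = 29            # male students
n_male_grad_yes = 17   # male students who intend to graduate

# Joint probability: one randomly chosen student is male AND intends to graduate.
p_joint = Fraction(n_male_grad_yes, n_students)
# Conditional probability: restrict attention to male students only.
p_cond = Fraction(n_male_grad_yes, n_male)

print(f"P(Male and Intends) = {float(p_joint):.2%}")  # 27.42%
print(f"P(Intends | Male)   = {float(p_cond):.2%}")   # 58.62%

# Multiplication rule: P(A and B) = P(B) * P(A | B).
assert p_joint == Fraction(n_male, n_students) * p_cond
```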
1.4.2 Find the probability that a randomly selected student is a female and does NOT have a laptop.
P(Female and no laptop) = 0.0606 = 6.06%
1.5. Assume that the sample is representative of the population of CMSU. Based on the data, answer the
following questions:
1.5.1. Find the probability that a randomly chosen student is a male or has full-time employment?
1.5.2. Find the conditional probability that given a female student is randomly chosen, she is
majoring in international business or management.
Probability that a randomly chosen female student is majoring in International Business or Management =
(4 + 4)/33 = 8/33 = 0.2424 = 24.24%
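Because each student has exactly one major, "International Business" and "Management" are mutually exclusive events. The addition rule therefore reduces to a plain sum of the two conditional probabilities. A short sketch with the female counts from the Gender and Major table (4 and 4 out of 33):

```python
from fractions import Fraction

n_female = 33
female_counts = {"International Business": 4, "Management": 4}

# Mutually exclusive events: P(A or B | F) = P(A | F) + P(B | F),
# with no P(A and B | F) term to subtract.
p_ib = Fraction(female_counts["International Business"], n_female)
p_mgmt = Fraction(female_counts["Management"], n_female)
p_either = p_ib + p_mgmt

print(f"P(IB or Management | Female) = {p_either} = {float(p_either):.2%}")
# P(IB or Management | Female) = 8/33 = 24.24%
```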
1.6. Construct a contingency table of Gender and Intent to Graduate at 2 levels (Yes/No). The Undecided
students are not considered now and the table is a 2x2 table. Do you think the graduate intention and being
female are independent events?
Event A: the student intends to graduate (Grad Intention = Yes). Event B: the student is female.
P(A) = 28/40 = 0.70
P(A | B) = 11/20 = 0.55
Since P(A | B) = 0.55 differs from P(A) = 0.70, knowing that a student is female changes the probability that the student intends to graduate. Therefore, graduate intention and being female are not independent events.
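The independence check in 1.6 can be written out with the 2x2 counts used above: 28 of 40 students intend to graduate, and 11 of the 20 female students do. The figure of 20 females is read from the denominator of 11/20. The sketch tests independence in two equivalent ways: it compares P(Yes | Female) with P(Yes), and P(Yes and Female) with P(Yes) * P(Female).

```python
from fractions import Fraction

# Counts from the 2x2 table (Undecided students removed).
n_total = 40       # students with a Yes/No graduation intention
n_yes = 28         # students who intend to graduate
n_female = 20      # female students in the 2x2 table (denominator of 11/20)
n_female_yes = 11  # female students who intend to graduate

p_yes = Fraction(n_yes, n_total)                       # P(Yes)
p_yes_given_female = Fraction(n_female_yes, n_female)  # P(Yes | Female)

# Independent exactly when conditioning on Female leaves P(Yes) unchanged.
independent = p_yes_given_female == p_yes
print(f"P(Yes) = {float(p_yes):.2f}, P(Yes | Female) = {float(p_yes_given_female):.2f}")
print("independent" if independent else "not independent")

# Equivalent product check: P(Yes and Female) vs P(Yes) * P(Female).
p_female = Fraction(n_female, n_total)
p_yes_and_female = Fraction(n_female_yes, n_total)
print(f"P(Yes and Female) = {float(p_yes_and_female):.3f}, "
      f"P(Yes) * P(Female) = {float(p_yes * p_female):.3f}")
```

Both checks give the same verdict: 0.275 differs from 0.35, just as 0.55 differs from 0.70.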
1.7. Note that there are four numerical (continuous) variables in the data set, GPA, Salary, Spending, and
Text Messages.
Answer the following questions based on the data
1.7.1. If a student is chosen randomly, what is the probability that his/her GPA is less than 3?
Number of students with GPA < 3 = 17, so P(GPA < 3) = 17/62 = 0.2742 = 27.42%.
1.7.2. Find the conditional probability that a randomly selected male earns 50 or more. Find the
conditional probability that a randomly selected female earns 50 or more.
= 0.4828 = 48.28%
1.8. Note that there are four numerical (continuous) variables in the data set: GPA, Salary, Spending, and
Text Messages. For each of them, comment on whether it follows a normal distribution. Write a note
summarizing your conclusions.
Salary
From the plot above, the distribution is slightly right-skewed, so Salary does not follow a normal distribution.
GPA
From the plot above, the distribution is slightly left-skewed. GPA approximately follows a normal distribution, although its histogram is not perfectly bell-shaped.
Text Messages
From the plot above, the distribution is slightly right-skewed, so Text Messages does not follow a normal distribution.
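The judgements in 1.8 are made by eye from histograms. A skewness coefficient puts a number on "slightly right-skewed" or "slightly left-skewed". The sketch below computes the adjusted Fisher-Pearson sample skewness with the standard library. The ±0.5 cut-off is an assumed rule of thumb, not a formal test; a formal alternative would be a Shapiro-Wilk test such as scipy.stats.shapiro. The demo values are hypothetical, not the CMSU data.

```python
import statistics

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1).

    Positive -> longer right tail; negative -> longer left tail; ~0 -> symmetric.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)     # small-sample adjustment

def describe_shape(values, tol=0.5):
    """Crude label from the skewness; an assumed rule of thumb, not a normality test."""
    s = skewness(values)
    if s > tol:
        return s, "right-skewed"
    if s < -tol:
        return s, "left-skewed"
    return s, "roughly symmetric"

# Hypothetical illustration values (not the CMSU data set).
demo = [20, 22, 25, 25, 30, 30, 35, 40, 60, 90]
print(describe_shape(demo))
```

A right-skewed label agrees with "does not follow a normal distribution". A value near zero is necessary but not sufficient for normality, because kurtosis also matters.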
Problem 2
An important quality characteristic used by the manufacturers of ABC asphalt shingles is the amount of
moisture the shingles contain when they are packaged. Customers may feel that they have purchased a
product lacking in quality if they find moisture and wet shingles inside the packaging. In some cases,
excessive moisture can cause the granules attached to the shingles for texture and coloring purposes to fall
off the shingles resulting in appearance problems. To monitor the amount of moisture present, the
company conducts moisture tests. A shingle is weighed and then dried. The shingle is then reweighed, and
based on the amount of moisture taken out of the product, the pounds of moisture per 100 square feet is
calculated. The company would like to show that the mean moisture content is less than 0.35 pounds per
100 square feet.
The file (A & B shingles.csv) includes 36 measurements (in pounds per 100 square feet) for A shingles and
31 for B shingles.
2.1 Do you think there is evidence that the mean moisture contents in both types of shingles are within the
permissible limits? State your conclusions clearly, showing all steps.
Null hypothesis H0: The mean moisture content of the ABC asphalt shingles is less than or equal to the permissible limit of 0.35 pounds per 100 sq ft (μ ≤ 0.35).
Alternative hypothesis H1: The mean moisture content is greater than the permissible limit (μ > 0.35).
Significance level α = 0.05; hypothesized mean μ0 = 0.35 pounds per 100 sq ft.
Variance A = 0.0184
Variance B = 0.0188
N(A) = 36
N(B) = 31
Test statistic = 1.2889782
p-value = 0.098702819
Since the p-value (0.0987) is greater than the significance level (0.05), we fail to reject the null hypothesis: there is no evidence that the mean moisture content exceeds the permissible limit.
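The report does not state which test produced the statistic 1.2889782. Its p-value, 0.0987, does equal the upper-tail area of the standard normal distribution at that statistic, and the sketch below reproduces it with Python's statistics.NormalDist. The sketch also shows the one-sample t statistic t = (x̄ - μ0)/(s/√n), which fits question 2.1 when applied to each shingle type separately. The demo sample is hypothetical, not the A & B shingles data. The normal tail only approximates the t distribution at these sample sizes.

```python
import math
from statistics import NormalDist, fmean, stdev

def one_sample_t(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)) for a one-sample test of the mean."""
    n = len(sample)
    return (fmean(sample) - mu0) / (stdev(sample) / math.sqrt(n))

def upper_tail_p(stat):
    """One-sided p-value P(Z >= stat) under a standard normal approximation.

    For n = 31-36 this only roughly approximates the t distribution;
    scipy.stats.t.sf(stat, df=n - 1) gives the exact t-based value.
    """
    return 1 - NormalDist().cdf(stat)

# The report's statistic reproduces its p-value under the normal approximation.
p = upper_tail_p(1.2889782)
alpha = 0.05
print(f"p = {p:.6f}")
print("reject H0" if p < alpha else "fail to reject H0")

# Hypothetical sample (not the shingles data), only to exercise one_sample_t.
demo = [0.30, 0.32, 0.34, 0.36, 0.28]
print(round(one_sample_t(demo, 0.35), 3))
```

An upper-tail p-value matches the alternative μ > 0.35 stated above. To test the company's claim that the mean is below 0.35 directly, use the lower tail, NormalDist().cdf(stat).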
2.2 Do you think that the population mean for shingles A and B are equal? Form the hypothesis and
conduct the test of the hypothesis. What assumption do you need to check before the test for equality of
means is performed?
Step 1: Null hypothesis H0: The population means for shingles A and B are equal (μA = μB).
Alternative hypothesis H1: The population means for shingles A and B are not equal (μA ≠ μB).
Step 5: From the results obtained for the above hypothesis, we fail to reject the null hypothesis because the
p-value is greater than the level of significance. There is therefore no evidence that the population means
for shingles A and B differ.
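Question 2.2 also asks which assumption to check before the test. The pooled two-sample t-test assumes independent, roughly normal samples with equal population variances. The sample variances reported in 2.1 (0.0184 for A, 0.0188 for B) are very close. The sketch below shows the variance-ratio check and the pooled t statistic. Both functions are illustrative helpers, not the report's code. The group means are not plugged in because the report does not list them.

```python
import math

def variance_ratio(var1, var2):
    """Larger sample variance over the smaller; values near 1 support equal variances."""
    return max(var1, var2) / min(var1, var2)

def pooled_t(mean1, mean2, var1, var2, n1, n2):
    """Two-sample t statistic assuming equal population variances.

    Returns (t, degrees of freedom). If the variances look unequal, Welch's
    test (e.g. scipy.stats.ttest_ind(a, b, equal_var=False)) is the usual fallback.
    """
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, n1 + n2 - 2

# Sample variances and sizes reported in 2.1 for shingles A and B.
print(round(variance_ratio(0.0184, 0.0188), 4))
```

With the two sample means from the data file, pooled_t(mean_a, mean_b, 0.0184, 0.0188, 36, 31) returns the statistic on 65 degrees of freedom.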
Problem 3A:
Salary is hypothesized to depend on educational qualification and occupation. To understand the dependency,
the salaries of 40 individuals [SalaryData.csv] are collected and each person’s educational qualification and
occupation are noted. Educational qualification is at three levels, High school graduate, Bachelor's, and Doctorate.
Occupation is at four levels, Administrative and clerical, Sales, Professional or specialty, and Executive or
managerial. A different number of observations are in each level of education – occupation combination.
[Assume that the data follows a normal distribution. In reality, the normality assumption may not always hold if
the sample size is small.]
3A.1 State the null and the alternate hypothesis for conducting one-way ANOVA for both Education and
Occupation individually.
Null Hypothesis H0: The mean salary is the same across all three categories of Education.
Alternate Hypothesis H1: The mean salary is different in at least one category of Education.
Above is the ANOVA table for Education. The p-value, 1.257709e-08, is far below the significance level α = 0.05, so we reject the null hypothesis and conclude that the mean salary differs in at least one category of Education.
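To make the one-way ANOVA behind 3A.1 to 3A.3 concrete, the sketch below computes the F statistic from its definition: F = MS_between / MS_within. It uses only the standard library. The toy groups are hypothetical, not SalaryData.csv. The p-value comes from the F distribution, for example scipy.stats.f.sf(F, df_between, df_within), or from scipy.stats.f_oneway(*groups) in one call. The report does not say which tool it used.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    MS_between = sum_j n_j * (mean_j - grand_mean)^2 / (k - 1)
    MS_within  = sum_j sum_i (x_ij - mean_j)^2 / (N - k)
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n_total - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# Toy salaries for three hypothetical education levels (not SalaryData.csv).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat, df_b, df_w)  # 27.0 2 6
```

A large F means the spread between group means is large relative to the spread within groups, which is what drives a small p-value such as the one for Education.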
3A.3 Perform a one-way ANOVA on Salary with respect to Occupation. State whether the null hypothesis
is accepted or rejected based on the ANOVA results.
Null Hypothesis H0: The mean salary is the same across all four categories of Occupation.
Alternate Hypothesis H1: The mean salary is different in at least one category of Occupation.
Above is the ANOVA table for Occupation. The p-value, 0.4585, is greater than the significance level α = 0.05, so we fail to reject the null hypothesis: there is no evidence that the mean salary differs in any category of Occupation.
3A.5 Perform a two-way ANOVA based on Salary with respect to both Education and Occupation (along with their
interaction Education*Occupation). State the null and alternative hypotheses and state your results. How will you
interpret this result? Explain the business implications of performing ANOVA for this particular case study?
The interaction plot above shows non-parallel lines across the Education and Occupation combinations, which suggests an interaction between Education and Occupation.
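Null Hypothesis H0: There is no interaction effect between Education and Occupation on mean salary.
Alternate Hypothesis H1: There is an interaction effect between Education and Occupation on mean salary.
From the ANOVA table, the p-value for the interaction, 2.232500e-05, is less than the significance level (α = 0.05), so we reject the null hypothesis. Thus, there is an interaction effect between education and occupation on the mean salary.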
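An interaction plot is built from cell means: the mean salary for each Education and Occupation combination. The sketch below computes those means and one interaction contrast, which is zero when the lines are parallel. The records are hypothetical, not SalaryData.csv. The formal F test in 3A.5 is typically run on a linear model that includes the interaction term. One common route is statsmodels' ols("Salary ~ C(Education) * C(Occupation)", data=df).fit() followed by anova_lm. The report does not say which tool it used.

```python
from collections import defaultdict
from statistics import fmean

def cell_means(records):
    """Mean salary for each (education, occupation) cell.

    Plotting these (education on the x-axis, one line per occupation) gives an
    interaction plot; non-parallel lines hint at an interaction.
    """
    cells = defaultdict(list)
    for education, occupation, salary in records:
        cells[(education, occupation)].append(salary)
    return {cell: fmean(vals) for cell, vals in cells.items()}

# Hypothetical records (not SalaryData.csv): (Education, Occupation, Salary).
records = [
    ("HS-grad", "Sales", 50), ("HS-grad", "Sales", 54),
    ("HS-grad", "Executive", 60),
    ("Doctorate", "Sales", 150),
    ("Doctorate", "Executive", 250), ("Doctorate", "Executive", 230),
]
means = cell_means(records)

# Interaction contrast: does the Executive-vs-Sales gap change with education?
# It is 0 when the two lines in the interaction plot are parallel.
contrast = ((means[("Doctorate", "Executive")] - means[("Doctorate", "Sales")])
            - (means[("HS-grad", "Executive")] - means[("HS-grad", "Sales")]))
print(contrast)  # 82.0
```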
Taken together, the ANOVA results and the interaction plot show that mean salary depends on the combination of education and occupation. People with a Doctorate draw the highest salaries and HS-grads earn the least. Thus, we can conclude that salary depends on educational qualification and occupation, with the effect of occupation varying by education level.