REPORT
7/8/2022
DSBA
Dileep Motukuri
Contents
INTRODUCTION
PROBLEM 1
QUESTION 1.1.1
QUESTION 1.1.2
QUESTION 1.1.3
QUESTION 1.2
QUESTION 1.3
QUESTION 1.4
QUESTION 1.5
PROBLEM 2
QUESTION 2.1
2.1.1 Gender and Major
2.1.2 Gender and Grad Intention
2.1.3 Gender and Employment
2.1.4 Gender and Computer
QUESTION 2.2
2.2.1 What is the probability that a randomly selected CMSU student will be male?
2.2.2 What is the probability that a randomly selected CMSU student will be female?
QUESTION 2.3
2.3.1 Find the conditional probability of different majors among the male students in CMSU.
2.3.2 Find the conditional probability of different majors among the female students of CMSU.
QUESTION 2.4
2.4.1 Find the probability that a randomly chosen student is a male and intends to graduate.
2.4.2 Find the probability that a randomly selected student is a female and does NOT have a laptop.
QUESTION 2.5
2.5.1 Find the probability that a randomly chosen student is either a male or has a full-time employment?
2.5.2 Find the conditional probability that given a female student is randomly chosen, she is majoring in international business or management.
QUESTION 2.6
QUESTION 2.7
2.7.1 If a student is chosen randomly, what is the probability that his/her GPA is less than 3?
2.7.2 Find conditional probability that a randomly selected male earns 50 or more. Find conditional probability that a randomly selected female earns 50 or more.
QUESTION 2.8
PROBLEM 3
QUESTION 3.1
QUESTION 3.2
INTRODUCTION
This business report applies statistical techniques in Python to the three problems below. It explains the
approach used and presents the insights, inferences, and code outputs such as graphs and tables.
Problem 1 – Wholesale distributor spending data analysis across Region and Channel.
Problem 2 – Clear Mountain State University (CMSU) undergraduate student survey data analysis.
Problem 3 – ABC asphalt shingles moisture test data analysis and hypothesis testing.
PROBLEM 1
QUESTION 1.1.1
The data covers three regions, of which ‘Other’ is the most frequent, with 316 of the 440
purchases.
The data covers two channels, of which ‘Hotel’ is the most frequent, with 298 of the 440
purchases.
Based on the means of the six varieties in the table below, Fresh, Grocery and Milk have higher
average spending than Frozen, Detergents Paper and Delicatessen.
QUESTION 1.1.2
Based on the data below and the 1.1 graph, the ‘Other’ region and the ‘Hotel’ channel spent the most.
QUESTION 1.1.3
Based on the data above and the 1.1 graph, the ‘Oporto’ region and the ‘Retail’ channel spent the least.
QUESTION 1.2
1.2 There are 6 different varieties of items that are considered. Describe and comment/explain
all the varieties across Region and Channel? Provide a detailed justification for your
answer.
The data covers three regions, of which ‘Other’ is the most frequent, with 316 of the 440
purchases.
The data covers two channels, of which ‘Hotel’ is the most frequent, with 298 of the 440
purchases.
For Fresh items, the average spending is 12,000 and the maximum is 112,151. Comparing Fresh
items by channel and region, the ‘Other’ region has the highest frequency and mean in both
channels, while the ‘Lisbon’ region has the lowest frequency and mean in the Retail channel.
The maximum spending comes from the ‘Hotel’ channel in the ‘Other’ region. Overall, the Hotel
channel spends more than the Retail channel on Fresh items.
For Milk items, the average spending is 5,796 and the maximum is 73,498. Comparing Milk
items by channel and region, the ‘Other’ region has the highest frequency in both channels,
while the ‘Oporto’ region has the lowest mean in both channels. The maximum spending comes
from the ‘Retail’ channel in the ‘Other’ region. Overall, the Retail channel spends more than the
Hotel channel on Milk items.
For Grocery items, the average spending is 7,951 and the maximum is 92,780. Comparing
Grocery items by channel and region, the ‘Other’ region has the highest frequency in both
channels, while the ‘Oporto’ region has the lowest mean in the ‘Hotel’ channel. The maximum
spending comes from the ‘Retail’ channel in the ‘Other’ region. Overall, the Retail channel
spends more than the Hotel channel on Grocery items.
For Frozen items, the average spending is 3,072 and the maximum is 60,869. Comparing
Frozen items by channel and region, the ‘Other’ region has the highest frequency in both
channels, while the ‘Oporto’ region has the highest mean in the ‘Hotel’ channel. The maximum
spending comes from the ‘Hotel’ channel in the ‘Oporto’ region. Overall, the Hotel channel
spends more than the Retail channel on Frozen items.
For Detergents Paper items, the average spending is 2,881 and the maximum is 40,827.
Comparing Detergents Paper items by channel and region, the ‘Other’ region has the highest
frequency in both channels, while the ‘Oporto’ region has the highest mean in the ‘Retail’
channel. The maximum spending comes from the ‘Retail’ channel in the ‘Other’ region. Overall,
the Retail channel spends more than the Hotel channel on Detergents Paper items.
For Delicatessen items, the average spending is 1,524 and the maximum is 47,943.
Comparing Delicatessen items by channel and region, the ‘Other’ region has the highest
frequency in both channels, while the ‘Oporto’ region has the lowest mean in both channels.
The maximum spending comes from the ‘Hotel’ channel in the ‘Other’ region. Overall, the Retail
channel spends more than the Hotel channel on Delicatessen items.
QUESTION 1.3
1.3 Based on a descriptive measure of variability, which item shows the most inconsistent
behaviour? Which items show the least inconsistent behaviour?
From the table below, Delicatessen shows the most inconsistent behaviour, with a coefficient of
variation of 1.85, and Fresh shows the least inconsistent behaviour, with a coefficient of
variation of 1.05.
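The coefficient of variation used above is the standard deviation divided by the mean, which makes spread comparable across items with very different average spending. A minimal sketch, assuming the sample standard deviation is used; the two lists are hypothetical figures for illustration, not the report's data.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# HYPOTHETICAL spending figures, for illustration only (not the report's data).
steady = [100, 110, 90, 105, 95]
erratic = [10, 300, 50, 5, 135]

cv_steady = coefficient_of_variation(steady)
cv_erratic = coefficient_of_variation(erratic)

# A larger value means spending is less consistent relative to its mean.
print(f"steady:  {cv_steady:.2f}")
print(f"erratic: {cv_erratic:.2f}")
```

Both lists have the same mean, so the difference in the two values comes entirely from their spread.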
QUESTION 1.4
1.4 Are there any outliers in the data? Back up your answer with a suitable plot/technique with
the help of detailed comments.
The boxplot below shows that every item has outliers, and that they are most prominent for
Fresh and Grocery.
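A boxplot conventionally flags points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule directly; the function name `iqr_outliers`, the multiplier default and the sample values are assumptions for illustration and are not taken from the report.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# HYPOTHETICAL spending values, for illustration only.
sample = [200, 250, 300, 320, 350, 400, 420, 5000]
outliers = iqr_outliers(sample)
print(outliers)
```

Counting the flagged values per item is one way to back up a visual reading of the boxplot with numbers.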
QUESTION 1.5
1.5 Based on your analysis, what are your recommendations for the business? How can your
analysis help the business to solve its problem? Answer from the business perspective
Based on the analysis, the Hotel channel spends more, so the distributor can concentrate on the
Hotel channel to improve sales further.
The analysis shows that the ‘Other’ region has the highest spending, so the distributor should
strengthen its distribution channels and focus more on the ‘Other’ region to improve sales.
The analysis shows that spending on Fresh, Grocery and Milk items is highest compared with the
other three items, while Delicatessen and Detergents Paper have the lowest spending. The
distributor should target sales of Fresh, Grocery and Milk products to improve the business.
PROBLEM 2
The Student News Service at Clear Mountain State University (CMSU) has decided to gather data
about the undergraduate students that attend CMSU. CMSU creates and distributes a survey of 14
questions and receives responses from 62 undergraduates (stored in the Survey data set).
QUESTION 2.1
2.1 For this data, construct the following contingency tables (Keep Gender as row variable).
The contingency table below cross-tabulates Gender and Graduation Intention.
QUESTION 2.2
Assume that the sample is a representative of the population of CMSU. Based on the data,
answer the following question:
2.2.1 What is the probability that a randomly selected CMSU student will be male?
Probability of selecting male student = (Total no. of male students) / (Total no. of students)
=29/62=0.4677
2.2.2 What is the probability that a randomly selected CMSU student will be female?
Probability of selecting Female student = (Total no. of female students) / (Total no. of students)
=33/62=0.5323
QUESTION 2.3
2.3.1 Find the conditional probability of different majors among the male students in CMSU.
The conditional probability of Accounting major among the male students is 0.138.
The conditional probability of CIS major among the male students is 0.034.
The conditional probability of Economics/Finance major among the male students is 0.138.
The conditional probability of International Business major among the male students is 0.069.
The conditional probability of Management major among the male students is 0.207.
The conditional probability of Other major among the male students is 0.138.
The conditional probability of Retailing/Marketing major among the male students is 0.172.
The conditional probability of Undecided major among the male students is 0.103.
2.3.2 Find the conditional probability of different majors among the female students of
CMSU.
The conditional probability of Accounting major among the female students is 0.091.
The conditional probability of CIS major among the female students is 0.091.
The conditional probability of Economics/Finance major among the female students is 0.212.
The conditional probability of International Business major among the female students is 0.121.
The conditional probability of Management major among the female students is 0.121.
The conditional probability of Other major among the female students is 0.091.
The conditional probability of Retailing/Marketing major among the female students is 0.273.
The conditional probability of Undecided major among the female students is 0.
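Each conditional probability above is a cell count divided by its row total. The sketch below shows that calculation for the male row. The per-major counts are an assumption: they are back-calculated from the reported probabilities and the 29 male students, since the contingency table itself is not reproduced in this text.

```python
from fractions import Fraction


def conditional_distribution(counts):
    """P(category | row) = category count / row total."""
    total = sum(counts.values())
    return {name: Fraction(count, total) for name, count in counts.items()}


# ASSUMPTION: counts inferred from the reported probabilities (29 males);
# the original contingency table is not shown in this report text.
male_major_counts = {
    "Accounting": 4, "CIS": 1, "Economics/Finance": 4,
    "International Business": 2, "Management": 6, "Other": 4,
    "Retailing/Marketing": 5, "Undecided": 3,
}

male_probs = conditional_distribution(male_major_counts)
for major, p in male_probs.items():
    print(f"{major:<24} {float(p):.3f}")
```

Because every probability shares the same denominator, the values in a row must sum to 1, which is a quick check on the table.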
QUESTION 2.4
2.4.1 Find the probability that a randomly chosen student is a male and intends to graduate.
The probability that the selected student is male and intends to graduate is (17/29) * (29/62) = 0.2742
2.4.2 Find the probability that a randomly selected student is a female and does NOT have a
laptop.
The probability that the selected student is female and does not have a laptop is ((33-29)/33) * (33/62) = 0.0645
QUESTION 2.5
2.5.1 Find the probability that a randomly chosen student is either a male or has a full-time
employment?
The probability that a randomly selected student is either male or has full-time employment = (29/62)
+ (10/62) - (7/62) = 0.5161
2.5.2 Find the conditional probability that given a female student is randomly chosen, she is
majoring in international business or management.
The conditional probability that a randomly selected female student is majoring in international
business or management = (4/33) + (4/33) = 0.2424
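The arithmetic in Questions 2.4 and 2.5 can be checked with exact fractions. This is a small sketch that uses only the counts quoted in the answers above; the variable names are chosen here for illustration.

```python
from fractions import Fraction

N = 62                          # survey respondents
males, females = 29, 33
females_with_laptop = 29        # from the (33 - 29) term in 2.4.2

# 2.4.1: P(male and intends to graduate) = P(graduate | male) * P(male)
p_male_and_grad = Fraction(17, males) * Fraction(males, N)

# 2.4.2: P(female and no laptop) = P(no laptop | female) * P(female)
p_female_no_laptop = (Fraction(females - females_with_laptop, females)
                      * Fraction(females, N))

# 2.5.1: addition rule, P(A or B) = P(A) + P(B) - P(A and B)
p_male_or_fulltime = Fraction(29, N) + Fraction(10, N) - Fraction(7, N)

# 2.5.2: the two majors are mutually exclusive, so their probabilities add
p_ib_or_mgmt_given_female = Fraction(4, females) + Fraction(4, females)

for label, p in [("2.4.1", p_male_and_grad), ("2.4.2", p_female_no_laptop),
                 ("2.5.1", p_male_or_fulltime),
                 ("2.5.2", p_ib_or_mgmt_given_female)]:
    print(label, p, round(float(p), 4))
```

Working in fractions avoids truncation in the fourth decimal place when the results are rounded for reporting.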
QUESTION 2.6
2.6 Construct a contingency table of Gender and Intent to Graduate at 2 levels (Yes/No). The
Undecided students are not considered now, and the table is a 2x2 table. Do you think graduate
intention and being female are independent events?
If two events are independent, the product of their individual probabilities must equal the
probability of the joint event.
Because the product of the two probabilities is not equal to the probability of the joint
event, being female and intending to graduate are not independent events.
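The comparison described above can be sketched as follows. The 2x2 counts are hypothetical, chosen only to show the mechanics, because the report's own table is not reproduced in this text; `independence_check` is an illustrative helper name.

```python
from fractions import Fraction


def independence_check(table):
    """table maps (gender, intent) -> count for a 2x2 layout.

    Returns (P(Female) * P(Yes), P(Female and Yes)).
    """
    n = sum(table.values())
    female = sum(c for (g, _), c in table.items() if g == "Female")
    yes = sum(c for (_, i), c in table.items() if i == "Yes")
    p_product = Fraction(female, n) * Fraction(yes, n)
    p_joint = Fraction(table[("Female", "Yes")], n)
    return p_product, p_joint


# HYPOTHETICAL counts, for illustration only (not the survey's table).
table = {("Female", "Yes"): 10, ("Female", "No"): 10,
         ("Male", "Yes"): 15, ("Male", "No"): 5}

product, joint = independence_check(table)
independent = product == joint
print(product, joint, independent)
```

With sample data the two values rarely match exactly, so the size of the gap matters more than strict equality.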
QUESTION 2.7
2.7 Note that there are four numerical (continuous) variables in the data set, GPA,
Salary, Spending and Text Messages. Answer the following questions based on the data
2.7.1 If a student is chosen randomly, what is the probability that his/her GPA is less than 3?
2.7.2 Find conditional probability that a randomly selected male earns 50 or more. Find
conditional probability that a randomly selected female earns 50 or more.
Conditional probability that a male student earns 50 or more = ((14/32) * (32/62)) /
(29/62) = 0.4828
Conditional probability that a female student earns 50 or more = ((18/32) * (32/62)) /
(33/62) = 0.5455
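The two expressions above follow Bayes' rule, P(earns 50+ | gender) = P(gender | earns 50+) * P(earns 50+) / P(gender). A short sketch with the counts quoted in the answer; the function and variable names are illustrative.

```python
from fractions import Fraction

N = 62
earn_50_plus = 32               # students earning 50 or more
males, females = 29, 33


def conditional(p_gender_given_earn, p_earn, p_gender):
    """P(earn | gender) = P(gender | earn) * P(earn) / P(gender)."""
    return p_gender_given_earn * p_earn / p_gender


p_male = conditional(Fraction(14, earn_50_plus),
                     Fraction(earn_50_plus, N), Fraction(males, N))
p_female = conditional(Fraction(18, earn_50_plus),
                       Fraction(earn_50_plus, N), Fraction(females, N))

print(p_male, round(float(p_male), 4))
print(p_female, round(float(p_female), 4))
```

The intermediate terms cancel, so each result reduces to the count of 50+ earners of that gender divided by the number of students of that gender.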
QUESTION 2.8
2.8.1 Note that there are four numerical (continuous) variables in the data set, GPA, Salary,
Spending and Text Messages. For each of them comment whether they follow a normal
distribution.
2.8.2 Write a note summarizing your conclusions
Based on the data below, Spending and Text Messages both have skewness greater than 1, while GPA
and Salary do not. On this basis, only GPA can be considered normally distributed.
GPA has a nearly bell-shaped curve. The mean, median and mode are nearly equal, so there is only
slight skewness (-0.31). Based on the graphs below, GPA appears to be normally distributed.
Salary has a nearly bell-shaped curve, but the mean, median and mode are not equal, so the
distribution is skewed, with skewness of 0.53. Based on the graphs below, Salary is not normally
distributed.
Spending has a nearly bell-shaped curve, but the mean, median and mode are not equal, so the
distribution is skewed, with skewness of 1.58. Based on the graphs below, Spending is not normally
distributed.
Text Messages has a nearly bell-shaped curve, but the mean, median and mode are not equal, so the
distribution is skewed, with skewness of 1.29. Based on the graphs below, Text Messages is not
normally distributed.
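Skewness values like those quoted above can be computed from the second and third central moments. The sketch below uses the bias-adjusted Fisher-Pearson formula, which is assumed here to be the variant behind the report's figures; the data lists are hypothetical.

```python
from math import sqrt


def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness coefficient."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n   # third central moment
    g1 = m3 / m2 ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1


# HYPOTHETICAL data, for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 20]

print(round(sample_skewness(symmetric), 2))
print(round(sample_skewness(right_tailed), 2))
```

A value near zero indicates a roughly symmetric distribution, and a positive value indicates a longer right tail, which is the pattern reported for Spending and Text Messages.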
PROBLEM 3
An important quality characteristic used by the manufacturers of ABC asphalt shingles is the amount
of moisture the shingles contain when they are packaged. Customers may feel that they have
purchased a product lacking in quality if they find moisture and wet shingles inside the packaging. In
some cases, excessive moisture can cause the granules attached to the shingles for texture and
coloring purposes to fall off the shingles resulting in appearance problems. To monitor the amount of
moisture present, the company conducts moisture tests. A shingle is weighed and then dried. The
shingle is then reweighed, and based on the amount of moisture taken out of the product, the pounds
of moisture per 100 square feet are calculated. The company would like to show that the mean
moisture content is less than 0.35 pounds per 100 square feet.
The file (A & B shingles.csv) includes 36 measurements (in pounds per 100 square feet) for A
shingles and 31 for B shingles.
QUESTION 3.1
3.1 Do you think there is evidence that the mean moisture contents in both types of shingles are
within the permissible limits? State your conclusions clearly showing all steps.
So, the statistical decision is to fail to reject the null hypothesis at the 5% level of significance.
Conclusion – at the 95% confidence level, there is not enough evidence to conclude that the mean
moisture content for Sample A shingles is greater than 0.35 pounds per 100 square feet.
So, the statistical decision is to reject the null hypothesis at the 5% level of significance.
Conclusion – at the 95% confidence level, there is enough evidence to conclude that the mean
moisture content for Sample B shingles is greater than 0.35 pounds per 100 square feet.
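Both decisions above rest on a one-sample t statistic measured against the 0.35 limit. A minimal sketch of that statistic follows; the readings are hypothetical, not the values in the shingles file, and the p-value step is left out because it needs a t-distribution table or a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) against the value mu0."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)
    t_stat = (mean(values) - mu0) / standard_error
    return t_stat, n - 1


# HYPOTHETICAL moisture readings (pounds per 100 sq ft), not the report's data.
readings = [0.30, 0.34, 0.28, 0.36, 0.31, 0.33, 0.29, 0.35]

t_stat, dof = one_sample_t(readings, mu0=0.35)
print(round(t_stat, 3), dof)
```

The statistic is the same whichever one-sided alternative is chosen. Only the tail used for the p-value changes, so the direction of the hypotheses should be stated before the result is read.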
QUESTION 3.2
Do you think that the population mean for shingles A and B are equal? Form the hypothesis
and conduct the test of the hypothesis. What assumption do you need to check before the test for
equality of means is performed?
The null hypothesis states that the mean moisture contents are the same: H0: μA = μB
The alternative hypothesis states that the mean moisture contents are not the same: HA: μA ≠ μB
Here we select α = 0.05, and the population standard deviation is not known.
The samples are not large (n is close to 30), so we use the t distribution and the tSTAT test
statistic for a two-sample unpaired test.
Conclusion – at the 95% confidence level, we fail to reject the null hypothesis and conclude that
the mean moisture content of the two shingle types is the same.
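The two-sample unpaired statistic can be sketched as below. This version pools the two sample variances, which assumes the populations have equal variance. That assumption, the helper name `pooled_t` and the readings are all illustrative and are not taken from the report.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Return (t statistic, degrees of freedom) for H0: mean(a) == mean(b).

    ASSUMPTION: both populations share the same variance (pooled estimate).
    """
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t_stat = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_stat, n1 + n2 - 2


# HYPOTHETICAL moisture readings, for illustration only.
shingles_a = [0.30, 0.34, 0.28, 0.36, 0.31, 0.33]
shingles_b = [0.27, 0.25, 0.30, 0.26, 0.29]

t_stat_ab, dof_ab = pooled_t(shingles_a, shingles_b)
print(round(t_stat_ab, 3), dof_ab)
```

The null hypothesis is rejected when the absolute value of the statistic exceeds the two-tailed critical t value for the given degrees of freedom at α = 0.05.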