
8/7/2021

Statistical Methods for Decision Making
Uses data samples of various domains
Contents

Problem 1. Wholesale Customers Analysis
Problem 1.1.
Problem 1.2.
Problem 1.3.
Problem 1.4.
Problem 1.5.
Problem 2. Clear Mountain State University Analysis
Problem 2.1.
Problem 2.1.1.
Problem 2.1.2.
Problem 2.1.3.
Problem 2.1.4.
Problem 2.2.
Problem 2.2.1.
Problem 2.2.2.
Problem 2.3.
Problem 2.3.1.
Problem 2.3.2.
Problem 2.4.
Problem 2.4.1.
Problem 2.4.2.
Problem 2.5.
Problem 2.5.1.
Problem 2.5.2.
Problem 2.6.
Problem 2.7.
Problem 2.7.1.
Problem 2.7.2.
Problem 2.8.
Problem 3. Moisture level of Shingles Analysis
Problem 3.1.
Problem 3.2.

Problem 1. Wholesale Customers Analysis
Problem 1.1.
Use methods of descriptive statistics to summarize data. Which Region and which
Channel spent the most? Which Region and which Channel spent the least?

Result of Analysis:

Summary of Wholesale customer data as follows:

Based on the analysis, Region “Other” spent the most ($10,741,625) and Region
“Lisbon” spent the least ($2,404,908). Across channels, Channel “Hotel” spent the
most ($8,070,603) and Channel “Retail” spent the least ($6,645,917).
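
The totals above come from summing spending per Region and per Channel. A minimal sketch of that aggregation is shown here, using only the Python standard library. The rows and the helper name `totals_by` are hypothetical and are not the report's data or code.

```python
from collections import defaultdict

# Hypothetical rows of (region, channel, total spend); NOT the report's data.
rows = [
    ("Lisbon", "Hotel", 1200),
    ("Lisbon", "Retail", 800),
    ("Oporto", "Hotel", 500),
    ("Other", "Hotel", 3000),
    ("Other", "Retail", 2500),
]


def totals_by(rows, key_index):
    """Sum the spend column for each distinct value of the chosen key column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


region_totals = totals_by(rows, 0)
channel_totals = totals_by(rows, 1)

# The highest and lowest spenders are the keys with the largest and smallest totals.
top_region = max(region_totals, key=region_totals.get)
low_region = min(region_totals, key=region_totals.get)
top_channel = max(channel_totals, key=channel_totals.get)
low_channel = min(channel_totals, key=channel_totals.get)

print("Region totals:", region_totals)
print("Channel totals:", channel_totals)
print("Most:", top_region, top_channel, "| Least:", low_region, low_channel)
```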

Problem 1.2.
There are 6 different varieties of items that are considered. Describe and
comment/explain all the varieties across Region and Channel? Provide a detailed
justification for your answer.

Result of Analysis:

The data show that, in the Retail channel, comparatively more is spent on
Grocery, Milk, Detergents Paper and Delicatessen in all regions. In the Hotel
channel, on the other hand, comparatively more is spent on Fresh and Frozen in all
regions. This can be seen in the following charts, which are drawn from the given
data using Python code. Overall, spending is highest on Grocery and Fresh.
Problem 1.3.
On the basis of a descriptive measure of variability, which item shows the most
inconsistent behaviour? Which items show the least inconsistent behaviour?

Result of Analysis:

The coefficient of variation (standard deviation divided by the mean) is used to
measure the inconsistency of each product. Among the given products, Fresh has the
lowest coefficient of variation (1.054) and Delicatessen has the highest (1.849). So
the most inconsistent product is Delicatessen and the least inconsistent product is
Fresh.

For reference, the coefficients of variation of all the given products (generated
using Python commands) are shown here.

Coefficient of Variation of Fresh: 1.053918

Coefficient of Variation of Milk: 1.273299

Coefficient of Variation of Grocery: 1.195174

Coefficient of Variation of Frozen: 1.580332

Coefficient of Variation of Detergents_Paper: 1.654647

Coefficient of Variation of Delicatessen: 1.849407
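
The coefficient of variation is the standard deviation divided by the mean. The sketch below computes it for two hypothetical series. It uses the sample standard deviation, which is an assumption, since the report does not say which form was used. The numbers are illustrative only.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical spending figures; NOT the wholesale data.
steady = [100, 110, 90, 105, 95]
erratic = [10, 400, 20, 900, 30]

cv = {
    "steady": coefficient_of_variation(steady),
    "erratic": coefficient_of_variation(erratic),
}

# The item with the largest CV is the most inconsistent one.
most_inconsistent = max(cv, key=cv.get)
least_inconsistent = min(cv, key=cv.get)

for name, value in cv.items():
    print(f"Coefficient of Variation of {name}: {value:.6f}")
```

Because the coefficient of variation is unitless, it allows items with very different average spending to be compared on the same scale.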


Problem 1.4.
Are there any outliers in the data? Back up your answer with a suitable
plot/technique with the help of detailed comments.

Result of Analysis: The presence of outliers in each item can be checked using a
boxplot. The given data do contain outliers, as the following boxplot shows.
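
A boxplot conventionally flags points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule numerically. The data and the function name `iqr_outliers` are hypothetical, and the quartile method chosen is one of several conventions.

```python
import statistics


def iqr_outliers(values):
    """Return the values lying outside Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical spending on one item; NOT the wholesale data.
spend = [10, 12, 13, 14, 15, 16, 18, 95]

print("Outliers:", iqr_outliers(spend))
```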

Problem 1.5.
On the basis of your analysis, what are your recommendations for the business?
How can your analysis help the business to solve its problem? Answer from the business
perspective.

Result of Analysis:

The spending inconsistencies (measured by the coefficient of variation) should
be minimized. There is a significant difference in total spending between the Retail
and Hotel channels, and this gap needs to be narrowed. Focus is needed on increasing
sales of the products Frozen, Detergents Paper and Delicatessen.
Problem 2. Clear Mountain State University Analysis
Problem 2.1.
For this data, construct the following contingency tables (Keep Gender as row
variable)

Problem 2.1.1.
Gender and Major

Result of Analysis:

The contingency table for Gender and Major (Gender as row variable) is as
follows.

Problem 2.1.2.
Gender and Grad Intention

Result of Analysis:

The contingency table for Gender and Grad Intention (Gender as row variable) is
as follows.

Problem 2.1.3.
Gender and Employment

Result of Analysis:

The contingency table for Gender and Employment (Gender as row variable) is
as follows.
Problem 2.1.4.
Gender and Computer

Result of Analysis:

The contingency table for Gender and Computer (Gender as row variable) is as
follows.
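
A contingency table counts how often each pair of categories occurs together. The sketch below builds one with Gender as the row variable, using only the standard library. The records and the helper `contingency` are hypothetical and do not reproduce the CMSU survey.

```python
from collections import Counter

# Hypothetical survey records of (gender, computer); NOT the CMSU data.
records = [
    ("Female", "Laptop"),
    ("Male", "Laptop"),
    ("Female", "Tablet"),
    ("Male", "Desktop"),
    ("Female", "Laptop"),
]


def contingency(records):
    """Build a nested dict: row category -> column category -> count."""
    counts = Counter(records)
    row_labels = sorted({row for row, _ in records})
    col_labels = sorted({col for _, col in records})
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}


table = contingency(records)
for gender, row in table.items():
    print(gender, row)
```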

Problem 2.2.
Assume that the sample is representative of the population of CMSU. Based on the
data, answer the following question:

Problem 2.2.1.
What is the probability that a randomly selected CMSU student will be
male?

Result of Analysis:

Total number of students: 62 (Male: 29, Female: 33)

The probability that a randomly selected CMSU student is male is 29/62,
which is 46.77%.

Problem 2.2.2.
What is the probability that a randomly selected CMSU student will be
female?

Result of Analysis:

Total number of students: 62 (Male: 29, Female: 33)

The probability that a randomly selected CMSU student is female is 33/62,
which is 53.23%.
Problem 2.3.
Assume that the sample is representative of the population of CMSU. Based on the
data, answer the following question:

Problem 2.3.1.
Find the conditional probability of different majors among the male students
in CMSU.

Result of Analysis:

Using the contingency table of Gender and Major, we can easily find the
conditional probability of each major among the male students in CMSU.

Probability of Male students opting for Accounting = 4/29 = 13.79%

Probability of Male students opting for CIS = 1/29 = 3.45%

Probability of Male students opting for Economics/Finance = 4/29 = 13.79%

Probability of Male students opting for International Business = 2/29 = 6.90%

Probability of Male students opting for Management = 6/29 = 20.69%

Probability of Male students opting for Other = 4/29 = 13.79%

Probability of Male students opting for Retailing/Marketing = 5/29 = 17.24%

Probability of Male students opting for Undecided = 3/29 = 10.34%

Based on the above probabilities, Management is the most common major among
Male students, followed by Retailing/Marketing. CIS is the least chosen major.
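
The conditional probabilities for male students can be reproduced from the counts in the list above by dividing each count by the number of males. A minimal sketch follows. The dictionary name is illustrative, and the counts are the numerators quoted in this section.

```python
# Counts of male students per major, taken from the numerators listed above.
male_major_counts = {
    "Accounting": 4,
    "CIS": 1,
    "Economics/Finance": 4,
    "International Business": 2,
    "Management": 6,
    "Other": 4,
    "Retailing/Marketing": 5,
    "Undecided": 3,
}

total_males = sum(male_major_counts.values())

# P(Major | Male) = count of males in that major / total number of males.
conditional = {
    major: count / total_males for major, count in male_major_counts.items()
}

for major, p in conditional.items():
    print(f"P({major} | Male) = {p:.2%}")

most_common = max(conditional, key=conditional.get)
least_common = min(conditional, key=conditional.get)
```

The counts sum to 29, so the conditional probabilities sum to 1, which is a quick consistency check on the table.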

Problem 2.3.2.
Find the conditional probability of different majors among the female
students of CMSU.

Result of Analysis:

Using the contingency table of Gender and Major, we can easily find the
conditional probability of each major among the female students in CMSU.

Probability of Female Students opting for Accounting = 9.09%

Probability of Female Students opting for CIS = 9.09%

Probability of Female Students opting for Economics/Finance = 21.21%

Probability of Female Students opting for International Business = 12.12%

Probability of Female Students opting for Management = 12.12%

Probability of Female Students opting for Other = 9.09%

Probability of Female Students opting for Retailing/Marketing = 27.27%

Probability of Female Students opting for Undecided = 0.00%


Based on the above probabilities, Retailing/Marketing is the most common major
among Female students, followed by Economics/Finance. Accounting, CIS and Other
are the least chosen majors (9.09% each). None of the female students is undecided
about the choice of major.

Problem 2.4.
Assume that the sample is a representative of the population of CMSU. Based on
the data, answer the following question:

Problem 2.4.1.
Find the probability that a randomly chosen student is a male and intends to
graduate.

Result of Analysis:

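Using the contingency table of Gender and Grad Intention, we can easily find the
probability that a student is male and intends to graduate.

Of the 62 students, 17 are male and intend to graduate. Because this is a joint
probability, the denominator is the total number of students, not the number of
males.

Probability of being Male and intending to graduate = 17/62, which is 27.42%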

Problem 2.4.2.
Find the probability that a randomly selected student is a female and does
NOT have a laptop.

Result of Analysis:

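Using the contingency table of Gender and Computer, we can easily find the
probability that a student is female and does not have a laptop.

Of the 62 students, 4 are female and do not have a laptop. Because this is a joint
probability, the denominator is the total number of students.

Probability of being Female and not having a laptop = 4/62, which is 6.45%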

Problem 2.5.
Assume that the sample is representative of the population of CMSU. Based on the
data, answer the following question:

Problem 2.5.1.
Find the probability that a randomly chosen student is a male or has full-
time employment?

Result of Analysis:

Using the contingency table of Gender and Employment, we can easily find the
probability of being male or having full-time employment. By the addition rule,
P(A or B) = P(A) + P(B) - P(A and B).

Probability of Male or full-time employment = (29/62) + (10/62) - (7/62)
= 32/62 = 0.5161, which is 51.61%
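
The addition rule used above can be checked with exact fractions. The sketch below uses the three counts quoted in the calculation (29 males, 10 full-time students, 7 who are both). The variable names are illustrative.

```python
from fractions import Fraction

total_students = 62

p_male = Fraction(29, total_students)
p_full_time = Fraction(10, total_students)
p_male_and_full_time = Fraction(7, total_students)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
# Subtracting the overlap avoids counting full-time males twice.
p_male_or_full_time = p_male + p_full_time - p_male_and_full_time

print(p_male_or_full_time, f"= {float(p_male_or_full_time):.4f}")
```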
Problem 2.5.2.
Find the conditional probability that given a female student is randomly
chosen, she is majoring in international business or management.

Result of Analysis:

Using the contingency table of Gender and Major, we can easily find the
conditional probability that a student is majoring in international business or
management, given that the student is female.

Probability of majoring in international business or management, given
Female = 8/33, which is 24.24%

Problem 2.6.
Construct a contingency table of Gender and Intent to Graduate at 2 levels
(Yes/No). The Undecided students are not considered now and the table is a 2x2 table. Do
you think the graduate intention and being female are independent events?

Result of Analysis:

The contingency table of Gender and Intent to Graduate at 2 levels (Yes/No) has
been constructed and displayed as follows.

Probability that Grad intention is Yes = 28/40 = 0.7

Probability of being Female = 20/40 = 0.5

Probability of being Female and Grad intention is Yes = 11/40 = 0.275

Two events A and B are independent only if P(A∩B) = P(A) · P(B).

Here P(A) · P(B) = 0.7 x 0.5 = 0.35, which is not equal to 0.275.

Since the product of the probability that Grad intention is Yes and the probability
of being Female is not equal to the probability of being Female with Grad intention
Yes, the graduate intention and being female are not independent events.
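
The independence check above compares the joint probability with the product of the two marginal probabilities. A minimal sketch with exact fractions, using the counts from this section, is given here. The variable names are illustrative.

```python
from fractions import Fraction

n = 40  # students remaining in the 2x2 table after dropping Undecided

p_yes = Fraction(28, n)             # P(Grad intention = Yes)
p_female = Fraction(20, n)          # P(Female)
p_female_and_yes = Fraction(11, n)  # P(Female and Yes)

# Independence requires P(A and B) == P(A) * P(B).
product = p_yes * p_female
independent = p_female_and_yes == product

print(f"P(A)P(B) = {float(product):.3f}, P(A and B) = {float(p_female_and_yes):.3f}")
print("Independent:", independent)
```

In sample data the two sides rarely match exactly, so a formal procedure such as a chi-square test of independence is often used to judge whether the gap is larger than chance would explain.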

Problem 2.7.
Note that there are four numerical (continuous) variables in the data set, GPA,
Salary, Spending, and Text Messages. Answer the following questions based on the data

Problem 2.7.1.
If a student is chosen randomly, what is the probability that his/her GPA is
less than 3?

Result of Analysis:
The contingency table of Gender and GPA has been constructed and displayed as
follows.

From the above contingency table, the probability that a student (irrespective of
Gender) has a GPA less than 3 is 17/62, which is 27.42%.

Problem 2.7.2.
Find the conditional probability that a randomly selected male earns 50 or
more. Find the conditional probability that a randomly selected female earns 50 or
more.

Result of Analysis:

The contingency table of Gender and Salary has been constructed and displayed as
follows.

The conditional probability that a randomly selected male earns 50 or more is
14/29, which is 48.28%.

The conditional probability that a randomly selected female earns 50 or more is
18/33, which is 54.55%.

Problem 2.8.
Note that there are four numerical (continuous) variables in the data set, GPA,
Salary, Spending, and Text Messages. For each of them comment whether they follow a
normal distribution. Write a note summarizing your conclusions.

Result of Analysis:

The Shapiro-Wilk test is used to assess whether a sample comes from a normal
distribution. According to the Shapiro-Wilk test results, normality is not rejected
for GPA (p > 0.05), so GPA can be treated as following a normal distribution, while
Salary, Spending, and Text Messages do not follow a normal distribution (p < 0.05).
The test statistics and p values of GPA, Salary, Spending and Text Messages are
displayed below.

For GPA: Statistics = 0.9685361981391907, p = 0.11204058676958084

For Salary: Statistics = 0.9565856456756592, p = 0.028000956401228905

For Spending: Statistics = 0.8777452111244202, p = 1.6854661225806922e-05

For Text Messages: Statistics = 0.8594191074371338, p = 4.324040673964191e-06
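
The decision rule applied to these results can be written out explicitly. The sketch below takes the four p values reported above and compares each with α = 0.05. It does not rerun the test. A Shapiro-Wilk routine such as `scipy.stats.shapiro` could produce such values, but that the report used it is an assumption.

```python
ALPHA = 0.05

# p values copied from the Shapiro-Wilk results reported above.
shapiro_p_values = {
    "GPA": 0.11204058676958084,
    "Salary": 0.028000956401228905,
    "Spending": 1.6854661225806922e-05,
    "Text Messages": 4.324040673964191e-06,
}

# H0 of the Shapiro-Wilk test: the sample comes from a normal distribution.
# A p value below ALPHA rejects H0; otherwise normality is not rejected.
normality_rejected = [
    name for name, p in shapiro_p_values.items() if p < ALPHA
]

for name, p in shapiro_p_values.items():
    verdict = "normality rejected" if p < ALPHA else "normality not rejected"
    print(f"{name}: p = {p:.4g} -> {verdict}")
```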

Problem 3. Moisture level of Shingles Analysis


Problem 3.1.
Do you think there is evidence that mean moisture contents in both types of
shingles are within the permissible limits? State your conclusions clearly showing all steps.

Result of Analysis:

H0: The mean moisture content is equal to or above the permissible limit.
That is, H0: μ >= 0.35 pounds per 100 square feet, tested separately for μ(A) and
μ(B).

Ha: The mean moisture content is less than 0.35 pounds per 100 square feet.
That is, Ha: μ < 0.35 pounds per 100 square feet, tested separately for μ(A) and
μ(B).

α = 0.05

For Shingles A:

One sample t Test outcome as follows:

One sample t test

t statistic: -1.4735046253382782 p value: 0.07477633144907513

Since the p value > 0.05, we do not reject H0. There is not enough evidence to
conclude that the mean moisture content of Shingles A is less than 0.35 pounds per
100 square feet, so H0: μ(A) >= 0.35 pounds per 100 square feet is retained.

For Shingles B:

One sample t Test outcome as follows:

One sample t test

t statistic: -3.1003313069986995 p value: 0.0020904774003191826

Since the p value < 0.05, we reject H0. There is enough evidence to support the
alternative hypothesis that the mean moisture content of Shingles B is less than 0.35
pounds per 100 square feet. That is, Ha: μ(B) < 0.35 pounds per 100 square feet.
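
The one-sample t statistic behind these results is the difference between the sample mean and the limit, divided by the standard error of the mean. The sketch below computes it on hypothetical moisture readings. The data and the helper `one_sample_t` are illustrative and will not reproduce the reported values.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean equals mu0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t_stat = (statistics.mean(sample) - mu0) / standard_error
    return t_stat, n - 1


# Hypothetical moisture readings in pounds per 100 square feet; NOT the report's data.
sample = [0.30, 0.34, 0.28, 0.36, 0.32]
LIMIT = 0.35

t_stat, df = one_sample_t(sample, LIMIT)
print(f"t statistic: {t_stat:.4f}, degrees of freedom: {df}")
```

For the left-tailed alternative used here, the p value is the t-distribution CDF evaluated at the statistic, which a library call such as `scipy.stats.t.cdf(t_stat, df)` could supply. A negative statistic means the sample mean lies below the limit.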
Problem 3.2.
Do you think that the population means for shingles A and B are equal? Form the
hypothesis and conduct the test of the hypothesis. What assumption do you need to check
before the test for equality of means is performed?

Result of Analysis:

Let us consider following.

H0: The mean moisture contents of the two shingle types are equal. That is, H0: μ(A) = μ(B)

Ha: The mean moisture contents of the two shingle types are not equal. That is, Ha: μ(A) ≠ μ(B)

α = 0.05

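A two-sample t-test assuming equal variances is conducted. This test assumes
independent samples, approximately normal populations and equal population
variances, so these assumptions should be checked before the test is performed.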

The output is as follows.

Ttest_indResult(statistic=1.2896282719661123, pvalue=0.2017496571835306)

For the Python code, please refer to the attached Python code file.

Since the p value is greater than α, we do not have enough evidence to reject H0.
We fail to reject the null hypothesis that the mean moisture contents of the two
shingle types are equal, that is, H0: μ(A) = μ(B).

As per the t test result, there is no evidence of a difference between the means of
the two types of shingles.
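
The equal-variance two-sample t statistic pools the two sample variances before forming the standard error. The sketch below shows that calculation on hypothetical readings. The data and the helper `pooled_t` are illustrative and do not reproduce the reported result.

```python
import math
import statistics


def pooled_t(a, b):
    """Return (t statistic, degrees of freedom) assuming equal variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / df
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t_stat = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t_stat, df


# Hypothetical moisture readings for two shingle types; NOT the report's data.
shingles_a = [0.30, 0.34, 0.28, 0.36, 0.32]
shingles_b = [0.26, 0.30, 0.24, 0.32, 0.28]

t_stat, df = pooled_t(shingles_a, shingles_b)
print(f"t statistic: {t_stat:.4f}, degrees of freedom: {df}")
```

If the equal-variance assumption is doubtful, Welch's version of the test, which does not pool the variances, is the usual alternative.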
