Methods for Decision Making
Uses data samples from various domains
Result of Analysis:
Based on the analysis, Region “Other” spent the most ($10,741,625) and Region
“Lisbon” spent the least ($2,404,908). When analysing spending by channel, the amount
spent in Channel “Hotel” is the highest ($8,070,603) and Channel “Retail” spent the least
($6,645,917).
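The region and channel totals above come from a simple group-and-sum. The sketch below is a minimal stdlib version, assuming each record carries `Region`, `Channel` and one numeric column per item; the column names and the three toy rows are hypothetical, not the assignment's data. In pandas the same idea is roughly `df.groupby('Region')[ITEMS].sum().sum(axis=1)`.

```python
from collections import defaultdict

# Item columns assumed to exist in every record (illustrative names).
ITEMS = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicatessen"]


def total_spend_by(records, key):
    """Sum the spending on all items for each value of `key` (e.g. 'Region')."""
    totals = defaultdict(int)
    for row in records:
        totals[row[key]] += sum(row[item] for item in ITEMS)
    return dict(totals)


# Hypothetical toy records, NOT the wholesale data used in the report.
records = [
    {"Region": "Lisbon", "Channel": "Hotel", "Fresh": 100, "Milk": 20, "Grocery": 30,
     "Frozen": 40, "Detergents_Paper": 5, "Delicatessen": 5},
    {"Region": "Other", "Channel": "Retail", "Fresh": 50, "Milk": 80, "Grocery": 120,
     "Frozen": 10, "Detergents_Paper": 60, "Delicatessen": 30},
    {"Region": "Other", "Channel": "Hotel", "Fresh": 200, "Milk": 10, "Grocery": 20,
     "Frozen": 70, "Detergents_Paper": 0, "Delicatessen": 0},
]

by_region = total_spend_by(records, "Region")
by_channel = total_spend_by(records, "Channel")

# Group with the highest and the lowest total spend.
print(max(by_region, key=by_region.get), min(by_region, key=by_region.get))
```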
Problem 1.2.
There are 6 different varieties of items that are considered. Describe and
comment on all the varieties across Region and Channel. Provide a detailed
justification for your answer.
Result of Analysis:
The analysis shows that, in the Retail channel, comparatively more is spent on
Grocery, Milk, Detergents Paper and Delicatessen in all regions. In the Hotel channel, by
contrast, comparatively more is spent on Fresh and Frozen items in all regions. This can be
seen in the following charts, which were drawn from the given data using Python. Overall,
spending is high on items such as Grocery and Fresh.
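The channel comparison above amounts to totalling each item within each channel and seeing which channel leads. A minimal stdlib sketch, assuming records with a `Channel` field and one column per item; the names and the two toy rows are hypothetical, not the report's data. In pandas this corresponds to `df.groupby('Channel')[ITEMS].sum()`.

```python
from collections import defaultdict

# Item columns assumed to exist in every record (illustrative names).
ITEMS = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicatessen"]


def item_totals_by(records, key):
    """Return {group: {item: total spend}} for each value of `key`."""
    table = defaultdict(lambda: defaultdict(int))
    for row in records:
        for item in ITEMS:
            table[row[key]][item] += row[item]
    return {group: dict(items) for group, items in table.items()}


def leading_group_per_item(table):
    """For each item, the group (e.g. channel) with the largest total spend."""
    return {item: max(table, key=lambda g: table[g][item]) for item in ITEMS}


# Hypothetical toy records, NOT the wholesale data used in the report.
records = [
    {"Channel": "Hotel", "Fresh": 300, "Milk": 30, "Grocery": 50,
     "Frozen": 110, "Detergents_Paper": 5, "Delicatessen": 5},
    {"Channel": "Retail", "Fresh": 50, "Milk": 80, "Grocery": 120,
     "Frozen": 10, "Detergents_Paper": 60, "Delicatessen": 30},
]

by_channel = item_totals_by(records, "Channel")
leaders = leading_group_per_item(by_channel)
print(leaders)
```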
Problem 1.3.
On the basis of a descriptive measure of variability, which item shows the most
inconsistent behaviour? Which items show the least inconsistent behaviour?
Result of Analysis:
For this analysis, the coefficient of variation (CV) of each item, computed using
Python, is shown here.
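The coefficient of variation is the standard deviation divided by the mean, so a larger CV indicates more inconsistent (relatively more variable) spending and a smaller CV indicates more consistent spending. A minimal sketch, assuming per-item lists of spending values; the values below are made up for illustration. `statistics.stdev` uses the sample (n - 1) denominator, like pandas' default `.std()`.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical spending values per item, NOT the report's data.
spend = {
    "Fresh": [10, 12, 11, 9, 8],
    "Delicatessen": [1, 30, 2, 50, 3],
}

cv = {item: coefficient_of_variation(values) for item, values in spend.items()}
most_inconsistent = max(cv, key=cv.get)   # largest relative variability
least_inconsistent = min(cv, key=cv.get)  # smallest relative variability
print(cv, most_inconsistent, least_inconsistent)
```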
Result of Analysis:
The presence of outliers in each item can be checked using a boxplot. The given data
contain outliers, as the following boxplot shows.
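A boxplot marks as outliers the points beyond its whiskers, which matplotlib by default places at 1.5 × IQR below the first quartile and above the third quartile. The sketch below applies that rule directly, assuming a plain list of values per item; `method="inclusive"` in `statistics.quantiles` gives the same linearly interpolated quartiles as NumPy's default `percentile`. The sample values are illustrative only.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (boxplot whisker rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical spending values for one item.
sample = [3, 4, 5, 5, 6, 7, 40]
print(iqr_outliers(sample))
```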
Problem 1.5.
On the basis of your analysis, what are your recommendations for the business?
How can your analysis help the business to solve its problem? Answer from the business
perspective.
Result of Analysis:
Problem 2.1.1.
Gender and Major
Result of Analysis:
Problem 2.1.2.
Gender and Grad Intention
Result of Analysis:
The contingency table for Gender and Grad Intention (Gender as the row variable) is
as follows.
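As an illustration of how a contingency table like this can be built, the sketch below counts every (row, column) pair, which is what `pandas.crosstab(df['Gender'], df['Grad Intention'])` does. The column names and the five toy responses are assumptions, not the survey data.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count each (row category, column category) pair, like pandas.crosstab."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


# Hypothetical responses, NOT the CMSU survey data.
gender = ["Female", "Male", "Female", "Male", "Female"]
grad = ["Yes", "No", "Undecided", "Yes", "Yes"]

table = crosstab(gender, grad)
for level, row in table.items():
    print(level, row)
```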
Problem 2.1.3.
Gender and Employment
Result of Analysis:
Result of Analysis:
Problem 2.2.
Assume that the sample is representative of the population of CMSU. Based on the
data, answer the following question:
Problem 2.2.1.
What is the probability that a randomly selected CMSU student will be
male?
Result of Analysis:
Problem 2.2.2.
What is the probability that a randomly selected CMSU student will be
female?
Result of Analysis:
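Because the sample is assumed to be representative, each marginal probability can be estimated as a relative frequency: the count of a category divided by the total number of students. A minimal sketch with a hypothetical `gender` list (not the CMSU data); `Fraction` keeps the result exact.

```python
from fractions import Fraction


def marginal_probability(values, level):
    """Relative frequency of `level`, used as an estimate of P(level)."""
    return Fraction(values.count(level), len(values))


# Hypothetical responses, NOT the CMSU survey data.
gender = ["Male", "Female", "Female", "Male", "Female"]

p_male = marginal_probability(gender, "Male")
p_female = marginal_probability(gender, "Female")
print(p_male, p_female, float(p_male))
```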
Problem 2.3.1.
Find the conditional probability of different majors among the male students
in CMSU.
Result of Analysis:
Using the contingency table of Gender and Major, we can find the conditional
probability of each major among the male students in CMSU.
Problem 2.3.2.
Find the conditional probability of different majors among the female
students of CMSU.
Result of Analysis:
Using the contingency table of Gender and Major, we can find the conditional
probability of each major among the female students in CMSU.
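A conditional probability such as P(Major | Male) divides each cell of the Male row of the Gender and Major contingency table by the row total (the number of male students), not by the grand total. A minimal sketch, assuming parallel lists of genders and majors; the major labels and responses are hypothetical. The same function gives the female distribution by passing `"Female"`.

```python
from collections import Counter
from fractions import Fraction


def conditional_distribution(given, outcome, level):
    """P(outcome = o | given = level) for each o: row cells divided by the row total."""
    subset = [o for g, o in zip(given, outcome) if g == level]
    counts = Counter(subset)
    return {o: Fraction(n, len(subset)) for o, n in sorted(counts.items())}


# Hypothetical responses, NOT the CMSU survey data.
gender = ["Male", "Male", "Female", "Male", "Female"]
major = ["Accounting", "Management", "Management", "Accounting", "Economics/Finance"]

print(conditional_distribution(gender, major, "Male"))
print(conditional_distribution(gender, major, "Female"))
```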
Problem 2.4.
Assume that the sample is a representative of the population of CMSU. Based on
the data, answer the following question:
Problem 2.4.1.
Find the probability that a randomly chosen student is a male and intends to
graduate.
Result of Analysis:
Using the contingency table of Gender and Grad Intention, we can find the
probability that a student in CMSU is male and intends to graduate.
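For a joint probability such as P(Male ∩ Grad Intention = Yes), the matching cell count is divided by the grand total of the table; dividing by a row total would instead give a conditional probability. A minimal sketch with hypothetical toy responses (not the CMSU data):

```python
from fractions import Fraction


def joint_probability(a, b, level_a, level_b):
    """P(A = level_a and B = level_b): matching cell count over the grand total."""
    hits = sum(1 for x, y in zip(a, b) if x == level_a and y == level_b)
    return Fraction(hits, len(a))


# Hypothetical responses, NOT the CMSU survey data.
gender = ["Male", "Male", "Female", "Male", "Female"]
grad = ["Yes", "No", "Yes", "Yes", "Undecided"]

p = joint_probability(gender, grad, "Male", "Yes")
print(p, float(p))
```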
Problem 2.4.2.
Find the probability that a randomly selected student is a female and does
NOT have a laptop.
Result of Analysis:
Using the contingency table of Gender and Computer, we can find the probability
that a student in CMSU is female and does not have a laptop.
Probability of being female and not having a laptop = 4/33, which is 12.12%.
Problem 2.5.
Assume that the sample is representative of the population of CMSU. Based on the
data, answer the following question:
Problem 2.5.1.
Find the probability that a randomly chosen student is a male or has full-time
employment.
Result of Analysis:
Using the contingency table of Gender and Employment, we can find the
probability that a student is male or has full-time employment.
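For a union, the addition rule avoids double-counting students who are both male and full-time employed: P(Male ∪ Full-Time) = P(Male) + P(Full-Time) - P(Male ∩ Full-Time). A minimal sketch, assuming parallel lists of genders and employment statuses; the `"Full-Time"` label and the toy data are assumptions.

```python
from fractions import Fraction


def prob_union(a, b, level_a, level_b):
    """P(A or B) via the addition rule P(A) + P(B) - P(A and B)."""
    n = len(a)
    p_a = Fraction(sum(x == level_a for x in a), n)
    p_b = Fraction(sum(y == level_b for y in b), n)
    p_ab = Fraction(sum(x == level_a and y == level_b for x, y in zip(a, b)), n)
    return p_a + p_b - p_ab


# Hypothetical responses, NOT the CMSU survey data.
gender = ["Male", "Female", "Male", "Female", "Female"]
employment = ["Full-Time", "Full-Time", "Part-Time", "Unemployed", "Part-Time"]

print(prob_union(gender, employment, "Male", "Full-Time"))
```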
Result of Analysis:
Using the contingency table of Gender and Major, we can find the probability
that a student is female and majors in International Business or Management.
Problem 2.6.
Construct a contingency table of Gender and Intent to Graduate at 2 levels
(Yes/No). The Undecided students are not considered now and the table is a 2x2 table. Do
you think the graduate intention and being female are independent events?
Result of Analysis:
The contingency table of Gender and Intent to Graduate at 2 levels (Yes/No) has
been constructed and displayed as follows.
Two events A and B are said to be independent if and only if P(A∩B) = P(A) · P(B).
Since the product P(Grad Intention = Yes) · P(Female) is not equal to
P(Female ∩ Grad Intention = Yes), the graduate intention and being female are not
independent events.
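The independence check can be made exact with fractions computed from the 2x2 table of counts. A minimal sketch; the counts below are hypothetical, not the CMSU table.

```python
from fractions import Fraction


def is_independent(table, row, col):
    """Compare P(row and col) with P(row) * P(col) from a table of counts."""
    total = sum(sum(r.values()) for r in table.values())
    p_joint = Fraction(table[row][col], total)
    p_row = Fraction(sum(table[row].values()), total)
    p_col = Fraction(sum(r[col] for r in table.values()), total)
    return p_joint == p_row * p_col, p_joint, p_row * p_col


# Hypothetical 2x2 counts (Undecided excluded), NOT the survey results.
table = {"Female": {"No": 9, "Yes": 11}, "Male": {"No": 3, "Yes": 17}}

independent, p_joint, product = is_independent(table, "Female", "Yes")
print(independent, p_joint, product)
```

Exact equality is a strict criterion on sample data; when sampling variation matters, a chi-square test of independence (for example `scipy.stats.chi2_contingency`) is the usual formal test.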
Problem 2.7.
Note that there are four numerical (continuous) variables in the data set: GPA,
Salary, Spending, and Text Messages. Answer the following questions based on the data.
Problem 2.7.1.
If a student is chosen randomly, what is the probability that his/her GPA is
less than 3?
Result of Analysis:
The contingency table of Gender and GPA has been constructed and displayed as
follows.
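P(GPA < 3) can also be estimated directly from the raw GPA values as the share of students strictly below 3; because the inequality is strict, a GPA of exactly 3.0 is not counted. A minimal sketch with made-up GPA values:

```python
from fractions import Fraction


def prob_below(values, threshold):
    """Empirical P(X < threshold): share of observations strictly below it."""
    return Fraction(sum(v < threshold for v in values), len(values))


# Hypothetical GPA values, NOT the CMSU data.
gpa = [2.9, 3.1, 3.5, 2.5, 3.0]

p = prob_below(gpa, 3)
print(p, float(p))
```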
Problem 2.7.2.
Find the conditional probability that a randomly selected male earns 50 or
more. Find the conditional probability that a randomly selected female earns 50 or
more.
Result of Analysis:
The contingency table of Gender and Salary has been constructed and displayed as
follows.
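The conditional probability P(Salary ≥ 50 | Male) restricts attention to the male students and counts the share whose salary is 50 or more (inclusive). A minimal sketch with hypothetical values; the same call with `"Female"` gives the female figure.

```python
from fractions import Fraction


def prob_at_least_given(values, groups, threshold, level):
    """Empirical P(X >= threshold | group == level)."""
    in_group = [v for v, g in zip(values, groups) if g == level]
    return Fraction(sum(v >= threshold for v in in_group), len(in_group))


# Hypothetical salaries and genders, NOT the CMSU data.
salary = [45, 50, 60, 40, 55, 30]
gender = ["Male", "Male", "Female", "Female", "Male", "Female"]

print(prob_at_least_given(salary, gender, 50, "Male"))
print(prob_at_least_given(salary, gender, 50, "Female"))
```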
Problem 2.8.
Note that there are four numerical (continuous) variables in the data set, GPA,
Salary, Spending, and Text Messages. For each of them comment whether they follow a
normal distribution. Write a note summarizing your conclusions.
Result of Analysis:
The Shapiro-Wilk test is used to determine whether or not a sample comes from a
normal distribution. According to the Shapiro-Wilk test results, there is no significant
evidence against normality for GPA, so GPA can be treated as normally distributed, while
Salary, Spending, and Text Messages do not follow a normal distribution (p < 0.05).
The test statistics and p values of GPA, Salary, Spending and Text Messages are displayed.
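A minimal sketch of running the test on each variable with `scipy.stats.shapiro`, which returns the W statistic and the p value; the dictionary keys and the two synthetic samples are assumptions for illustration (one built from evenly spaced normal quantiles, one deliberately skewed), not the CMSU variables.

```python
from statistics import NormalDist

from scipy import stats


def shapiro_report(columns, alpha=0.05):
    """Run the Shapiro-Wilk test per column; p >= alpha means normality is not rejected."""
    report = {}
    for name, values in columns.items():
        stat, p = stats.shapiro(values)
        report[name] = (float(stat), float(p), bool(p >= alpha))
    return report


# Synthetic samples, NOT the survey data.
columns = {
    "near_normal": [NormalDist(3.0, 0.4).inv_cdf((i + 0.5) / 40) for i in range(40)],
    "skewed": [2 ** i for i in range(20)],
}

for name, (w, p, looks_normal) in shapiro_report(columns).items():
    print(f"{name}: W={w:.3f}, p={p:.4f}, normal={looks_normal}")
```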
Result of Analysis:
H0: The mean moisture content of the shingle type is equal to or above the permissible
limit. That is, H0: μ ≥ 0.35 pounds per 100 square feet (tested separately with μ = μ(A)
and μ = μ(B)).
Ha: The mean moisture content of the shingle type is less than 0.35 pounds per 100
square feet. That is, Ha: μ < 0.35 pounds per 100 square feet.
α = 0.05
For Shingles A:
Since p value > 0.05, we do not reject H0. There is not enough evidence to conclude that
the mean moisture content of Shingles A is below the permissible limit; that is,
H0: μ(A) ≥ 0.35 pounds per 100 square feet is not rejected.
For Shingles B:
Since p value < 0.05, we reject H0. There is enough evidence to support the alternative
hypothesis that the mean moisture content of Shingles B is less than 0.35 pounds per 100
square feet. That is, Ha: μ(B) < 0.35 pounds per 100 square feet.
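Each shingle type can be tested with a one-sample, left-tailed t-test against 0.35. A minimal sketch using `scipy.stats.ttest_1samp` with `alternative="less"` (available in SciPy 1.6 and later); the sample values are hypothetical, not the shingle data.

```python
from scipy import stats

LIMIT = 0.35  # permissible moisture limit, pounds per 100 square feet


def one_sided_t_test(sample, mu0=LIMIT, alpha=0.05):
    """H0: mu >= mu0 vs Ha: mu < mu0; returns (t, p, reject_h0)."""
    stat, p = stats.ttest_1samp(sample, popmean=mu0, alternative="less")
    return float(stat), float(p), bool(p < alpha)


# Hypothetical moisture readings, NOT the shingle data.
low_moisture = [0.20, 0.22, 0.25, 0.21, 0.24, 0.23]
high_moisture = [0.40, 0.42, 0.38, 0.41, 0.39]

print(one_sided_t_test(low_moisture))
print(one_sided_t_test(high_moisture))
```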
Problem 3.2.
Do you think that the population means for shingles A and B are equal? Form the
hypothesis and conduct the test of the hypothesis. What assumption do you need to check
before the test for equality of means is performed?
Result of Analysis:
H0: The mean moisture content of the two shingle types is equal. That is, H0: μ(A) = μ(B)
Ha: The mean moisture content of the two shingle types is not equal. That is, Ha: μ(A) ≠ μ(B)
α = 0.05
Ttest_indResult(statistic=1.2896282719661123, pvalue=0.2017496571835306)
For the Python code, please refer to the attached Python code file.
Since the p value (0.2017) is greater than α, we do not have enough evidence to reject H0,
so we fail to reject the null hypothesis H0: μ(A) = μ(B).
As per the t-test result, there is no significant difference between the mean moisture
content of the two types of shingles.
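On the assumption the question asks about: `scipy.stats.ttest_ind` with its default `equal_var=True` assumes the two populations have equal variances, in addition to independent, roughly normal samples. One common way to check the variance assumption is Levene's test (`scipy.stats.levene`). The sketch below runs Levene's test first and falls back to Welch's t-test (`equal_var=False`) if the variances look unequal; the α threshold and the toy samples are assumptions, not the shingle data.

```python
from scipy import stats


def compare_means(a, b, alpha=0.05):
    """Levene's test for equal variances, then a two-sample t-test (pooled or Welch)."""
    _, p_levene = stats.levene(a, b)
    equal_var = bool(p_levene >= alpha)  # keep the pooled test unless variances differ
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=equal_var)
    return {
        "equal_var": equal_var,
        "t": float(t_stat),
        "p": float(p_value),
        "reject_h0": bool(p_value < alpha),
    }


# Hypothetical samples, NOT the shingle data.
a = [0.30, 0.32, 0.28, 0.31, 0.29]
b_same = [0.31, 0.29, 0.30, 0.28, 0.32]
b_shift = [0.50, 0.52, 0.48, 0.51, 0.49]

print(compare_means(a, b_same))
print(compare_means(a, b_shift))
```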