Lesson 4—Analyze
The data obtained from the measurement phase exhibits a variety of distributions, depending on the data
type and its source.
The methods used to describe the parameters for classes of distribution are:
Characteristics of Binomial Distribution:
● Describes the discrete data as a result of a particular process
● Predicts sample behavior
$$ P(R) = {}^{n}C_{r} \cdot p^{r} \cdot (1 - p)^{n - r} $$
where, P(R) = probability of exactly (r) successes out of a sample size of (n)
p = probability of success; r = number of successes desired; n = sample size
Term                 Formula
Mean                 $\mu = np$
Standard Deviation   $\sigma = \sqrt{np(1 - p)}$
where, n = sample size
p = probability of success
Sample factorial calculation:
5! = 5 ∗ 4 ∗ 3 ∗ 2 ∗ 1 = 120
4! = 4 ∗ 3 ∗ 2 ∗ 1 = 24
Q Using binomial distribution formula, find the probability of getting 5 heads in 8 coin tosses.
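The coin-toss question can be checked with a short calculation. The sketch below applies the binomial formula with n = 8, r = 5, and p = 0.5 (a fair coin is assumed); the function name `binomial_pmf` is only illustrative.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(R) = nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Probability of exactly 5 heads in 8 tosses of a fair coin (p = 0.5 assumed).
prob = binomial_pmf(r=5, n=8, p=0.5)
print(f"P(5 heads in 8 tosses) = {prob:.5f}")  # 56/256 = 0.21875
```

So there is roughly a 21.9% chance of getting exactly 5 heads.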
Poisson distribution is an application of the population knowledge to predict the sample behavior.
Characteristics of Poisson Distribution:
● Describes the discrete data
● Deals with integers which can take any value
$$ P(x) = \frac{\lambda^{x} \cdot e^{-\lambda}}{x!} $$
where, P(x) = probability of exactly x occurrences in a Poisson distribution
λ = mean number of occurrences during interval
x = number of occurrences desired
e = base of the natural logarithm (approximately 2.71828)
Q The past records of a road junction which is accident-prone show that the mean number of accidents every
week is 5 at this junction. Assume that the number of accidents follows a Poisson distribution and calculate
the probability of any number of accidents happening in a week.
Probability of exactly one accident per week: $P(1) = \frac{5^{1} \cdot e^{-5}}{1!} \approx 0.03$
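The same formula gives the probability for any number of accidents in a week. This is a minimal sketch for λ = 5; the function name `poisson_pmf` and the range of x values printed are illustrative choices.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)


lam = 5  # mean number of accidents per week, from the question
for x in range(0, 11):
    print(f"P({x}) = {poisson_pmf(x, lam):.4f}")
```

For x = 1 this prints about 0.0337, which rounds to the 0.03 shown above.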
● A variable is said to be continuous if the range of possible values falls along a continuum.
Example: Loudness of cheering at a ball game, weight of cookies in a package, length of a pen,
or the time required to assemble a car.
$$ Z = \frac{Y - \mu}{\sigma} $$
where, Z = number of standard deviations between Y and µ
Y = value of the data point in concern
µ = mean of the population
σ = standard deviation of the population
Q Suppose the time taken to resolve customer problems follows a normal distribution with the mean value of
250 hours and standard deviation value of 23 hrs. What is the probability that a problem resolution will take
more than 300 hrs?
A Given:
● Y = 300
● µ = 250
● σ = 23
Using the formula: $Z = \frac{300 - 250}{23} = 2.17$
● From a Normal Distribution Table, a Z value of 2.17 corresponds to a cumulative area of 0.98499 to its left
● Thus, the probability that a problem can be resolved in less than 300 hrs is 98.5%
● The chance of a problem resolution taking more than 300 hours is 1.5%
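The table lookup can be reproduced numerically. The sketch below computes the standard normal cumulative area with the error function from the standard library instead of a printed Z-table; `normal_cdf` is an illustrative helper name. Because it uses the unrounded Z value (about 2.174), the result differs slightly from the table value for Z = 2.17.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


y, mu, sigma = 300, 250, 23          # values given in the question
z = (y - mu) / sigma                 # number of standard deviations from the mean
p_less = normal_cdf(z)               # P(resolution time < 300 hrs)
p_more = 1 - p_less                  # P(resolution time > 300 hrs)
print(f"Z = {z:.2f}, P(<300) = {p_less:.4f}, P(>300) = {p_more:.4f}")
```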
The total area under the normal curve is 1. For an actual value, compute the Z score and look up the
corresponding cumulative probability in the Z-table.
This is the most commonly used normal distribution Z-table with the positive Z-scores.

Z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
A The table is not needed to find the probability that the variable Z takes a value of
less than (or equal to) zero.
● First, the area under the curve is 1, and second, the curve is symmetrical about Z = 0.
● Hence, there is a 0.5 (or 50%) chance that Z lies above 0 and a 0.5 (or 50%) chance that Z lies below 0.
Chi-square distribution (chi-squared or χ² distribution) with k degrees of freedom is the distribution
of the sum of the squares of k independent standard normal random variables.
Characteristics of χ² Distribution
$$ \chi^{2}_{\text{calculated}} = \sum \frac{(f_o - f_e)^{2}}{f_e} $$
where, χ²calculated = chi-square index
fo = observed frequency
fe = expected frequency
! Chi-square distribution will be covered in detail in the later part of this lesson.
A t-distribution is most appropriate to be used when:
● the sample size is less than 30;
● the population standard deviation is not known; and
● the population is approximately normal.
The F-distribution is a ratio of two Chi-square distributions, and a specific F-distribution is denoted by
the degrees of freedom for the numerator Chi-square and the degrees of freedom for the
denominator Chi-square.
$$ F_{\text{calculated}} = \frac{S_1^{2}}{S_2^{2}} $$
where, S1 and S2 = standard deviations of the two samples
● If Fcalculated is 1, there is no difference in the variance
● If S1 > S2 , then the numerator should be greater than denominator (df1 = n1 – 1 and df2 = n2 – 1)
! Refer to the F-table to find the critical F value at α and the degrees of freedom of the samples of the two
different processes (df1 and df2).
Copyright 2014, Simplilearn, All rights reserved. 25
Analyze
Topic 2—Exploratory Data Analysis
Multi-Vari studies analyze variation, investigate process stability, identify investigation areas, and
break down the variation.
They classify variation sources into three major types:
● Positional: Variations within a single unit where variation is due to location. Examples: Pallet stacking in a
truck, temperature gradient in an oven, variation observed from cavity-to-cavity within a mold,
region of a country, line on invoice
● Cyclical: Variations among sequential repetitions over a short time. Examples: Every n’th pallet
broken, batch-to-batch variation, lot-to-lot variation, invoices received day-to-day, and account
activity week-to-week
● Temporal: Variations which occur over longer periods of time. Examples: Process drift,
performance before and after breaks, seasonal and shift based differences, month-to-month
closings, and quarterly returns
● Example: Select the process where the plate is being manufactured and measure its thickness within
a specified range.
● Example: Sample size is five pieces from each equipment and the frequency of data collection is
every two hours.
● Example: The tabulation sheet with data records contains the columns with time, equipment
number, and thickness as headers.
● Example: Chart is plotted with time on X axis and the plate thickness on Y axis.
● Example: The observed values are linked by appropriate lines.
Correlation is the association between variables. The correlation coefficient shows the strength of
the relationship between Y and X.
The strength of this relationship is denoted by the correlation coefficient ‘r’.
● r = -1: Movement in both variables is inverse
● r = 0: No correlation between the two variables
● r = +1: Movement in both variables is same
The higher the absolute value of ‘r’, the stronger the correlation between Y and X.
! An ‘r’ value of > + 0.85 or < - 0.85 indicates a strong correlation.
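As a rough illustration of how ‘r’ is obtained, the sketch below computes the Pearson correlation coefficient from paired observations. The data values and the function name `pearson_r` are hypothetical and are not taken from the lesson.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations of X and Y.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print(f"r = {r:.3f}, strong correlation: {abs(r) > 0.85}")
```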
! Simple Linear Regression is for one X and Multiple Linear Regression is for more than one X.
● Y = f(X) may not be the correct transfer function to control Y because there may be a low level of
correlation between the two variables.
● It is important to discover whether a statistically significant relationship exists between Y and a
particular X by looking at p-values. Based on regression, one can infer the vital X and eliminate the
rest.
! It is important to understand if there is statistical relevance between Y and X using the metrics from
Regression Analysis. The Simple Linear Regression should be used as a Statistical Validation tool.
Simple Linear Regression (SLR)
A simple linear regression equation is a fitted linear equation between Y and X. It is represented as
follows:
Y = A + BX ± C
where,
● Y = Dependent variable / output / response
● X = Independent variable / input / predictor
● A = Intercept of fitted line on Y axis
● B = Regression coefficient / Slope of the fitted line
● C = Error in the model
If Y and X are not perfectly linear (that is, r is not ± 1), several lines could fit in the scatter plot. It can be
inferred from the graphs below:
● Minitab fits the line which has the least Sum of Squares of Error.
● In a linear relationship, the points would lie on the line. Typically, the data lies off the line.
● The distance from the point to line is the error distance used in the SSE calculations.
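The least-squares idea described above can be sketched in a few lines. The code fits Y = A + BX by minimizing the Sum of Squares of Error and reports r². The fertilizer and sales figures are hypothetical placeholders, and the function names are illustrative; this is not Minitab's implementation.

```python
def fit_line(xs, ys):
    """Return intercept A and slope B of the least-squares line Y = A + B*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared vertical distances from each point to the fitted line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def r_squared(xs, ys, a, b):
    """Share of the variation in Y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse(xs, ys, a, b) / sst


# Hypothetical data: amount spent on fertilizer (X) and annual sales (Y).
spend = [2, 3, 4, 5, 6]
sales = [10, 14, 13, 18, 20]
a, b = fit_line(spend, sales)
print(f"Y = {a:.2f} + {b:.2f}X, r2 = {r_squared(spend, sales, a, b):.3f}")
print(f"Predicted Y at X = 8: {a + b * 8:.2f}")
```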
A farmer wishes to predict the relationship between the amount spent on fertilizers and the annual
sales of his crops. He collects the following data for the last few years and wants to determine his
expected revenue if he spends $8 annually on fertilizer.
To use the data for Regression analysis, the interpretation of the scatter chart is as follows:
● The r² value (Coefficient of Determination) conveys if the model is good and can be used. The r²
value is 0.3797.
● 38% of variability in Y is explained by X.
● The remaining 62% variation is due to residual factors.
● The low value of r² indicates a poor relationship between Y and X.
! Refer to the Cause and Effect Matrix and study the relationship between Y and a different X variable.
If a new variable X2 is added to the regression model, the impact of X1 and X2 on Y gets tested. This is
known as Multiple Linear Regression. In Multiple Linear Regression:
● the value of r² changes due to the introduction of the new variable.
● the resulting value of r² is known as ‘r² Adjusted.’
● the model can be used if the ‘r² Adjusted’ value is greater than 70%.
A regression equation may denote a relationship between variables. It does not indicate:
● whether a change in one variable causes a change in the other; or
● whether both variables depend on another, independent variable.
There is a positive correlation between the number of sneezes and the deaths in the city. It cannot
be assumed that sneezing is the cause of death even though the correlation is very strong.
The differences between a variable and its hypothesized value may be statistically significant but may
not be practical or economically meaningful.
For example: Based on the hypothesis test, Nutri Worldwide Inc. implemented a trading strategy. The
returns:
● are economically significant when logical reasons are examined before implementation.
● may not be significant when statistically proven strategy is implemented directly.
● may be economically insignificant due to taxes, transaction costs, and risks.
The conceptual differences between a null and an alternate hypothesis are as follows:
Null hypothesis:
● Represented as H0
● Cannot be proved, only rejected
● Example: Movie is good
Alternate hypothesis:
● Represented as Ha
● Challenges the null hypothesis
● Example: Movie is not good
The conceptual differences between type I and type II error are as follows:
Type I error:
● Rejecting a null hypothesis when it is true
● Also known as Producer’s Risk
● ‘α’ is the chance of committing a Type I error
● The value of ‘α’ is typically 0.05 or 5%
● Example: When a movie is good, it is reviewed as ‘not good.’
Type II error:
● Accepting a null hypothesis when it is false
● Also known as Consumer’s Risk
● ‘β’ is the chance of committing a Type II error
● The value of ‘β’ is typically 0.2 or 20%
● Any experiment should have as low a β value as possible
● Example: When a movie is not good, it is reviewed as ‘good.’
While dealing with type I or type II errors, following are the points to remember:
● Reducing the probability of making one type of error generally increases the probability of
making the other type of error.
● If a true null hypothesis is erroneously rejected (Type I error), a false null hypothesis may be
accepted (Type II error).
● ‘α’ is set at 0.05, which means a type I error is risked in about 1 out of 20 experiments.
● It is important to decide which type of error should be less likely and set ‘α’ and ‘β’ accordingly.
! In hypothesis testing, ‘α’ is the significance level and ‘1-α’ is the confidence level.
The sample size for continuous data can be determined by the formula:
$$ n = \left( \frac{Z_{1 - \alpha/2} \cdot \sigma}{\Delta} \right)^{2} $$
For α = 5%, $1 - \frac{\alpha}{2} = 0.975$.
To calculate the standard sample size for continuous data, the value of α is taken as 5%. According to the Z
table, the Z value at 97.5% is 1.96. The standardized sample size formula for Continuous Data is:
$$ n = \left( \frac{1.96 \cdot \sigma}{\Delta} \right)^{2} $$
Q The population standard deviation for the time, to resolve customer problems, is 30 hours. What should
be the size of a sample that can estimate the average problem resolution time within ± 5 hours tolerance
with 99% confidence?
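A minimal sketch of this calculation is shown below. It uses the general formula with the Z value for 99% confidence (about 2.576) rather than 1.96, and rounds the result up to the next whole unit. The function name `sample_size_continuous` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_continuous(sigma: float, delta: float, confidence: float) -> int:
    """n = (Z_(1 - alpha/2) * sigma / delta)^2, rounded up."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided Z value
    return ceil((z * sigma / delta) ** 2)


# sigma = 30 hours, tolerance = 5 hours, 99% confidence (from the question).
n = sample_size_continuous(sigma=30, delta=5, confidence=0.99)
print(f"Required sample size: {n}")  # 239
```

At 95% confidence the same inputs would need a smaller sample, because the Z value drops to 1.96.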
To calculate the standard sample size for discrete data, if the average population proportion
non-defective is ‘p’, then the population standard deviation can be calculated as:
$$ \sigma = \sqrt{p(1 - p)} $$
The sample size formula for Discrete Data is then:
$$ n = \left( \frac{1.96}{\Delta} \right)^{2} p(1 - p) $$
Where ∆ = Tolerance allowed on either side of the population proportion average in %
Q The non-defective population proportion for pen manufacturing is 80%. What should be the sample size to
draw a sample that can estimate the proportion of compliant pens within ± 5% with an alpha of 5%?
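The pen question can be worked the same way. This sketch expresses p and the tolerance as fractions (0.80 and 0.05) and rounds up; the function name `sample_size_discrete` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_discrete(p: float, delta: float, alpha: float) -> int:
    """n = (Z_(1 - alpha/2) / delta)^2 * p * (1 - p), rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return ceil((z / delta) ** 2 * p * (1 - p))


# p = 80% non-defective, tolerance = 5%, alpha = 5% (from the question).
n_pens = sample_size_discrete(p=0.80, delta=0.05, alpha=0.05)
print(f"Required sample size: {n_pens}")  # 246
```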
The figure below helps in concluding the type of test one should perform based on the kind of data
and values available:
[Figure: decision chart for selecting a hypothesis test. Recoverable branch: σ known, use the Z-test.]
In a hypothesis test for variance, the Chi-square test is used. This is explained in the example below:
H0: Proportion of wins in Australia or abroad is independent of the country played against
Ha: Proportion of wins in Australia or abroad is dependent on the country played against
χ2 Critical = 6.251 and
χ2 Calculated = 1.36
Result: Since the calculated value is less than the critical value, the proportion of wins of the Australian
hockey team is independent of the country played or place.
Result: It can be concluded that the proportion of smokers in R is greater than 0.10.
Q Susan is examining the earnings of two companies. According to her, the earnings of Company A are more
volatile than those of Company B. She has been obtaining earnings data for the past 31 years for Company
A, and for the past 41 years for Company B. She finds that the sample standard deviation of Company A’s
earnings is $4.40 and of Company B’s earnings is $3.90. Determine whether the earnings of Company A
have a greater standard deviation than those of Company B at 5% level of significance.
A H0: σA² = σB² (the variance of Company A’s earnings is equal to the variance of Company B’s earnings).
Ha: σA² ≠ σB² (the variance of Company A’s earnings is different).
σA² = variance of Company A’s earnings.
σB² = variance of Company B’s earnings.
Note: σA > σB. In calculating the F-test statistic, always put the greater variance in the numerator.
The critical value from F-table equals 1.74. The null hypothesis is rejected if the F-test statistic is
greater than 1.74.
Results: The F-test statistic (1.273) is not greater than the critical value (1.74). Therefore, at 5%
significance level, the null hypothesis cannot be rejected.
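The F-test statistic quoted in the result can be reproduced directly from the two sample standard deviations. In this sketch the critical value 1.74 is taken from the text above rather than computed, and the function name `f_statistic` is illustrative.

```python
def f_statistic(s_a: float, s_b: float) -> float:
    """Ratio of the two sample variances, with the greater variance in the numerator."""
    var_a, var_b = s_a**2, s_b**2
    return max(var_a, var_b) / min(var_a, var_b)


# Sample standard deviations from the question: $4.40 (A, 31 years), $3.90 (B, 41 years).
f_calc = f_statistic(4.40, 3.90)
f_crit = 1.74          # critical value from the F-table, as quoted in the text
df1, df2 = 31 - 1, 41 - 1
reject_h0 = f_calc > f_crit
print(f"F = {f_calc:.3f} (df1 = {df1}, df2 = {df2}), reject H0: {reject_h0}")
```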
A restaurant wanting to explore the recent overuse of avocados suspects there is a difference
between two chefs and number of avocados used to prepare the salads. The table shows the measure
of avocados in ounces.
The interpretations for the conducted F-test (F-Test Two-Sample for Variances) are as follows:
The table shows the measure of avocados in ounces. If a significant difference in their means is found,
it can be concluded that there is a possibility of Special Cause of Variation.
The steps for conducting 2-sample t-test in MS-Excel are given below:
1. Open MS Excel, click Data and click Data Analysis.
2. Select 2-Sample Independent t-test assuming unequal variances.
3. In Variable 1 range, select the data set for Group A.
4. In Variable 2 range, select the data set for Group B.
5. Keep the “Hypothesized Mean Difference” as 0.
6. Click Ok.
! The alternate hypothesis tests two conditions, Mean of A < Mean of B and Mean of A > Mean of B. Thus a
two-tailed probability needs to be used.
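For readers working outside Excel, the test statistic behind a 2-sample t-test with unequal variances can be sketched as follows. The group values are hypothetical, the function name `welch_t` is illustrative, and the sketch stops at the t value and degrees of freedom; the two-tailed probability would still come from a t-table or from the Excel output.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """t statistic and approximate degrees of freedom for unequal variances."""
    va = variance(a) / len(a)   # sample variance of A divided by its sample size
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


# Hypothetical measurements for Group A and Group B.
group_a = [12.1, 11.8, 12.6, 12.0, 12.4]
group_b = [11.2, 11.9, 11.5, 11.0, 11.7]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```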
The differences between the usage of the 2-tailed probability and the 1-tailed probability are as follows:
● 2-tailed: If the alternate hypothesis tests more than one direction, either less or more, use a
2-tailed probability value from the test. Example: If Mean of A is not equal to Mean of B,
then it is 2-tailed probability.
● 1-tailed: If the alternate hypothesis tests one direction, use a 1-tailed probability value
from the test. Example: If Mean of A is greater than Mean of B, then it is 1-tailed probability.
For example, a group of students score X in CSSGB before taking the Training program. Post the training
program, the scores are taken again.
● One needs to find out if there is a statistical difference between the two sets of scores.
● If there is a significant difference, the inference could be that the training was effective.
Sample Variance (S²) is the average of the squared differences from the mean.
● It is used to calculate and understand the degree of variation of a sample.
● In statistics, its value is used by converting it into standard deviation and combining with the
mean.
Consider a sample of weights. Suppose the mean value is 140. Subtract the mean from each value,
square each result, and then take the average of the squared differences; the resulting sample
variance value is 1936.
● In order to get the standard deviation, take the square root of the sample variance: √1936 = 44.
● The standard deviation, along with the mean, will tell you how much the majority of the people
weigh.
o With a mean value of 140 and a standard deviation of 44, the majority of people weigh between
96 pounds (140 - 44) and 184 pounds (140 + 44).
ANOVA:
● is used to compare the means of more than two samples;
● stands for Analysis of Variance;
● helps in determining whether all sample means are equal;
● allows samples shortlisted on this basis to be tested further; and
● generalizes the t-test to include more than two samples.
The table shows the takeaway food delivery time of three different outlets. To benchmark the delivery
time of the outlets:

Outlet 1   Outlet 2   Outlet 3
48         50         49
49         48         48
The following output is received when the data is fed into the Minitab:
To perform ANOVA, enter the data on an Excel spreadsheet, select the ANOVA-single factor test from
the Data Analysis “Toolpak,” and select the array for analysis and an output range.
! In one-way ANOVA, one factor has to be benchmarked unlike the two-way ANOVA.
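The calculation behind a one-way ANOVA can be sketched without a statistics package. The code below uses only the two rows of delivery times that survive in the table above (the full data set may have had more rows), and the function name `one_way_anova` is illustrative. The F value would then be compared with the critical F value for the reported degrees of freedom.

```python
def one_way_anova(groups):
    """Return the F statistic and the between/within degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Delivery times for Outlet 1, Outlet 2, and Outlet 3 (only the rows shown in the table).
outlets = [[48, 49], [50, 48], [49, 48]]
f_value, df_b, df_w = one_way_anova(outlets)
print(f"F = {f_value:.4f} with ({df_b}, {df_w}) degrees of freedom")
```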
$$ \chi^{2}_{\text{Calculated}} = \sum \frac{(f_o - f_e)^{2}}{f_e} $$
Where,
• χ²Calculated = chi-square index
• fo = An observed frequency
• fe = An expected frequency
Expected frequency = (row total × column total)/overall total.
There is a different chi-square distribution for each different number of degrees of freedom. For chi-
square distribution, degrees of freedom are calculated as per the number of rows and columns in the
contingency table.
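A short sketch of the chi-square calculation for a contingency table follows. It assumes the usual rule that expected frequency is (row total × column total)/overall total and that degrees of freedom are (rows - 1) × (columns - 1). The observed counts and the function name `chi_square` are hypothetical.

```python
def chi_square(observed):
    """Return the chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / total  # expected frequency
            chi2 += (f_o - f_e) ** 2 / f_e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical observed counts: rows are two categories, columns are two outcomes.
table = [[10, 20], [20, 10]]
chi2_calc, dof = chi_square(table)
print(f"chi-square = {chi2_calc:.3f}, degrees of freedom = {dof}")
```

The calculated value is then compared with the critical chi-square value for the same degrees of freedom.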
Mann-Whitney or Wilcoxon Rank Sum test is a non-parametric test used to compare two unpaired
groups. In this test:
● The rejection and acceptance condition remains the same for different cases:
! The aim of this test is to rank the entire data available for each condition and then compare the total
outcome of the two ranks.
● The smallest number gets a rank of 1.
● The largest number gets a rank of n, where n is the total number of values in the two groups.
● Continue till all the whole-number ranks are used.
● Summate the ranks for the observations from sample 1 and then summate the rank in sample 2
(larger group).
Group   Data
G1      14, 2, 5, 16, 9
G2      4, 2, 18, 14, 8

Sorted Data   Group   Rank   Final Rank
2             G1      1      1.5 (Avg. = 1.5)
2             G2      2      1.5
4             G2      3      3
5             G1      4      4
8             G2      5      5
9             G1      6      6
14            G1      7      7.5 (Avg. = 7.5)
14            G2      8      7.5
16            G1      9      9
18            G2      10     10

G1 Rank (R1)   G2 Rank (R2)
1.5            1.5
4              3
6              5
7.5            7.5
9              10
Total = 28     Total = 27
n1 = 5         n2 = 5
The formula for the Mann-Whitney U test for n1 and n2 values is:
$$ U_1 = n_1 \times n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 $$
$$ U_2 = n_1 \times n_2 + \frac{n_2(n_2 + 1)}{2} - R_2 $$
In this example,
U1 = 12 and U2 = 13
! ● To be statistically significant, the obtained U value must be equal to or less than the critical value
(2 in this example).
● Since the calculated U value is 12 (not less than 2), there is no statistically significant difference
between the two groups.
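The ranking and U calculation can be reproduced with a short script. The sketch below uses the two groups from the example (G1: 14, 2, 5, 16, 9 and G2: 4, 2, 18, 14, 8), gives tied values the average of their ranks, and applies the two U formulas. The function names are illustrative.

```python
def average_ranks(values):
    """Map each value to its rank, with ties sharing the average of their positions."""
    positions = {}
    for idx, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(idx)
    return {v: sum(p) / len(p) for v, p in positions.items()}


def mann_whitney_u(g1, g2):
    """Return rank sums R1, R2 and the statistics U1, U2."""
    ranks = average_ranks(g1 + g2)
    r1 = sum(ranks[v] for v in g1)
    r2 = sum(ranks[v] for v in g2)
    n1, n2 = len(g1), len(g2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, u1, u2


g1 = [14, 2, 5, 16, 9]
g2 = [4, 2, 18, 14, 8]
r1, r2, u1, u2 = mann_whitney_u(g1, g2)
print(f"R1 = {r1}, R2 = {r2}, U1 = {u1}, U2 = {u2}")  # R1 = 28.0, R2 = 27.0, U1 = 12.0, U2 = 13.0
```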
The Kruskal-Wallis test is also a non-parametric test used for testing the source of origin of the
samples.
● Medians of two or more samples are compared to find the source of origin of the sample.
● Unlike the analogous one-way analysis of variance, it does not assume the normal distribution of
the residuals.
! ● The null hypothesis is that the medians of all the groups are equal.
● The alternative hypothesis is that the population median of at least one group is different from that of
at least one other group.
The Mood’s median is a non-parametric test that is used to test the equality of medians from two or
more different populations. This test works when:
Friedman test is a form of non-parametric test that does not make any assumptions on the shape and
origin of the sample.
● It allows smaller sample data sets to be analysed, and
● Unlike ANOVA, it does not require the dataset to be randomly sampled from normally distributed
populations with equal variances.
! The test uses a null hypothesis in which the population medians of each treatment are statistically
identical to the rest of the group.
The 1 Sample Sign test is the simplest of all the non-parametric tests that can be used instead of a
one sample t test.
● Here, H0 states that the hypothesized (assumed) median is the median of the population to which
the sample belongs.
● Values that are larger than the hypothesized median
● Values that are smaller than the hypothesized median
● Check if there are significantly more positives (or negatives) than expected
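The counting logic of the sign test can be sketched as below. Values equal to the hypothesized median are dropped, and the two-sided p-value comes from the binomial distribution with p = 0.5, which is the standard way to judge whether one sign occurs more often than expected. The scores and the function name `sign_test` are hypothetical.

```python
from math import comb


def sign_test(values, hypothesized_median):
    """Return counts of values above/below the median and a two-sided p-value."""
    plus = sum(1 for v in values if v > hypothesized_median)
    minus = sum(1 for v in values if v < hypothesized_median)
    n = plus + minus                      # ties with the median are ignored
    k = min(plus, minus)
    # Probability of a split at least this uneven if each sign has a 50% chance.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return plus, minus, min(1.0, 2 * tail)


# Hypothetical sample tested against a hypothesized median of 3.7.
scores = [3.9, 4.1, 3.5, 4.4, 4.0, 3.8, 4.2, 3.6, 4.3, 4.5]
plus, minus, p_value = sign_test(scores, 3.7)
print(f"positives = {plus}, negatives = {minus}, p-value = {p_value:.4f}")
```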
The 1 Sample Wilcoxon test, also known as the Wilcoxon Signed Rank test, is a non-parametric test.
This test is:
● equivalent to the parametric One Sample t-Test, and
● more powerful than the non-parametric 1 Sample Sign Test.
! The conclusion in this test is that if the value is on the mid-point, you can continue and accept the null
hypothesis. If not, reject the null hypothesis.
The Median customer satisfaction score of an organization has always been 3.7 and the management
wants to see if this has changed. They conducted a survey and got the results grouped by the
customer type.
Conclusion:
● If median = 3.7, accept H0
● If median ≠ 3.7, reject H0
● α = 0.05
a. Statistics
b. Inferential Statistics
c. Probability
d. Correlation
Answer: b.
Explanation: Inferential statistics describe the population parameters based on the sample
data using a particular model.
QUIZ 2: Which of the following is an application of the population knowledge to predict the
sample behavior?
a. Poisson distribution
b. Normal distribution
c. Chi-square distribution
d. Probability distribution
Answer: a.
Explanation: Poisson distribution is an application of the population knowledge to predict
the sample behavior.
QUIZ 3: Which of the following is used to calculate the degree of movement of variable Y as X
changes?
a. Correlation
b. Probability
c. F-distribution
d. Regression
Answer: d.
Explanation: The degree of movement of variable Y as X changes is calculated using
regression.
QUIZ 4: A null hypothesis states that a process has not improved as a result of some
modifications. The type II error is to conclude that:
Answer: b.
Explanation: A type II error means that we have failed to reject the null hypothesis (H0)
when it is false.
QUIZ 5: The test used for testing significance in an analysis of variance table is the:
a. Z-test.
b. t-test.
c. F-test.
d. Chi-square test.
Answer: c.
Explanation: The appropriate ANOVA test is the F-test. ANOVA is a test of the equality of
means.
QUIZ 6: Which of the following is the only way to analyze the variance by ranks?
d. Kruskal-Wallis test
Answer: d.
Explanation: The Kruskal-Wallis test is the only way to analyze the variance by ranks.
a. Chi-square distribution
b. Normal distribution
c. t-distribution
d. F-distribution
Answer: a.
Explanation: The chi-square distribution is used to compare a sample variance with a known
population variance.
QUIZ 8: If the p-value is less than the significance level, the null hypothesis has to be:
a. rejected.
b. accepted.
c. maintained as it is.
d. re-evaluated.
Answer: a.
Explanation: If the p-value is less than the significance level, the null hypothesis has to be
rejected as the data does not support the null hypothesis and the difference is
statistically significant.
QUIZ 9: Which of the following is a nonparametric test that is used to test the equality of
medians from two or more different populations?
b. Kruskal-Wallis test
c. Friedman test
Answer: a.
Explanation: The Mood’s median is a nonparametric test that is used to test the equality of
medians from two or more different populations.
QUIZ 10: Which of the following is a ratio of two chi-square distributions?
a. F-distribution
b. t-distribution
c. Poisson distribution
d. Binomial distribution
Answer: a.
Explanation: The F-distribution is a ratio of two chi-square distributions.
b. Power of a test
c. Simple linear regression
Answer: b.
Explanation: The power of a test is the probability of correctly rejecting the null hypothesis
when it is false.
QUIZ 12: Which of the following assumes that the existing sample is randomly taken from a
population, with a symmetric frequency distribution around the median?
a. Kruskal-Wallis test
d. Friedman test
Answer: c.
Explanation: 1 Sample Wilcoxon test assumes that the existing sample is randomly taken
from a population, with a symmetric frequency distribution around the median.
Summary
Here is a quick recap of what we have learned in this lesson:
● The Mann-Whitney or Wilcoxon Rank Sum test is used to compare two unpaired groups.
● The Kruskal–Wallis Test is used for testing the source of origin of samples.
● The Mood’s median test is used to test the equality of medians from two or more different
populations.
● The Friedman test does not make any assumptions on the shape and origin of the sample.
● The 1 Sample Sign test is the simplest of all the non-parametric tests that can be used in the
place of a 1 sample t-test.