
STATISTICAL METHODS IN QUALITY MANAGEMENT

Nina Cortez
Jv Gepulgani
Kurt Manalac
ndrey Pagsolingan
Arvin Perez
Lheam Vilog
STATISTICS
• Is a science concerned with “the collection, organization, analysis, interpretation, and presentation of data”.
• It is essential for quality and for implementing a continuous improvement philosophy.
• Statistical methods help managers make sense of data and gain insight about the nature of variation in the processes they manage.
PROBABILITY DISTRIBUTION

BASIC PROBABILITY CONCEPT
• To apply statistics, you need a basic understanding of probability and probability distributions.
• In statistical terminology, an experiment is a process that results in some outcome. An outcome is the result we observe. The sample space is the collection of all possible outcomes of an experiment.
PROBABILITY
• The likelihood that an outcome occurs.
• The probability associated with any outcome must be between 0 and 1, that is, $0 \le P(O_i) \le 1$.
• The sum of the probabilities over all possible outcomes must be 1.
• An event is a collection of one or more outcomes from a sample space.
• The complement of A, denoted as $A^c$, consists of all outcomes in the sample space not in A. For example, if A is the event of finding 2 or fewer defectives in a sample of 10, then $A^c$ is the event of finding 3 or more defectives.
• Two events are mutually exclusive if they have no outcomes in common. For example, if A is the event “2 or fewer defects in a sample” and B is the event “5 or more defects,” then clearly A and B are mutually exclusive.
RULES THAT APPLY TO CALCULATING THE PROBABILITIES OF EVENTS:
• Rule 1: The probability of any event is the sum of the probabilities of the outcomes that compose that event.
• Rule 2: The probability of the complement of any event A is $P(A^c) = 1 - P(A)$.
• Rule 3: If events A and B are mutually exclusive, then $P(A \text{ or } B) = P(A) + P(B)$.
• Rule 4: If two events A and B are not mutually exclusive, then $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$.
EXAMPLE USING THE PROBABILITY RULES
In testing a new personal computer after assembly, a company discovered that among a sample of 100 units, 3 failed to boot up properly because of a defect in the motherboard, 4 units had a hard drive failure, and 2 units experienced both failures. Let A be the event “failure to boot” and B be the event “hard drive failure.” Then P(A) = 3/100 and P(B) = 4/100. However, these events are not mutually exclusive because both A and B occurred together; specifically, P(A and B) = 2/100. Therefore, by Rule 4, the probability that one or the other failure occurred is

$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) = \frac{3}{100} + \frac{4}{100} - \frac{2}{100} = \frac{5}{100}$$
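The addition rule can be checked with exact fractions. A minimal Python sketch (the function name `prob_union` is our own, not from the source) that applies Rule 4 to the PC example and Rule 2 to get the probability that a unit has neither failure:

```python
from fractions import Fraction

def prob_union(p_a, p_b, p_a_and_b):
    """Rule 4 (addition rule): P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

# Counts from the PC assembly example: 100 units tested
n_units = 100
p_boot = Fraction(3, n_units)    # A: failure to boot (motherboard defect)
p_drive = Fraction(4, n_units)   # B: hard drive failure
p_both = Fraction(2, n_units)    # A and B: both failures on the same unit

p_either = prob_union(p_boot, p_drive, p_both)   # P(A or B)
p_neither = 1 - p_either                         # Rule 2: complement of (A or B)

print(p_either, p_neither)   # 1/20 19/20  (that is, 5/100 and 95/100)
```

`Fraction` keeps the arithmetic exact, so the result matches the hand calculation without rounding.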
CONDITIONAL PROBABILITY
• Conditional probability is the probability of occurrence of one event A, given that another event B is known to be true or to have already occurred:

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \qquad (6.1)$$

• Multiplying both sides of formula (6.1) by P(B), we obtain $P(A \text{ and } B) = P(A \mid B)\,P(B)$. Note that we may switch the roles of A and B and write $P(B \text{ and } A) = P(B \mid A)\,P(A)$. But P(B and A) is the same as P(A and B), so the joint probability can be written in two ways:

$$P(A \text{ and } B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

This is often called the multiplication rule of probability.
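As an illustration (our own choice of data, not part of the original slide), the PC assembly figures can be reused to compute conditional probabilities with formula (6.1) and to confirm that both forms of the multiplication rule give the same joint probability. The helper name `conditional` is hypothetical.

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """Formula (6.1): P(A | B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b

# Reusing the PC assembly figures: A = failure to boot, B = hard drive failure
p_a, p_b, p_a_and_b = Fraction(3, 100), Fraction(4, 100), Fraction(2, 100)

p_a_given_b = conditional(p_a_and_b, p_b)   # P(A | B)
p_b_given_a = conditional(p_a_and_b, p_a)   # P(B | A)

# Multiplication rule: both products recover the joint probability P(A and B)
assert p_a_given_b * p_b == p_a_and_b
assert p_b_given_a * p_a == p_a_and_b

print(p_a_given_b, p_b_given_a)   # 1/2 2/3
```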
PROBABILITY DISTRIBUTION
• A random variable is a numerical description of the outcome of an experiment. For example, suppose an experiment consists of sampling 10 parts and counting the number of defectives. We might define the random variable X to be the number of defective parts in the sample. We might also define a random variable Y to be 1 if the outcome is pass, and 0 if the outcome is fail. A random variable can be either discrete or continuous, depending on the specific numerical values it may assume.
A probability distribution is a characterization of the possible values that a random variable may assume along with the probability of assuming these values. For a random variable X, the probability distribution of X is denoted by a mathematical function f(x). The symbol $x_i$ represents the ith value of the random variable X, and $f(x_i)$ its probability. The cumulative distribution function, F(x), specifies the probability that the random variable X will assume a value less than or equal to a specified value, x. This is also denoted as $P(X \le x)$ and read as “the probability that the random variable X is less than or equal to x.”
BINOMIAL DISTRIBUTION
Describes the probability of obtaining exactly x "successes" in a sequence of n identical experiments, called trials. A success can be either of the two possible outcomes of each experiment. In some situations, it might represent a defective item; in others, a good item. The probability of success in each trial is a constant value p. The binomial probability distribution is

$$f(x) = \binom{n}{x} p^{x} (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n \qquad (6.3)$$
USING THE BINOMIAL DISTRIBUTION
If the probability that a process produces a defective part is 0.2, then the probability distribution of the number of defective parts x in a sample of 10 is described using formula (6.3) with n = 10 and p = 0.2. Thus, to find the probability that 3 parts among a sample of 10 will be defective, we compute

$$f(3) = \binom{10}{3} (0.2)^{3} (0.8)^{10-3} = \frac{10!}{3!\,7!}(0.008)(0.2097152) = 120(0.008)(0.2097152) = 0.201327$$
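A short Python sketch of formula (6.3) that reproduces the defective-parts calculation; `binomial_pmf` and `binomial_cdf` are illustrative names, and `math.comb` (Python 3.8+) supplies the binomial coefficient. The cumulative function simply sums the pmf, matching the definition of F(x) as the probability that X is at most x.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Formula (6.3): probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x: int, n: int, p: float) -> float:
    """F(x) = P(X <= x): sum of the pmf for k = 0, 1, ..., x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Defective-parts example: n = 10 parts sampled, p = 0.2 chance each is defective
print(f"{binomial_pmf(3, 10, 0.2):.6f}")   # 0.201327  (exactly 3 defectives)
print(f"{binomial_cdf(3, 10, 0.2):.4f}")   # 0.8791    (3 or fewer defectives)
```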
POISSON DISTRIBUTION
• Named after the French mathematician Siméon Denis Poisson.
• Is a discrete probability distribution that helps predict the probability of a given number of events occurring when you know how often the event occurs on average. It gives the probability of a given number of events happening in a fixed interval of time.
Calculating the Poisson Distribution
The Poisson distribution pmf is:

$$P(x; \mu) = \frac{e^{-\mu}\,\mu^{x}}{x!}$$

Where:
• The symbol “!” is a factorial.
• e: A constant equal to approximately 2.71828 (Euler's number, the base of natural logarithms)
• μ: The mean number of successes that occur in a specified region.
• x: The actual number of successes that occur in a specified region.
• P(x; μ): The Poisson probability that exactly x successes occur in a Poisson experiment, when the mean number of successes is μ.
Poisson Distribution Examples
1. The average number of homes sold by the Acme Realty company is 2 homes per day. What is the probability that exactly 3 homes will be sold tomorrow?

Solution:
Given:
• μ = 2; since 2 homes are sold per day, on average.
• x = 3; since we want to find the likelihood that 3 homes will be sold tomorrow.
• e = 2.71828; since e is a constant equal to approximately 2.71828.

We plug these values into the Poisson formula as follows:

$$P(3; 2) = \frac{(2.71828^{-2})(2^{3})}{3!} = \frac{(0.13534)(8)}{6} = 0.180$$

Thus, the probability of selling 3 homes tomorrow is 0.180.
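The Poisson pmf, $P(x; \mu) = e^{-\mu}\mu^{x}/x!$, translates directly into Python. A minimal sketch (the function name `poisson_pmf` is our own) checking the Acme Realty result:

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(x; mu) = e^(-mu) * mu^x / x!  (probability of exactly x events)."""
    return exp(-mu) * mu**x / factorial(x)

# Acme Realty: mu = 2 homes sold per day; probability of exactly 3 sales tomorrow
print(f"{poisson_pmf(3, 2):.3f}")   # 0.180
```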
2. If, on average, three persons come to ABC company for a job interview each day, find the probability that fewer than three people come for an interview on a given day.

Solution:
Given:
• μ = 3
• x < 3, so P(x < 3; 3) = P(0; 3) + P(1; 3) + P(2; 3)
• e = 2.71828

Hence,
P(x < 3; 3) = P(0; 3) + P(1; 3) + P(2; 3)
= 0.04978706837 + 0.1493612051 + 0.22404180766
= 0.42319008113
The probability of fewer than three persons coming for an interview on a given day is about 0.423.
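For "fewer than k" questions the pmf terms are simply added. A sketch with a small pmf helper and a summing helper (both names are our own) that reproduces the three terms and the total for the ABC company example:

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(x; mu) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

def poisson_prob_fewer_than(k: int, mu: float) -> float:
    """P(X < k; mu) = P(0; mu) + P(1; mu) + ... + P(k - 1; mu)."""
    return sum(poisson_pmf(x, mu) for x in range(k))

# ABC company: mu = 3 interviewees per day; probability of fewer than 3 on a given day
terms = [poisson_pmf(x, 3) for x in range(3)]
print([round(t, 6) for t in terms])            # [0.049787, 0.149361, 0.224042]
print(f"{poisson_prob_fewer_than(3, 3):.4f}")  # 0.4232
```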
CONTINUOUS PROBABILITY DISTRIBUTION
- A probability distribution in which the random variable X can take on any value in a range (is continuous). Because there are infinitely many values that X could assume, the probability of X taking on any one specific value is zero. Therefore, we often speak in ranges of values (e.g., P(X > 0) = 0.50).
- The cumulative distribution function (CDF), F(x), gives the probability that the random variable is less than or equal to x; it represents the area under the density function to the left of x.
- The probability that X is between a and b is equal to the difference of the CDF evaluated at these two points, that is:

$$P(a \le X \le b) = F(b) - F(a)$$
4 Types of Continuous Probability Distributions
NORMAL DISTRIBUTION
- is a probability function that describes how the values of a variable are distributed; its probability density function is represented graphically by the familiar bell-shaped curve.

Properties of a normal distribution
• The mean, mode and median are all equal.
• The curve is symmetric about the center (i.e., around the mean, μ).
• Exactly half of the values are to the left of center and exactly half the values are to the right.
• The total area under the curve is 1.
The general formula for the probability density function of the normal distribution is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$$

Where:
● μ is the mean or average
● σ is the standard deviation
Standard Normal Distribution
The case where μ = 0 and σ = 1 is called the standard normal distribution. The equation for the standard normal distribution is

$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}$$

The letter z is usually used to represent this particular variable.
Z-score formula:

$$z = \frac{x - \mu}{\sigma}$$
Normal Distribution Examples
1. A manufacturer of MRI scanners used for medical diagnosis has data that indicate that the mean number of days (µ) between malfunctions is 1020 days, with a standard deviation of 20 days. Assuming a normal distribution, what is the probability that the number of days between malfunctions will be less than 1044 days? More than 980 days? Between 980 and 1044 days?
Solution:
First, convert the value of x to a z-value. For x = 1044 days, we have:

$$z = \frac{x - \mu}{\sigma} = \frac{1044 - 1020}{20} = 1.2$$

This means that 1044 days is 1.2 standard deviations above the mean of 1020 days. Therefore, using Appendix A,

$$P(X < 1044) = P(Z < 1.2) = 0.8849$$
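To find the probability that X exceeds 980 days, first find the corresponding z-value:

$$z = \frac{980 - 1020}{20} = -2.0$$

Note that $P(X > 980) = 1 - P(X < 980) = 1 - P(Z < -2.0)$. Therefore,

$$P(X > 980) = 1 - 0.0228 = 0.9772$$

Finally, to find the probability that X is between 980 and 1044 days, we use the formula $P(a \le X \le b) = F(b) - F(a)$:

$$P(980 < X < 1044) = P(X < 1044) - P(X < 980) = 0.8849 - 0.0228 = 0.8621$$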
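Instead of Appendix A, Python's standard library `statistics.NormalDist` (Python 3.8+) can evaluate the normal CDF directly. A sketch for the MRI example; because it uses the exact CDF rather than table values rounded to four places, the last digit of the "between" probability may differ slightly from the hand calculation.

```python
from statistics import NormalDist

# Days between malfunctions: normal with mean 1020 and standard deviation 20
days = NormalDist(mu=1020, sigma=20)

p_less_1044 = days.cdf(1044)                 # P(X < 1044)
p_more_980 = 1 - days.cdf(980)               # P(X > 980) = 1 - P(X < 980)
p_between = days.cdf(1044) - days.cdf(980)   # P(980 < X < 1044) = F(1044) - F(980)

print(f"{p_less_1044:.4f} {p_more_980:.4f} {p_between:.4f}")   # 0.8849 0.9772 0.8622
```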
Using the Normal Inverse Function
Suppose that the manufacturer of MRI scanners wishes to determine the number of days x such that the probability that the time between malfunctions is at most x days is 0.80. In this case, we know that $P(X \le x) = 0.80$. This is equivalent to $P(Z \le z) = 0.80$, where $z = (x - 1020)/20$. From Appendix A, we can determine that z is approximately equal to 0.84. Therefore, solving $0.84 = (x - 1020)/20$ gives

$$x = 1020 + 0.84(20) = 1036.8 \text{ days}$$
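The inverse lookup in Appendix A corresponds to `NormalDist.inv_cdf` in Python (Excel's `NORM.INV` plays the same role). A sketch for the 0.80 case, assuming, as above, that the condition is P(X ≤ x) = 0.80:

```python
from statistics import NormalDist

days = NormalDist(mu=1020, sigma=20)    # days between malfunctions

z = NormalDist().inv_cdf(0.80)          # standard normal value with P(Z <= z) = 0.80
x = days.inv_cdf(0.80)                  # same as 1020 + z * 20

print(f"z = {z:.4f}, x = {x:.1f} days")   # z = 0.8416, x = 1036.8 days
```

The exact z (0.8416) is slightly larger than the table value 0.84, which is why the answer rounds to the same 1036.8 days.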
EXPONENTIAL DISTRIBUTION
• Models the time between randomly occurring events, such as the time to or between failures of mechanical or electrical components.
• Is closely related to the Poisson distribution: if the distribution of the time between events is exponential, then the number of events occurring during an interval of time is Poisson.
$$f(x) = \lambda e^{-\lambda x}, \qquad x \ge 0$$

Where
1/λ = mean of the exponential distribution (note that λ is the mean of the corresponding Poisson distribution, i.e., the mean number of events per unit of time)
x = time or distance over which the variable extends
e = 2.71828… (the base of natural logarithms)
The exponential distribution has the properties that it is bounded below by 0, it has its greatest density at 0, and the density declines as x increases. Its cumulative distribution function is $F(x) = P(X \le x) = 1 - e^{-\lambda x}$.
Using the Exponential Distribution
A company that makes electronic components for tablet devices tested a large number of these components. They found that the average time to failure was 1/λ = 4000 hours.
What is the probability that a component will fail within 500 hours? After 4000 hours?
The mean rate of failure is λ = 1/4000 = 0.00025 failures/hour. Therefore, the probability of failure within 500 hours is

$$F(500) = 1 - e^{-0.00025(500)} = 1 - e^{-0.125} = 0.1175$$

The probability that a component fails after 4000 hours is

$$P(X > 4000) = 1 - F(4000) = e^{-0.00025(4000)} = e^{-1} = 0.3679$$
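A minimal sketch of the exponential CDF, $F(x) = 1 - e^{-\lambda x}$, applied to the electronic-component example; `exponential_cdf` is an illustrative name.

```python
from math import exp

def exponential_cdf(x: float, rate: float) -> float:
    """F(x) = P(X <= x) = 1 - e^(-rate * x) for x >= 0."""
    return 1 - exp(-rate * x)

rate = 1 / 4000   # lambda: failures per hour (mean time to failure = 4000 hours)

p_within_500 = exponential_cdf(500, rate)        # fails within 500 hours
p_after_4000 = 1 - exponential_cdf(4000, rate)   # still working after 4000 hours = e^(-1)

print(f"{p_within_500:.4f} {p_after_4000:.4f}")   # 0.1175 0.3679
```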
STATISTICAL METHODOLOGY
• Descriptive statistics are methods of presenting data visually and numerically. They include charts (such as Excel column, line and pie charts), frequency distributions and histograms to organize and present data, as well as measures of central tendency (means, medians, proportions) and measures of dispersion (range, standard deviation, variance).
• Statistical inference is the process of drawing conclusions about unknown characteristics of a population from which sample data were taken.
• Techniques used in this process include confidence intervals, hypothesis testing, and experimental design.
• Experimental design is important for helping to understand the effects of process factors on output quality and for optimizing systems.
• Predictive statistics are used to develop predictions of future values based on historical data.
• Correlation analysis and regression analysis are two useful techniques; they can clarify the characteristics of a process as well as predict future results.
SAMPLING
Population - refers to the group of things
that we want information about.
Sample - refers to part of the population
that we take out to examine and draw
conclusions from.
Sampling - forms the basis for statistical
applications.

BIASED SAMPLES
occur when one or more parts of the population are favored over others.
• Convenience sample - only includes people who are easy to reach.
• Voluntary response sample - consists of people who have chosen to include themselves.
UNBIASED SAMPLE
• Simple Random Sampling - the most basic or common sampling method. Every item in the population has an equal probability of being selected.
• Stratified Sampling - The population is partitioned into groups, or strata, and a sample is selected from each group.
Strata are groups of similar items; within each stratum we take a simple random sample (SRS). This is good for making sure that whoever administers the sampling gets in contact with each kind of group.
• Multistage Sampling - the sample is selected in two or more stages, combining two or more simple random samples, so that you know at each stage where your sample is coming from.
• Systematic Sampling - every nth item (e.g., every 4th or 5th) is selected.
• Cluster Sampling - A population is partitioned into groups (clusters), and a random sample of clusters is selected.
• Judgment Sampling or purposive sampling - Expert opinion is used to determine the sample.
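The selection rules above can be sketched with Python's `random` module. The population of 100 part IDs, the day/night shift strata, and the cluster size of 10 are all hypothetical, chosen only to make each scheme concrete; a fixed seed keeps the draw reproducible. Judgment sampling is left out because it depends on expert opinion rather than a random rule.

```python
import random

# Hypothetical population: part IDs 1..100; IDs 1-60 made on the day shift, 61-100 at night
population = list(range(1, 101))
shift = {pid: "day" if pid <= 60 else "night" for pid in population}

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Simple random sampling: every item has an equal chance of being selected
srs = rng.sample(population, 10)

# Systematic sampling: random starting point, then every k-th item
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: partition into strata (shifts), then an SRS of 5 from each stratum
strata = {}
for pid in population:
    strata.setdefault(shift[pid], []).append(pid)
stratified = [pid for group in strata.values() for pid in rng.sample(group, 5)]

# Cluster sampling: consecutive blocks of 10 IDs form clusters; choose 2 whole clusters
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [pid for cluster in rng.sample(clusters, 2) for pid in cluster]

print("SRS:", sorted(srs))
print("Systematic:", systematic)
print("Stratified:", sorted(stratified))
print("Cluster:", sorted(cluster_sample))
```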
Sampling error occurs naturally and results from the fact that a sample may not always be representative of the population, no matter how carefully it is selected. One way to reduce sampling error is to take a larger sample from the population.
Systematic errors usually result from poor sample design and can be reduced or eliminated by careful planning of the sampling study.
DESCRIPTIVE STATISTICS
Summarize the numerical characteristics of populations or samples.

POPULATION - is a complete set or collection of objects of interest.
SAMPLE - is a subset of objects taken from the population.
The most important types of Descriptive Statistics and Formulas are:
1. Measures of Location
MEAN - The mean of a population is denoted by the Greek letter µ; the mean of a sample is denoted by x̄:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}, \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

We may calculate the mean in Excel using the function =AVERAGE(data range)
MEDIAN – specifies the middle value (or 50th percentile) when the data are arranged from smallest to largest. For an odd number of observations, the median is the middle of the sorted numbers. For an even number of observations, the median is the mean of the two middle numbers.
• We may find the median using the Excel function =MEDIAN(data range)

MODE – is the observation that occurs most frequently.
- It is most useful for data sets that consist of a relatively small number of unique values.
• In Excel you can use =MODE.SNGL(data range) or =MODE.MULT(data range) to identify a single mode or multiple modes in the data, or simply =MODE(data range).
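Python's `statistics` module has counterparts to the Excel functions above. A sketch on a small hypothetical data set (defect counts per batch, invented for illustration); with eight values, the median is the mean of the two middle numbers.

```python
import statistics

# Hypothetical sample: number of defects found in eight batches
data = [2, 3, 3, 5, 7, 3, 4, 2]

mean = statistics.mean(data)        # like =AVERAGE(data range)
median = statistics.median(data)    # like =MEDIAN(data range); even n -> mean of two middle values
modes = statistics.multimode(data)  # like =MODE.MULT(data range); list of all modes (Python 3.8+)

print(mean, median, modes)          # 3.625 3.0 [3]
```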
2. Measures of Dispersion
Range is the simplest measure of dispersion and is computed as the difference between the maximum value and the minimum value in the data set.
• It is computed in Excel by the formula =MAX(data range)-MIN(data range)

Variance is a measure of dispersion that depends on all the data. The larger the variance, the more the data are “spread out”.
FORMULA for the Variance of the Population:

$$\sigma^{2} = \frac{\sum_{i=1}^{N} (x_i - \mu)^{2}}{N}$$

Where $x_i$ is the value of the ith item, N is the number of items in the population, and µ is the population mean.
• In Excel, =VAR.P(data range) is used to compute the population variance, σ².

FORMULA for the Variance of the Sample:

$$s^{2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}{n - 1}$$

Where n is the number of items in the sample, and x̄ is the sample mean.
• In Excel, =VAR.S(data range) may be used to compute the sample variance, s².
Standard deviation is the square root of the variance. For a population, the standard deviation is computed as:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^{2}}{N}}$$

and for a sample, it is:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}{n - 1}}$$

• =STDEV.P(data range) calculates the standard deviation for a population
• =STDEV.S(data range) calculates the standard deviation for a sample
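The `statistics` module distinguishes population formulas (divide by N, like `VAR.P` and `STDEV.P`) from sample formulas (divide by n - 1, like `VAR.S` and `STDEV.S`). A sketch on a small hypothetical data set:

```python
import statistics

# Hypothetical sample: defect counts from eight batches (illustration only)
data = [2, 3, 3, 5, 7, 3, 4, 2]

data_range = max(data) - min(data)       # =MAX(data range)-MIN(data range)
pop_var = statistics.pvariance(data)     # =VAR.P: sum of squared deviations / N
samp_var = statistics.variance(data)     # =VAR.S: sum of squared deviations / (n - 1)
pop_sd = statistics.pstdev(data)         # =STDEV.P: square root of pop_var
samp_sd = statistics.stdev(data)         # =STDEV.S: square root of samp_var

print(data_range, pop_var, round(samp_var, 4), round(pop_sd, 4), round(samp_sd, 4))
```

The sample variance is always a little larger than the population variance for the same numbers, because it divides by n - 1 instead of N.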
Proportion is the fraction of data that have a certain characteristic.
It is usually denoted as p.
• We may use the Excel =COUNTIF(range, criteria) function to find the number of cells within a range that meet a specified criterion and then compute the proportion as the ratio of the count to the total number of observations.
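The COUNTIF-then-divide approach can be mirrored in a couple of lines. The inspection results below are hypothetical:

```python
# Hypothetical inspection results for ten items (illustration only)
results = ["pass", "fail", "pass", "pass", "fail",
           "pass", "pass", "pass", "pass", "pass"]

count_fail = sum(1 for r in results if r == "fail")   # like =COUNTIF(range, "fail")
p = count_fail / len(results)                          # proportion = count / total observations
print(p)                                               # 0.2
```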
3. Measures of Shape
Skewness describes the lack of symmetry of data.
Coefficient of Skewness (CS) measures the degree of asymmetry of observations around the mean.
- If CS is positive, the distribution of values is positively skewed;
- if negative, it is negatively skewed.
The closer to zero, the less the degree of skewness.
• Greater than 1 or less than -1 suggests a high degree of skewness.
• Between 0.5 and 1 or between -0.5 and -1 represents moderate skewness.
• Between -0.5 and 0.5 indicates relative symmetry.
Kurtosis refers to the peakedness or flatness of a histogram.
Coefficient of Kurtosis (CK) measures the degree of kurtosis of a population.
• Less than 3 – flatter, with a wide degree of dispersion
• Greater than 3 – more peaked, with less dispersion
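The slides do not give formulas for CS and CK. The sketch below assumes the population moment definitions, $CS = \frac{1}{N}\sum (x_i - \mu)^3 / \sigma^3$ and $CK = \frac{1}{N}\sum (x_i - \mu)^4 / \sigma^4$, under which a normal distribution has CK = 3. Excel's `SKEW` and `KURT` functions apply small-sample adjustments, and `KURT` reports excess kurtosis (compared with 0 rather than 3), so their values will not match this sketch exactly. The data sets are hypothetical.

```python
def _mean_and_sigma(data):
    """Population mean and population standard deviation (divide by N)."""
    n = len(data)
    mu = sum(data) / n
    sigma = (sum((x - mu) ** 2 for x in data) / n) ** 0.5
    return n, mu, sigma

def coefficient_of_skewness(data):
    """CS = (1/N) * sum((x - mu)^3) / sigma^3  (population moment form, assumed)."""
    n, mu, sigma = _mean_and_sigma(data)
    return sum((x - mu) ** 3 for x in data) / n / sigma**3

def coefficient_of_kurtosis(data):
    """CK = (1/N) * sum((x - mu)^4) / sigma^4  (equals 3 for a normal distribution)."""
    n, mu, sigma = _mean_and_sigma(data)
    return sum((x - mu) ** 4 for x in data) / n / sigma**4

symmetric = [1, 2, 3, 4, 5]        # hypothetical, perfectly symmetric
right_skewed = [1, 1, 1, 2, 10]    # hypothetical, one unusually large value

print(coefficient_of_skewness(symmetric), coefficient_of_skewness(right_skewed))
print(coefficient_of_kurtosis(symmetric))   # about 1.7: flatter than a normal curve
```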
STATISTICAL ANALYSIS WITH MICROSOFT EXCEL
The two most useful Excel tools for statistical analysis are:
1. The Descriptive Statistics tool
2. The Histogram tool

THE EXCEL DESCRIPTIVE STATISTICS TOOL
- is a convenient way of obtaining basic summary measures for sample data.
THE EXCEL HISTOGRAM TOOL
A frequency distribution is a table that shows the number of observations in each of several non-overlapping groups.
A graphical depiction of a frequency distribution for numerical data in the form of a column chart is called a histogram.

A frequency distribution and a histogram can be created by using the Data Analysis tools (Analysis ToolPak add-in) in Excel.
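A frequency distribution is just a count per bin. This sketch assumes upper-limit bins in which a value equal to a limit falls in that limit's bin, with a final "More" bin for values above the last limit; the data and bin limits are hypothetical.

```python
from bisect import bisect_left

def frequency_distribution(data, upper_limits):
    """Count values per bin. upper_limits must be sorted ascending.
    Bin i holds values <= upper_limits[i] and above the previous limit;
    the extra last count is a "More" bin for values above the highest limit."""
    counts = [0] * (len(upper_limits) + 1)
    for x in data:
        counts[bisect_left(upper_limits, x)] += 1
    return counts

# Hypothetical measurements and bin upper limits
data = [12, 15, 21, 22, 28, 31, 35, 35, 41, 50]
upper_limits = [20, 30, 40]

counts = frequency_distribution(data, upper_limits)
labels = [f"<= {u}" for u in upper_limits] + ["More"]
for label, c in zip(labels, counts):
    print(f"{label:>6} | {'#' * c}")   # a text histogram: one '#' per observation
```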
STATISTICAL INFERENCE

SAMPLING DISTRIBUTIONS
A sampling distribution is the distribution of a statistic for all possible samples of a fixed size.

STANDARD DEVIATION OF THE MEAN

For infinite populations:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

For finite populations:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$
DISTRIBUTION OF A RANDOM VARIABLE

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

The mean length of shafts produced on a lathe has historically been 50 inches, with a standard deviation of 0.12 inch. If a sample of 36 shafts is taken, what is the probability that the sample mean would be greater than 50.04 inches?
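Given: µ = 50, σ = 0.12, n = 36, x̄ = 50.04

$$z = \frac{50.04 - 50}{0.12/\sqrt{36}} = \frac{0.04}{0.02} = 2$$

Therefore, $P(\bar{x} > 50.04) = P(Z > 2) = 1 - 0.9772 = 0.0228$.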
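A sketch of the shaft calculation with the standard library; `NormalDist().cdf` replaces the table lookup for P(Z > 2).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 0.12, 36     # historical mean and standard deviation (inches), sample size
x_bar = 50.04                   # sample mean of interest

std_error = sigma / sqrt(n)               # standard deviation of the mean: 0.12 / 6 = 0.02
z = (x_bar - mu) / std_error              # about 2.0
p_greater = 1 - NormalDist().cdf(z)       # P(Z > z)

print(f"{z:.2f} {p_greater:.4f}")         # 2.00 0.0228
```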
CONFIDENCE
INTERVALS
A confidence interval (CI) is an interval estimate of a
population parameter that also specifies the
likelihood that the interval contains the true
population parameter.
COMMONLY USED CONFIDENCE LEVELS ARE:
• 90%
• 95%
• 99%

COMPUTING A CONFIDENCE INTERVAL WITH A KNOWN POPULATION STANDARD DEVIATION
◉ A laboratory in a hospital is required to ensure that the temperature in its sterilizer stays at an average of at least 100˚C. Over an extended period of time, the population standard deviation has been shown to be stable at σ = 0.5. Find the 95% confidence interval for the population mean if a sample of 36 readings was taken, and the sample mean was found to be x̄ = 100.3.
$$\bar{x} \pm z\,\frac{\sigma}{\sqrt{n}}$$

x̄ = 100.3, z = 1.96, σ = 0.5, n = 36

$$100.3 \pm 1.96\,\frac{0.5}{\sqrt{36}} = 100.3 \pm 0.1633$$

that is, 100.1367 to 100.4633
CONFIDENCE LEVEL    Z-VALUE
90%                 1.645
95%                 1.96
99%                 2.58
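The z-values in the table are two-sided critical values of the standard normal distribution, which `NormalDist().inv_cdf` can produce (the table's 2.58 is 2.576 rounded). A sketch that also recomputes the sterilizer interval; the helper names `z_value` and `mean_ci_known_sigma` are our own:

```python
from math import sqrt
from statistics import NormalDist

def z_value(confidence: float) -> float:
    """Two-sided critical value: z such that P(-z < Z < z) = confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def mean_ci_known_sigma(x_bar, sigma, n, confidence=0.95):
    """Interval x_bar +/- z * sigma / sqrt(n) for a known population sigma."""
    half_width = z_value(confidence) * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  z = {z_value(level):.3f}")   # 1.645, 1.960, 2.576

low, high = mean_ci_known_sigma(100.3, 0.5, 36)      # sterilizer example
print(f"{low:.4f} to {high:.4f}")                    # 100.1367 to 100.4633
```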
HYPOTHESIS TESTING
• Hypothesis testing involves drawing inferences about two contrasting propositions relating to the value of a population parameter.
• One of these propositions is assumed to be true in the absence of contradictory data (the null hypothesis); the other must be true if the null hypothesis is rejected (the alternative hypothesis).
STEPS IN A HYPOTHESIS TEST:

1. Formulate the hypotheses to test.
2. Select a level of significance.
3. Determine a decision rule on which to base a conclusion.
4. Collect data and calculate a test statistic.
5. Apply the decision rule to the test statistic and draw a conclusion.
LEVEL OF SIGNIFICANCE
The level of significance, denoted α, defines the risk that we are willing to take of incorrectly concluding that the alternative hypothesis is true when in fact the null hypothesis is true.
COMMONLY USED LEVELS OF SIGNIFICANCE:
• 0.10
• 0.05
• 0.01
P-VALUE OR OBSERVED SIGNIFICANCE LEVEL
An alternative approach to comparing a test statistic to a critical value in hypothesis testing is to find the probability of obtaining a test statistic value equal to or more extreme than that obtained from the sample data when the null hypothesis is true. This probability is the p-value; if it is smaller than the level of significance, the null hypothesis is rejected.
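The five steps and the p-value rule can be combined in a small one-sample z-test for a mean with known σ. The hypotheses are an assumption for illustration, not stated in the slides: for the sterilizer requirement of an average of at least 100˚C we take H0: µ ≥ 100 against H1: µ < 100 at α = 0.05, using the sample data from the confidence interval example. The function name `one_sample_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05, tail="two-sided"):
    """Sketch of a z-test for a mean with known sigma.

    tail: "two-sided" (H1: mu != mu0), "lower" (H1: mu < mu0) or "upper" (H1: mu > mu0).
    Returns (z, p_value, reject_h0).
    """
    std_normal = NormalDist()
    z = (x_bar - mu0) / (sigma / sqrt(n))           # step 4: test statistic
    if tail == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif tail == "lower":
        p_value = std_normal.cdf(z)
    elif tail == "upper":
        p_value = 1 - std_normal.cdf(z)
    else:
        raise ValueError("tail must be 'two-sided', 'lower' or 'upper'")
    return z, p_value, p_value < alpha              # step 5: reject H0 if p-value < alpha

# Sterilizer data; H0: mu >= 100 vs H1: mu < 100 (hypotheses assumed for illustration)
z, p, reject = one_sample_z_test(100.3, 100, 0.5, 36, alpha=0.05, tail="lower")
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {reject}")   # z = 3.60, p-value = 0.9998, reject H0: False
```

With z = 3.6 the lower-tail p-value is about 0.9998, so the sample gives no evidence that the mean temperature has dropped below 100˚C. Comparing z with the critical value -1.645 leads to the same decision.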
Analysis of Variance (ANOVA)
What is Analysis of Variance (ANOVA)?
• It is an analysis tool used in statistics that splits the observed aggregate variability found inside a data set into two parts: systematic factors and random factors.
• Analysts use the ANOVA test to determine the influence that independent variables have on the dependent variable in a regression study.
Two parts of Analysis of Variance
• Systematic factors - have a statistical influence on the given data set.
• Random factors - do not.
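The split into systematic and random parts is the core of a one-way ANOVA: between-group (systematic) and within-group (random) sums of squares are turned into an F statistic. A from-scratch sketch with tiny hypothetical groups; turning F into a p-value needs the F distribution (for example Excel's `F.DIST.RT`), which is left out here. The function name `one_way_anova` is our own.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups (lists of numbers)."""
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group (systematic) variation: how far group means sit from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group (random) variation: scatter of values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

print(one_way_anova([[1, 2, 3], [4, 5, 6]]))   # (13.5, 1, 4)

# Hypothetical fill weights from three machines
machines = [[10.1, 10.3, 10.2], [10.6, 10.5, 10.7], [10.2, 10.1, 10.3]]
print(one_way_anova(machines))
```

A large F means the systematic (between-group) variation is large relative to the random variation.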
Regression and Correlation Analysis
Regression Analysis
• is used in statistics to find trends in data.
• will provide you with an equation for a graph so that you can make predictions about your data.
Example
• How much you eat and how much weight you have put on over the last year
Correlation Analysis
• is a statistical technique that can show whether and how strongly pairs of variables are related.
• Although some correlations are fairly obvious, your data may contain unsuspected correlations.
Example
• Height and weight are related; taller people tend to be heavier than shorter people.
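A from-scratch sketch of simple linear regression (least squares) and the Pearson correlation coefficient; Excel's `SLOPE`, `INTERCEPT` and `CORREL` compute the same quantities. The height and weight values are hypothetical, chosen to echo the example above, and the function names are our own.

```python
from math import sqrt

def _sums(x, y):
    """Means plus the centered sums Sxy, Sxx, Syy used by both techniques."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return x_bar, y_bar, sxy, sxx, syy

def least_squares(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares; return (b0, b1)."""
    x_bar, y_bar, sxy, sxx, _ = _sums(x, y)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def pearson_r(x, y):
    """Correlation coefficient: +1 or -1 is a perfect linear relationship, 0 is none."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: heights (cm) and weights (kg) of five people
height = [160, 165, 170, 175, 180]
weight = [55, 60, 63, 68, 74]

b0, b1 = least_squares(height, weight)
print(f"weight = {b0:.1f} + {b1:.2f} * height, r = {pearson_r(height, weight):.3f}")
print(f"predicted weight at 172 cm: {b0 + b1 * 172:.1f} kg")
```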
DESIGN OF EXPERIMENTS (DOE)
is defined as a branch of applied statistics that deals with planning, conducting, analyzing, and interpreting controlled tests to evaluate the factors that control the value of a parameter or group of parameters. DOE is a powerful data collection and analysis tool that can be used in a variety of experimental situations or series of tests.

Example
Natural bread dough that would meet the same quality.
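One common way to analyze a designed experiment is to estimate main effects from a two-level factorial design. The sketch below assumes a 2 x 2 full factorial with factors A and B coded -1/+1; the factors and the observed results are hypothetical, and `main_effects_2x2` is our own name.

```python
from itertools import product

def main_effects_2x2(response):
    """response maps (a, b) factor levels in {-1, +1} to an observed result.
    Each effect = mean result at the "+" sign minus mean result at the "-" sign."""
    def effect(sign):
        high = [y for run, y in response.items() if sign(run) > 0]
        low = [y for run, y in response.items() if sign(run) < 0]
        return sum(high) / len(high) - sum(low) / len(low)

    return {
        "A": effect(lambda r: r[0]),           # main effect of factor A
        "B": effect(lambda r: r[1]),           # main effect of factor B
        "AB": effect(lambda r: r[0] * r[1]),   # A x B interaction
    }

runs = list(product((-1, 1), repeat=2))           # full factorial: (-1,-1), (-1,1), (1,-1), (1,1)
observed = dict(zip(runs, [10, 12, 14, 18]))      # hypothetical results, one per run

print(main_effects_2x2(observed))   # {'A': 5.0, 'B': 3.0, 'AB': 1.0}
```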
THANKS!
Any questions?