“This course is designed for undergraduate students with emphasis on problem solving related to
societal issues that engineers are called upon to solve.”
Prepared by:
INTRODUCTION
THE BEGINNINGS OF STATISTICS
The history of statistics can be said to start around 1749, although the interpretation of
the term has changed over time. In early times, the meaning was
restricted to information about states. In modern terms, “statistics” means both sets of
collected information, as in national accounts and temperature records, and analytical work
which requires statistical inference.
STATISTICS – the science and the art which deals with interpreting data from facts
and information.
– a science which deals with the methods of gathering, presentation,
analysis and interpretation of data.
TYPES OF STATISTICS
Descriptive - summarizes data numerically and/or graphically. The
results are then interpreted using measures such as the mean and standard deviation.
Inferential - generalizes the findings from a sample of a population to make
conclusions about the nature of the whole population.
EDA 11 - ENGINEERING DATA ANALYSIS
[Figure: classification of DATA. QUANTITATIVE: Continuous Data (Variable), Discrete Data (Discontinuous). QUALITATIVE: Attribute Data, Open Data.]
DATA is one of the most important and vital aspects of any research study.
TYPES OF DATA
1. Quantitative – measures of values or counts, expressed as numbers.
Continuous Data (Variable) – data that can take the form of
decimals or continuous values of varying degrees of precision (e.g.
height, weight)
Discrete Data (Discontinuous) – data whose value cannot take the
form of decimals (e.g. family size, enrolment size)
LESSON 1.0 DATA GATHERING
The first step in any investigation is the collection of data.
The data may be collected for the whole population or for a sample only. It is mostly
collected on a sample basis.
Collection of data is a very difficult job. The investigator is the well-trained person who
collects the statistical data. The respondents are the persons from whom the information is
collected.
[Figure: SOURCES OF DATA, showing External Sources, Internal Sources, Primary Data, and Secondary Data.]
[Figure: METHODS OF COLLECTING PRIMARY DATA. Direct Personal Investigation; Indirect Oral Investigation; Investigation through Local Reporters; Investigation through Mailed Questionnaire; Investigation through Questionnaire.]
[Figure: sources of secondary data. PUBLISHED SOURCES and UNPUBLISHED SOURCES; labels shown: International, Government, Municipal Corporation, Institutional/Commercial.]
CRITERIA OF USE
1. There is a time priority in a causal effect.
2. There is consistency in a causal relationship.
3. The magnitude of correlation is great.
LESSON 2.0 PROBABILITY
Probability is the branch of mathematics that studies the possible outcomes of given
events together with the outcomes’ relative likelihoods and distributions. In common
usage, the word “probability” is used to mean the chance that a particular event (or set of
events) will occur expressed on a linear scale from 0 (impossibility) to 1 (certainty), also
expressed as a percentage between 0 and 100%. The analysis of events governed by
probability is called statistics.
EXAMPLES:
1. Construct a sample space for the experiment that consists of tossing a single
coin.
Solution: The outcomes could be labeled h for heads and t for tails. Then
the sample space is the set: S = { h,t }
2. Construct a sample space for the experiment that consists of rolling a single
die. Find the events that correspond to the phrases “an even number is
rolled” and “a number greater than two is rolled.”
Solution: The outcomes could be labeled according to the number of
dots on the top face of the die. Then the sample space is the set S = { 1, 2,
3, 4, 5, 6 }
The outcomes that are even are 2, 4, and 6, so the event that
corresponds to the phrase “an even number is rolled” is the set { 2, 4, 6 },
which it is natural to denote by the letter E. We write E = { 2, 4, 6 }.
Similarly, the event that corresponds to the phrase “a number
greater than two is rolled” is the set { 3, 4, 5, 6 }, which we
denote by the letter T. We write T = { 3, 4, 5, 6 }.
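The die example can be checked with a few lines of Python. This is only an illustrative sketch: the variable names S, E and T follow the text, and representing events as Python sets is an assumption made for the illustration.

```python
# Sample space for one roll of a die, with events represented as subsets.
S = {1, 2, 3, 4, 5, 6}
E = {s for s in S if s % 2 == 0}   # "an even number is rolled"
T = {s for s in S if s > 2}        # "a number greater than two is rolled"

print(sorted(E))      # [2, 4, 6]
print(sorted(T))      # [3, 4, 5, 6]
print(sorted(E & T))  # outcomes that belong to both events: [4, 6]
```

Treating events as sets means that "and" and "or" statements about events become set intersection and union.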
EXAMPLE:
1. Construct a sample space that describes all three-child families according to
the genders of the children with respect to birth order.
Solution: Two of the outcomes are “two boys then a girl,” which we
might denote bbg, and “a girl then two boys,” which we would denote
gbb.
Clearly there are many outcomes, and when we try to list all of
them it could be difficult to be sure that we have found them all unless
we proceed systematically. The tree diagram gives a systematic
approach.
The diagram was constructed as follows. There are two possibilities for the first
child, boy or girl, so we draw two line segments coming out of a starting point, one ending
in a b for “boy” and the other ending in a g for “girl.” For each of these two possibilities for
the first child there are two possibilities for the second child, “boy” or “girl,” so from each of
the b and g we draw two line segments, one segment ending in a b and one in a g. For each
of the four ending points now in the diagram there are two possibilities for the third child,
so we repeat the process once more.
The line segments are called branches of the tree. The right ending point of each
branch is called a node. The nodes on the extreme right are the final nodes; to each one
there corresponds an outcome, as shown in the figure.
From the tree it is easy to read off the eight outcomes of the experiment, so the
sample space is, reading from the top to the bottom of the final nodes in the tree,
S = { bbb, bbg, bgb, bgg, gbb, gbg, ggb, ggg }
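The systematic listing that the tree diagram provides can also be sketched in Python. Using `itertools.product` is an illustrative choice, not part of the original notes; it varies the last child fastest, which matches reading the final nodes of such a tree from top to bottom.

```python
from itertools import product

# Each child is a boy (b) or a girl (g); birth order matters.
sample_space = ["".join(children) for children in product("bg", repeat=3)]
print(sample_space)       # ['bbb', 'bbg', 'bgb', 'bgg', 'gbb', 'gbg', 'ggb', 'ggg']
print(len(sample_space))  # 8
```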
Permutation relates to the act of arranging all the members of a set into some sequence or
order. $nPr = \frac{n!}{(n-r)!}$
EXAMPLES:
1. Critical Miss, PSU's Tabletop Gaming Club, has 15 members this term. How
many ways can a slate of 3 officers consisting of a president, vice-president,
and treasurer be chosen?
Solution: In this case, repeats are not allowed since we don’t want the
same member to hold more than one position. The order matters, since if
you pick person 1 for president, person 2 for vice-president, and person 3
for treasurer, you would have different members in those positions than
if you picked person 2 for president, person 1 for vice-president, and
person 3 for treasurer. This is a permutation problem with n = 15 and r =
3.
15P3=15!/(15−3)!=15!/12!=2730
In general, if you were selecting items that involve rank, a position title,
1st, 2nd, or 3rd place or prize, etc. then the order in which the items are
arranged is important and you would use permutation.
Types of PERMUTATION:
1. Permutation of Distinct Elements - The number of permutations of n different
elements is n!, where n! = n(n − 1)(n − 2)· · · 3 · 2 · 1.
2. Permutation of Subsets - The number of permutations of n distinct objects taken r
at a time is $nPr = \frac{n!}{(n-r)!}$.
3. Circular Permutation - The number of permutations of n objects arranged in a
circle is (n − 1)!.
Partitions – The number of ways of partitioning a set of n objects into r cells with n1
elements in the first cell, n2 elements in the second, and so forth, is
$\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\, n_2! \cdots n_r!}$.
EXAMPLES:
1. In how many ways can 7 graduate students be assigned to 1 triple and 2
double hotel rooms during a conference?
Solution: The total number of possible partitions would be
$\binom{7}{3, 2, 2} = \frac{7!}{3!\, 2!\, 2!} = 210$
Combination is a way of selecting items from a collection, such that (unlike permutations)
the order of selection does not matter. $nCr = \frac{n!}{r!\,(n-r)!}$
EXAMPLES:
1. Critical Miss, PSU's Tabletop Gaming Club, has 15 members this term. They
need to select 3 members to have keys to the game office. How many ways
can the 3 members be chosen?
Solution: In this case, repeats are not allowed, because we don’t want
one person to have more than one key. The order in which the keys are
handed out does not matter. This is a combination problem with n = 15
and r = 3.
15C3=15!/(3!(15−3)!)=15!/(3!12!)=455
We can use these counting rules in finding probabilities. For instance, the
probability of winning the lottery can be found using these counting
rules.
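The three counting rules used in the examples above can be sketched with Python's standard library. The helper `multinomial` is a hypothetical name introduced here for the partition formula; `math.perm` and `math.comb` are standard functions (Python 3.8 and later).

```python
import math

def multinomial(n, *cells):
    """Ways to partition n objects into cells of the given sizes."""
    assert sum(cells) == n, "cell sizes must add up to n"
    ways = math.factorial(n)
    for size in cells:
        ways //= math.factorial(size)
    return ways

officers = math.perm(15, 3)      # ordered choice: president, vice-president, treasurer
key_holders = math.comb(15, 3)   # unordered choice of 3 key holders
rooms = multinomial(7, 3, 2, 2)  # 1 triple room and 2 double rooms

print(officers, key_holders, rooms)  # 2730 455 210
```

The officer slate and the key holders draw from the same 15 members; the only difference is whether order matters, which is why the first count is 3! = 6 times the second.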
RULES OF PROBABILITY
GENERAL PROBABILITY RULES
1. The probability of an impossible event is zero; the probability of a certain
event is one. Therefore, for any event A, the range of possible probabilities
is: 0 ≤ P(A) ≤ 1.
2. For S the sample space of all possibilities, P(S) = 1. That is, the sum of the
probabilities of all possible outcomes is equal to one. Recall the party affiliation
above: if you have to belong to one of the three designated political parties,
then the sum of P(R), P(D) and P(I) is equal to one.
3. For any event A, P(Ac) = 1 - P(A). It follows then that P(A) = 1 - P(Ac)
4. Addition rule: this is the probability that either one or both events occur.
If two events, say A and B, are mutually exclusive - that is A and B
have no outcomes in common - then P(A or B) = P(A) + P(B)
If two events are NOT mutually exclusive, then P(A or B) = P(A) + P(B)
- P(A and B)
5. Multiplication rule: this is the probability that both events occur.
P(A and B) = P(A) • P(B|A) or P(B) • P(A|B). Note: the straight line
symbol, |, does not mean divide! This symbol means "conditional" or
"given". For instance, P(A|B) means the probability that event A
occurs given event B has occurred.
If A and B are independent - neither event influences or affects the
probability that the other event occurs - then P(A and B) = P(A)*P(B).
This particular rule extends to more than two independent events.
For example, P(A and B and C) = P(A)*P(B)*P(C)
6. Conditional probability: P(A|B) = P(A and B) / P(B) or P(B|A) = P(A and B) / P(A)
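The rules above can be verified on a small case. The sketch below reuses the die events E (even number) and T (number greater than two) from the earlier example and assumes a fair die, so that every outcome is equally likely; the function name `P` is chosen only for this illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumed equally likely outcomes)
E = {2, 4, 6}            # an even number is rolled
T = {3, 4, 5, 6}         # a number greater than two is rolled

def P(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

# Complement rule: P(A^c) = 1 - P(A)
assert P(S - E) == 1 - P(E)
# Addition rule for events that are not mutually exclusive
assert P(E | T) == P(E) + P(T) - P(E & T)
# Multiplication rule: P(E and T) = P(E) * P(T|E)
P_T_given_E = Fraction(len(E & T), len(E))
assert P(E & T) == P(E) * P_T_given_E

print(P(E | T), P(E & T))  # 5/6 1/3
```

In the code, `E | T` is Python's set union (the event "E or T"), not the conditional bar used in P(A|B).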
LESSON 3.0 DISCRETE PROBABILITY DISTRIBUTION
Probability distribution as defined is an assignment of probabilities to the values of the
random variable. When we speak of random variable, it represents a quantitative
(numerical) variable that is measured or observed in an experiment. Therefore, you can
actually find a probability associated with that variable.
A Discrete Probability Distribution is a probability distribution that depicts the occurrence
of discrete (individually countable) outcomes, such as 1, 2, 3, yes, no, true, or false. The
binomial distribution, for example, is a discrete distribution that evaluates the probability of
a "yes" or "no" outcome occurring over a given number of trials, given the event's
probability in each trial—such as flipping a coin one hundred times and having the outcome
be "heads". It also counts occurrences that have countable or finite outcomes.
Discrete Probability Distributions are graphs of the outcomes of test results that are
finite, such as a value of 1, 2, 3, true, false, success, or failure. Investors use discrete
probability distributions to estimate the chances that a particular investing outcome is
more or less likely to happen. Armed with that information, they can choose a hedging
strategy that matches the probabilities found in their analysis.
EXAMPLE:
1. Suppose that a coin is tossed twice so that the sample space is S = {HH, HT,
TH, TT}. Let X represent the number of heads that can come up. With each
sample point we can associate a number for X as shown in table below. Thus,
for example, in the case of HH (i.e., 2 heads), X = 2 while for TH (1 head), X =
1. It follows that X is a random variable.
Sample Point   HH   HT   TH   TT
X              2    1    1    0
K x− =1
K (1 – 1/3) = 1
K = 3/2
X= 0 1 2 3
f(x) = 1/8 3/8 3/8 1/8
f(x) is the Probability Mass Function.
X = xi               0     1     2     3
f(xi) = P(X = xi)    1/8   3/8   3/8   1/8

F(x) = 0     if x < 0
       1/8   if 0 ≤ x < 1
       4/8   if 1 ≤ x < 2
       7/8   if 2 ≤ x < 3
       8/8   if 3 ≤ x < ∞
*where F(x) is the Cumulative Distribution Function
EXAMPLE:
1. Determine the probability mass function of X from the cumulative
distribution function:
F(x) = 0,    x < −2
       0.2,  −2 ≤ x < 0
       0.7,  0 ≤ x < 2
       1,    2 ≤ x
Solution: The domain of the probability mass function consists of the included
endpoints of each interval, x = −2, 0, 2. The value of f(x) at each x is
determined by $f(x_i) = F(x_i) - F(x_{i-1})$ for i = 2, 3, and $f(x_1)$ is taken
to be equal to $F(x_1)$.
f(x1) = f(−2) = F(x1) = F(−2) = 0.2
f(x2) = f(0) = F(x2) − F(x1) = F(0) − F(−2) = 0.7 − 0.2 = 0.5
f(x3) = f(2) = F(x3) − F(x2) = F(2) − F(0) = 1 − 0.7 = 0.3
Therefore,
f(x) = 0.2,  x = −2
       0.5,  x = 0
       0.3,  x = 2
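The same differencing can be sketched in Python. The list `cdf_steps` is a representation chosen for this illustration: each entry holds a jump point of the cumulative distribution function and the value the function takes from that point onward. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Jump points of the CDF and the value F takes from each point onward
cdf_steps = [(-2, Fraction(2, 10)), (0, Fraction(7, 10)), (2, Fraction(1))]

pmf = {}
previous = Fraction(0)        # F(x) = 0 to the left of the first jump
for x, F_x in cdf_steps:
    pmf[x] = F_x - previous   # the size of the jump at x is f(x)
    previous = F_x

print(pmf)  # {-2: Fraction(1, 5), 0: Fraction(1, 2), 2: Fraction(3, 10)}
```

The jumps 1/5, 1/2 and 3/10 are the values 0.2, 0.5 and 0.3 found in the solution, and they sum to one.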
$\mu_X = E[X] = \sum_x x\, f(x)$
EXAMPLE:
1. A salesperson for a medical device company has two appointments on a
given day. At the first appointment, he believes that he has a 70% chance to
make the deal, from which he can earn $1000 commission if successful. On
the other hand, he thinks he only has a 40% chance to make the deal at the
second appointment, from which, if successful, he can make $1500. What is
his expected commission based on his own probability belief? Assume that
the appointment results are independent of each other.
Solution: Let Y denote the total commission of the salesperson in the
appointments. The table below summarizes his total commission and
the associated probabilities in parentheses.
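As a sketch of how the distribution of the total commission and its expected value could be computed, the code below enumerates the four success/failure combinations, multiplying probabilities because the appointments are assumed independent (as the problem states). The variable names are chosen for this illustration only.

```python
from itertools import product

# (probability of success, commission if successful) for each appointment
appointments = [(0.7, 1000), (0.4, 1500)]

distribution = {}   # total commission -> probability
for outcome in product([True, False], repeat=len(appointments)):
    prob, total = 1.0, 0
    for success, (p, commission) in zip(outcome, appointments):
        prob *= p if success else 1 - p          # independence: multiply
        total += commission if success else 0
    distribution[total] = distribution.get(total, 0.0) + prob

expected = sum(y * p for y, p in distribution.items())
print(distribution)
print(round(expected, 2))  # 1300.0
```

The result agrees with computing 0.7(1000) + 0.4(1500) directly, since the expected value of a sum is the sum of the expected values.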
$f(x) = \binom{n}{x} p^x (1 - p)^{n - x}$, x = 0, 1, 2, . . . , n.
The binomial random variable is probably the most important of all discrete probability
distributions. Its probability distribution is called a binomial distribution. The mean µ and
variance σ² of the binomial random variable X with parameters n and p, the number of
trials and the probability of a success, respectively, are
µ = E[X] = np
σ² = V[X] = np(1 − p)
EXAMPLE:
1. Each sample of water has a 10% chance of containing a particular organic
pollutant. Assume that the samples are independent with regard to the
presence of the pollutant. a) Find the probability that in the next 18 samples,
exactly 2 contain the pollutant. b) Find the probability that 3 to 5 of the 20
samples contain the pollutant. c) Find the mean and standard deviation of
the number of samples that contain the pollutant among 16 samples.
Solution:
a) Let X be the number of samples that contain the pollutant in the next
18 samples analyzed. X is a binomial random variable with p = 0.1 and n
= 18. Therefore, $P[X = 2] = \binom{18}{2}(0.1)^{2}(0.9)^{16} = 0.2835$.
b) Here X counts the samples that contain the pollutant among 20 samples, so n = 20.
The required probability is P[3 ≤ X ≤ 5].
P[3 ≤ X ≤ 5] = P[X = 3] + P[X = 4] + P[X = 5]
$= \binom{20}{3}(0.1)^{3}(0.9)^{17} + \binom{20}{4}(0.1)^{4}(0.9)^{16} + \binom{20}{5}(0.1)^{5}(0.9)^{15} = 0.3118$
c) With n = 16, µ = np = 16(0.1) = 1.6
σ = √σ² = √(np(1 − p)) = √(16(0.1)(0.9)) = 1.2
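The three parts of this example can be reproduced with a short sketch. The function `binom_pmf` is a hypothetical helper that implements the binomial probability mass function directly; part b uses n = 20, as in the problem statement.

```python
import math

def binom_pmf(x, n, p):
    """P[X = x] for a binomial random variable with n trials and success probability p."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

p = 0.1
part_a = binom_pmf(2, 18, p)                            # exactly 2 of 18 samples
part_b = sum(binom_pmf(x, 20, p) for x in range(3, 6))  # 3 to 5 of 20 samples
mean_c = 16 * p
sd_c = math.sqrt(16 * p * (1 - p))

print(round(part_a, 4), round(part_b, 4))  # 0.2835 0.3118
print(round(mean_c, 4), round(sd_c, 4))    # 1.6 1.2
```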
$f(x) = \frac{e^{-\lambda t}(\lambda t)^{x}}{x!}$, x = 0, 1, 2, . . .
where λ is the average number of outcomes per unit time, distance, area, or volume, and t is
the length of the interval considered.
EXAMPLE:
1. Ten is the average number of oil tankers arriving each day at a certain port.
The facilities at the port can handle at most 15 tankers per day. a) What is the
probability of finding 8 oil tankers on a given day? b) What is the probability
that on a given day tankers have to be turned away?
Solution:
a) We are given λ = 10 (oil tankers per day), so we take t = 1 (day).
$f(8) = \frac{e^{-10}\, 10^{8}}{8!} = 0.1126$
b) Tankers will be turned away if the number of tankers exceeds the
port’s capacity of 15. Thus, the probability we seek is P[X > 15].
P[X > 15] = 1 − P[X ≤ 15]
$= 1 - \sum_{x=0}^{15} \frac{e^{-10}\, 10^{x}}{x!}$
= 1 − 0.9513 = 0.0487
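A minimal sketch of the tanker computation follows. The function `poisson_pmf` and its parameter names are assumptions made for this illustration; it evaluates the Poisson probability mass function with mean λt.

```python
import math

def poisson_pmf(x, rate, t=1.0):
    """P[X = x] when events occur at an average of `rate` per unit, over t units."""
    mean = rate * t
    return math.exp(-mean) * mean**x / math.factorial(x)

p_eight = poisson_pmf(8, 10)                                     # exactly 8 tankers in a day
p_turned_away = 1 - sum(poisson_pmf(x, 10) for x in range(16))   # more than 15 tankers

print(round(p_eight, 4), round(p_turned_away, 4))  # 0.1126 0.0487
```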
LESSON 4.0 CONTINUOUS PROBABILITY DISTRIBUTION
Physical quantities such as time, length, area, temperature, pressure, load, intensity, etc.,
when they need to be described probabilistically, are modeled by continuous random
variables.
EXAMPLE:
1. Consider the function $f(x) = \frac{x^2}{3}$ for −1 < x < 2, and f(x) = 0 elsewhere. a) Show that it is a
probability density function of some continuous random variable X. b)
Determine P[0 < X < 1].
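Solution:
a) We show that Properties (1) and (2) are satisfied. 1) Clearly, $\frac{x^2}{3} \ge 0$
for all real numbers x. 2) We must show that
$\int_{-\infty}^{\infty} f(x)\,dx = 1$: $\int_{-1}^{2} \frac{x^2}{3}\,dx = \frac{1}{9}\left[x^3\right]_{-1}^{2} = \frac{8 - (-1)}{9} = 1$
b) $P[0 < X \le 1] = \int_{0}^{1} \frac{x^2}{3}\,dx = \frac{1}{9}$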
If X is a continuous random variable with probability density function f(x), the cumulative
distribution function F(x) is defined as $F(x) = P[X \le x] = \int_{-\infty}^{x} f(t)\,dt$ for −∞ < x < ∞.
EXAMPLE:
1. Suppose that for some continuous random variable X, $F(x) = \frac{1}{80}(x^4 - 1)$
for 1 ≤ x ≤ 3. a) What is the probability that X assumes a value between 1.2
and 2.6? b) Find the density function and use it to compute P[1.2 < X < 2.6].
Solution:
a) We apply Property 3 to compute P[1.2 < X < 2.6].
P[1.2 < X < 2.6] = F(2.6) − F(1.2)
$= \frac{1}{80}(2.6^4 - 1) - \frac{1}{80}(1.2^4 - 1) = 0.5453$
b) $f(x) = F'(x) = \frac{1}{80}(4x^3) = \frac{x^3}{20}$, 1 ≤ x ≤ 3.
$P[1.2 < X < 2.6] = \int_{1.2}^{2.6} \frac{x^3}{20}\,dx = 0.5453$
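EXAMPLE:
1. Consider the function $f(x) = \frac{x^2}{3}$ for −1 < x < 2, and f(x) = 0 elsewhere. Compute the mean and
variance of the random variable.
Solution:
$\mu = \int_{-1}^{2} x\left(\frac{1}{3}x^2\right)dx = \frac{1}{3}\int_{-1}^{2} x^3\,dx = \frac{5}{4}$
$\sigma^2 = \int_{-1}^{2} x^2\left(\frac{1}{3}x^2\right)dx - \mu^2 = \frac{51}{80}$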
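These integrals can be checked numerically. The sketch below assumes the density is f(x) = x²/3 on (−1, 2), which is consistent with the stated mean 5/4 and variance 51/80, and uses a hand-written composite Simpson's rule (the helper name `simpson` is introduced here for illustration) so that no external library is needed.

```python
def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def density(x):
    """Assumed density x^2 / 3, used only on its support (-1, 2)."""
    return x**2 / 3

area = simpson(density, -1, 2)                                   # should be 1
mean = simpson(lambda x: x * density(x), -1, 2)                  # E[X]
variance = simpson(lambda x: x**2 * density(x), -1, 2) - mean**2
p_0_1 = simpson(density, 0, 1)                                   # P[0 < X < 1]

print(round(area, 6), round(mean, 6), round(variance, 6), round(p_0_1, 6))
```

The printed values should be close to 1, 5/4 = 1.25, 51/80 = 0.6375 and 1/9.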
NORMAL DISTRIBUTION
Undoubtedly, the most widely used model for a continuous measurement is a normal
random variable and its distribution, normal distribution, is the most important
continuous probability distribution.
EXAMPLE:
1. The time X until recharge for a battery in a laptop computer under common
conditions is normally distributed with µ = 260 minutes and σ = 50 minutes.
Find the probability that a fully charged laptop lasts a) anywhere from 3 to 4
hours; b) less than 270 minutes; c) longer than 300 minutes.
Solution:
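a) $P[180 < X < 240] = \int_{180}^{240} \frac{1}{50\sqrt{2\pi}}\, e^{-0.5\left(\frac{x-260}{50}\right)^2} dx = 0.2898$
b) $P[X < 270] = \int_{-\infty}^{270} \frac{1}{50\sqrt{2\pi}}\, e^{-0.5\left(\frac{x-260}{50}\right)^2} dx$
$= \int_{-\infty}^{260} \frac{1}{50\sqrt{2\pi}}\, e^{-0.5\left(\frac{x-260}{50}\right)^2} dx + \int_{260}^{270} \frac{1}{50\sqrt{2\pi}}\, e^{-0.5\left(\frac{x-260}{50}\right)^2} dx$
= 0.5793
c) $P[X > 300] = \int_{300}^{\infty} \frac{1}{50\sqrt{2\pi}}\, e^{-0.5\left(\frac{x-260}{50}\right)^2} dx$
$= \int_{260}^{\infty} \frac{1}{50\sqrt{2\pi}}\, e^{-0.5\left(\frac{x-260}{50}\right)^2} dx - \int_{260}^{300} \frac{1}{50\sqrt{2\pi}}\, e^{-0.5\left(\frac{x-260}{50}\right)^2} dx$
= 0.2119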
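In practice these integrals are evaluated with a table or software. One possible sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 and later); the variable names are chosen for this illustration.

```python
from statistics import NormalDist

battery = NormalDist(mu=260, sigma=50)     # minutes until recharge

p_a = battery.cdf(240) - battery.cdf(180)  # between 3 and 4 hours
p_b = battery.cdf(270)                     # less than 270 minutes
p_c = 1 - battery.cdf(300)                 # longer than 300 minutes

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))  # 0.2898 0.5793 0.2119
```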
EXPONENTIAL DISTRIBUTION
The random variable X, the distance between successive events from a Poisson process
with mean number of events λ > 0 per unit distance is an exponential random variable
with parameter λ. The probability density function of X is
$f(x) = \lambda e^{-\lambda x}$ for x ≥ 0, with $\mu = \frac{1}{\lambda}$ and $\sigma^2 = \frac{1}{\lambda^2}$
EXAMPLE:
1. The lifetime of a mechanical assembly in a vibration test is exponentially
distributed with a mean of 400 hours. a) What is the probability that an
assembly on test fails in less than 100 hours? b) What is the probability that
an assembly operates for more than 500 hours before failure?
Solution:
a) Let X be the time before a mechanical assembly in a vibration test
fails, measured in hours. The mean lifetime is µ = 400 hours, so λ = 1/400.
$P[X < 100] = F(100) = 1 - e^{-100/400} = 0.2212$
b) $P[X > 500] = 1 - F(500) = e^{-500/400} = 0.2865$
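A short sketch of both parts, using the cumulative distribution function F(x) = 1 − e^(−λx) of the exponential distribution with λ = 1/400; the helper name `exp_cdf` is introduced here for illustration.

```python
import math

mean_life = 400.0       # hours
rate = 1 / mean_life    # lambda = 1 / mean

def exp_cdf(x, lam):
    """F(x) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

p_fail_before_100 = exp_cdf(100, rate)    # P[X < 100]
p_survive_500 = 1 - exp_cdf(500, rate)    # P[X > 500]

print(round(p_fail_before_100, 4), round(p_survive_500, 4))  # 0.2212 0.2865
```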
EXAMPLE:
1. Two ballpoint pens are selected at random from a box that contains 3 blue
pens, 2 red pens, and 3 green pens. Let X be the number of blue pens
selected and Y be the number of red pens selected. a) Find the joint
probability mass function f(x,y). b) Find the probability of selecting at least
one green pen at random.
Solution:
a) Clearly, x = 0, 1, 2 and y = 0, 1, 2 with the restriction 0 ≤ x + y ≤ 2 since
two ballpoint pens are selected. The possible pairs of values (x, y) are (0,
0), (0, 1), (0, 2), (1, 0), (1, 1) and (2, 0). f(1, 0) represents the probability
that 1 blue pen and no red pen are selected. The probability mass
function is
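$f(x, y) = \frac{\binom{3}{x}\binom{2}{y}\binom{3}{2-x-y}}{\binom{8}{2}}$
b) At least one green pen is selected when (x, y) is (1, 0), (0, 1) or (0, 0), so the probability is
$f(1, 0) + f(0, 1) + f(0, 0) = \frac{9}{28} + \frac{6}{28} + \frac{3}{28} = \frac{9}{14}$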
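The joint probability mass function for the pen example can be tabulated with a short sketch. It assumes the counting form of the pmf: choose x of the 3 blue pens, y of the 2 red pens and the remaining 2 − x − y from the 3 green pens, out of all ways to choose 2 of 8 pens. The constant and function names are chosen for this illustration.

```python
from fractions import Fraction
from math import comb

BLUE, RED, GREEN, DRAWN = 3, 2, 3, 2

def f(x, y):
    """Joint pmf: x blue and y red pens among the DRAWN pens."""
    if x < 0 or y < 0 or x + y > DRAWN:
        return Fraction(0)
    ways = comb(BLUE, x) * comb(RED, y) * comb(GREEN, DRAWN - x - y)
    return Fraction(ways, comb(BLUE + RED + GREEN, DRAWN))

pairs = [(x, y) for x in range(3) for y in range(3) if x + y <= DRAWN]
total = sum(f(x, y) for x, y in pairs)
# At least one green pen means fewer than 2 of the drawn pens are blue or red.
at_least_one_green = sum(f(x, y) for x, y in pairs if x + y < DRAWN)

print(total, at_least_one_green)  # 1 9/14
```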
The joint probability density function of the continuous random variables X and
Y, denoted f(x, y), satisfies:
1. $f(x, y) \ge 0$
2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
3. $P[(X, Y) \in R] = \iint_{R} f(x, y)\,dx\,dy$ for any region R in the xy-plane.
EXAMPLE:
1. Let X and Y be continuous random variables with joint density function
$f(x, y) = \frac{1}{36}x^2 y$, −1 ≤ x ≤ 2, 1 ≤ y ≤ 5. a) Calculate
P[X ≥ 0, 1 ≤ Y < 4]. b) Calculate P[X ≥ 1, X ≤ Y < 4]. c) Calculate
P[−1 < X < (Y − 1), 1 ≤ Y ≤ 5].
Solution:
a) The probability we seek is $P[0 \le X < 2, 1 \le Y < 4] = \int_{0}^{2}\int_{1}^{4} \frac{1}{36}x^2 y\,dy\,dx$
We first compute the inner integral, leaving the variable x ‘untouched’
since the variable of integration is y. $\int_{1}^{4} \frac{1}{36}x^2 y\,dy = \frac{5}{24}x^2$
We now compute the outer integral $\int_{0}^{2} \frac{5}{24}x^2\,dx = \frac{5}{9}$
EXAMPLE:
1. a) Obtain the marginal probability density function of X in example of joint
probability density function and b) verify Property number 2 of probability
density function.
Solution:
a) $f_X(x) = \int_{1}^{5} \frac{1}{36}x^2 y\,dy = \frac{1}{3}x^2$, −1 ≤ x ≤ 2
b) $\int_{-\infty}^{\infty} f_X(x)\,dx = \int_{-1}^{2} \frac{1}{3}x^2\,dx = 1$
EXAMPLE:
1. Two ballpoint pens are selected at random from a box that contains 3 blue
pens, 2 red pens, and 3 green pens. Let X be the number of blue pens
selected and Y be the number of red pens selected. Compute 𝑃[𝑋 = 1|𝑌 =
0] and 𝑃[𝑋 = 1|𝑌 = 1].
Solution:
$P[X = 1 \mid Y = 0] = \frac{P[X = 1, Y = 0]}{P[Y = 0]} = \frac{f(1, 0)}{f_Y(0)} = \frac{9/28}{15/28} = \frac{3}{5}$
$P[X = 1 \mid Y = 1] = \frac{P[X = 1, Y = 1]}{P[Y = 1]} = \frac{f(1, 1)}{f_Y(1)} = \frac{3/14}{3/7} = \frac{1}{2}$
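The conditional probabilities can be reproduced with a small sketch. It assumes the same counting form of the joint pmf for the pen example (3 blue, 2 red, 3 green, 2 drawn); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def f(x, y):
    """Joint pmf for x blue and y red pens when 2 pens are drawn from 3 blue, 2 red, 3 green."""
    if x < 0 or y < 0 or x + y > 2:
        return Fraction(0)
    return Fraction(comb(3, x) * comb(2, y) * comb(3, 2 - x - y), comb(8, 2))

def conditional_x_given_y(x, y):
    """P[X = x | Y = y] = f(x, y) / f_Y(y), where f_Y(y) sums f(x, y) over x."""
    f_Y = sum(f(k, y) for k in range(3))
    return f(x, y) / f_Y

print(conditional_x_given_y(1, 0), conditional_x_given_y(1, 1))  # 3/5 1/2
```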
MORE THAN TWO RANDOM VARIABLES - More than two random variables can
be defined in a random experiment. Results for multiple random variables are
straightforward extensions of those for two random variables. A summary for the
continuous random variables is provided here. For the discrete case, simply replace the
integral with a summation.
A probability density function f(x1, x2, . . . , xn) for the continuous random
variables X1, X2, . . . , Xn has the following properties:
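1. $f(x_1, x_2, \ldots, x_n) \ge 0$
2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n = 1$
3. For any region R of n-dimensional space,
$P[(X_1, X_2, \ldots, X_n) \in R] = \int \cdots \int_{R} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n$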
EXAMPLE:
1. Suppose that X1, X2, and X3 represent the thickness in micrometers of a
substrate, an active layer, and a coating layer of a chemical product,
respectively. Assume that the random variables are independent and
normally distributed with µ1 = 10000, µ2 = 1000, µ3 = 80, σ1 = 250, σ2 = 20, and
σ3 = 4, respectively. The specifications for the thickness of the substrate,
active layer, and coating layer are 9200 < x1 < 10800, 950 < x2 < 1050, and 75 <
x3 < 85, respectively. a) What proportion of chemical products meets all
thickness specifications? b) Which one of the three thicknesses has the least
probability of meeting specifications?
Solution:
a) The requested probability is
P[9200 < X1 < 10800, 950 < X2 < 1050, 75 < X3 < 85]
Because the random variables are independent,
P[9200 < X1 < 10800, 950 < X2 < 1050, 75 < X3 < 85]
= P[9200 < X1 < 10800] P[950 < X2 < 1050] P[75 < X3 < 85]
= P[−3.2 < Z < 3.2] P[−2.5 < Z < 2.5] P[−1.25 < Z < 1.25]
after standardizing. From Appendix A-1, the desired probability is
(0.998626)(0.987581)(0.788701) = 0.777836
b) The thickness of the coating layer has the least probability of meeting
specifications. Consequently, a priority should be to reduce variability in
this part of the process.
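Instead of a table lookup, the same computation can be sketched with `statistics.NormalDist` from the Python standard library. The dictionary layout and variable names are assumptions made for this illustration.

```python
from statistics import NormalDist

# (mean, standard deviation, lower spec, upper spec) in micrometers
layers = {
    "substrate": (10000, 250, 9200, 10800),
    "active layer": (1000, 20, 950, 1050),
    "coating layer": (80, 4, 75, 85),
}

within_spec = {}
for name, (mu, sigma, low, high) in layers.items():
    dist = NormalDist(mu, sigma)
    within_spec[name] = dist.cdf(high) - dist.cdf(low)

all_within = 1.0
for p in within_spec.values():
    all_within *= p   # independence lets us multiply the three probabilities

worst = min(within_spec, key=within_spec.get)
print(round(all_within, 4), worst)
```

The product should come out close to the 0.7778 found from the table, with the coating layer as the least likely to meet its specification.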
EXAMPLE:
1. An automated filling machine fills soft-drink cans. The mean fill volume is
12.1 fluid ounces, and the standard deviation is 0.1 oz. Assume that the fill
volumes of the cans are independent, normal random variables. What is the
probability that the average volume of 10 cans selected from this process is
less than 12 oz?
Solution: Let X1, X2, …, X10 denote the fill volumes of the 10 cans. The
average fill volume $\bar{X}$ is a normal random variable with
$E[\bar{X}] = 12.1$ and $V[\bar{X}] = \frac{0.1^2}{10} = 0.001$
Consequently, $P[\bar{X} < 12] = P\left[Z < \frac{12 - 12.1}{\sqrt{0.001}}\right] = P[Z < -3.16] = 0.000789$
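A sketch of the same calculation in Python follows. It models the sample mean as a normal random variable with standard deviation σ/√n, as in the solution; the variable names are chosen for this illustration.

```python
import math
from statistics import NormalDist

mu, sigma, n = 12.1, 0.1, 10
sample_mean = NormalDist(mu, sigma / math.sqrt(n))   # distribution of the average of 10 cans
p_below_12 = sample_mean.cdf(12.0)
z = (12.0 - mu) / (sigma / math.sqrt(n))

print(round(z, 2), p_below_12)
```

Because the code does not round z to −3.16 before evaluating the normal distribution, the probability it prints may differ from 0.000789 in the last digit or two.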
EXAMPLE:
1. Let the random variables X1 and X2 denote the length and width,
respectively, of a manufactured part. Assume that X1 is normally distributed
with E[X1] = 2 cm and standard deviation 0.1 cm and that X2 is normal with
E[X2] = 5 cm and standard deviation 0.2 cm. Also assume that X1 and X2 are
independent. Determine the probability that the perimeter exceeds 14.5 cm.
Solution: Let Y = 2X1 + 2X2 be the perimeter of the manufactured part.
Y is normally distributed with
$\mu_Y = E[Y] = 2(2) + 2(5) = 14$
$\sigma_Y^2 = V[Y] = 4(0.1)^2 + 4(0.2)^2 = 0.2$
Thus, $P[Y > 14.5] = P\left[Z > \frac{14.5 - 14}{\sqrt{0.2}}\right] = P[Z > 1.118] = 0.131776$
POINT ESTIMATION
In statistical inference, the term Parameter is used to denote a quantity θ (Greek theta),
say, that is a property of an unknown probability distribution. For example, it may be the
mean, variance, or a particular quantile of the probability distribution. Parameters are
unknown, and one of the goals of statistical inference is to estimate them.
Parameters can be thought of as representing a quantity of interest about a general
population. In earlier chapters, probability calculations were made based on given values of
the parameters of the probability distributions, but in practice the parameters are unknown
since the probability distribution that characterizes observations from the population is
unknown. An experimenter’s goal is to find out as much as possible about these parameters
since they provide an understanding of the underlying probability distribution that
characterizes the population.
The primary purpose in taking a random sample is to obtain information about the unknown
population parameters. Suppose, for example, that we wish to reach a conclusion about the
proportion of people in a locality who prefer a particular brand of soft drink. Let p represent
the unknown value of this proportion. It is impractical to question every individual in the
population to determine the true value of p. To make an inference regarding the true
proportion p, a more reasonable procedure would be to select a random sample (of an
appropriate size) and use the observed proportion 𝑝̂ of people in this sample favoring the
brand of soft drink.
The sample proportion, 𝑝̂ , is computed by dividing the number of individuals in the sample
who prefer the brand of soft drink by the total sample size n. Thus, 𝑝̂ is a function of the
observed values in the random sample. Because many random samples are possible from a
population, the value of 𝑝̂ will vary from sample to sample. That is, 𝑝̂ is a random variable.
Such a random variable is called a statistic.
SAMPLE MEAN
Let X1, X2, …, Xn be a random sample of size n from a population with density function f(x),
mean µ and variance σ². The random variable $\bar{X}$, defined by $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$, is called the sample mean.
EXAMPLE:
1. An electronics company manufactures resistors that have a mean resistance
of 100 ohms and a standard deviation of 10 ohms. The resistance follows a
normal distribution. Find the probability that a random sample of 25
resistors will have an average resistance of less than 95 ohms.
Solution: According to the reproductive property, the sampling
distribution of $\bar{X}$ is normal with $E[\bar{X}] = \mu = 100$ and $V[\bar{X}] = \frac{\sigma^2}{n} = \frac{10^2}{25} = 4$. Thus,
$P[\bar{X} < 95] = P\left[\frac{\bar{X} - E[\bar{X}]}{\sqrt{V[\bar{X}]}} < \frac{95 - 100}{\sqrt{4}}\right] = P[Z < -2.5] = 0.0062$
If the distribution of resistance is normal with mean 100 ohms and
standard deviation of 10 ohms, finding a random sample of resistors with
a sample mean less than 95 ohms is a rare event. If this actually happens,
it casts doubt as to whether the true mean is really 100 ohms or if the
true standard deviation is really 10 ohms.
SAMPLE PROPORTION
Consider the random sample X1, X2, …, Xn from the population X. We can write the estimate
as
$\hat{p} = \bar{x} = \frac{x_0}{n}$
where x0 represents the number of trials resulting in a success, among n trials.
The Standard Error of the estimate of the proportion p is $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$.
EXAMPLE:
1. Suppose that the probability p that a vaccine provokes a serious adverse
reaction is unknown. The vaccine is administered to n = 500,000 head of
cattle, and x0 = 372 are observed to suffer the reaction. a) Find the point
estimate of p. b) Calculate the standard error estimate.
Solution:
a) $\hat{p} = \frac{372}{500{,}000} = 0.000744$
b) $\hat{\sigma}_{\hat{p}} = \sqrt{\frac{0.000744(1 - 0.000744)}{500{,}000}} = 0.3856 \times 10^{-4}$
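The point estimate and its estimated standard error can be computed with a few lines of Python. This is a sketch: the variable names are assumptions, and the unknown p in the standard error formula is replaced by its estimate, as in the solution.

```python
import math

n, x0 = 500_000, 372
p_hat = x0 / n                                 # point estimate of p
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error

print(p_hat)             # 0.000744
print(f"{se_hat:.4e}")   # 3.8560e-05
```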
SAMPLE VARIANCE
The statistic S² is an unbiased point estimator of the population variance σ², and the
numerical value s² computed from the sample data is called the point estimate of σ². The
sample variance can be thought of as an estimate of the variance σ² of the unknown
underlying probability distribution of the observations in the data set. It provides an
indication of the variability in the sample in the same way that the variance σ² provides an
indication of the variability of a probability distribution.
When is the sample size large enough so that the Central Limit Theorem can be assumed to
apply? The answer depends on how close the underlying distribution is to the normal. A
general rule is that the approximation is adequate as long as n ≥ 30, although the
approximation is often good for much smaller values of n, particularly if the distribution of
the random variables Xi has a probability density function with a shape reasonably similar
to the normal bell-shaped curve. In most cases encountered in practice, this guideline is
very conservative, and the Central Limit Theorem will apply for sample sizes much smaller
than 30.
UNBIASED ESTIMATOR
If X1, X2, …, Xn is a random sample from a population with density function f(x), the
statistic $\hat{\theta} = h(X_1, X_2, \ldots, X_n)$ is called a Point Estimator of the unknown parameter θ.
After the sample x1, x2, …, xn has been selected, the point estimator $\hat{\theta}$ takes on a single
numerical value $\hat{\theta} = h(x_1, x_2, \ldots, x_n)$ called the point estimate of θ.
The point estimator $\hat{\theta}$ is an Unbiased Estimator of the unknown parameter θ if $E[\hat{\theta}] = \theta$.
If the estimator is not unbiased, the quantity $bias(\hat{\theta}) = E[\hat{\theta}] - \theta$ is called the Bias of the
estimator.
EXAMPLE:
1. Compute the variance of the two unbiased estimators of µ using a random
sample of size n.
$\hat{\mu}_1 = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$
$\hat{\mu}_2 = \frac{1}{n+2}(2X_1 + X_2 + \cdots + X_{n-1} + 2X_n)$
Solution:
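We apply the equation
$V[a_1 X_1 + a_2 X_2 + \cdots + a_n X_n] = a_1^2 V[X_1] + a_2^2 V[X_2] + \cdots + a_n^2 V[X_n]$
$V[\hat{\mu}_1] = V\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right] = \frac{1}{n^2}(V[X_1] + V[X_2] + \cdots + V[X_n])$
$= \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$
$V[\hat{\mu}_2] = V\left[\frac{1}{n+2}(2X_1 + X_2 + \cdots + X_{n-1} + 2X_n)\right]$
$= \frac{1}{(n+2)^2}(4V[X_1] + V[X_2] + \cdots + V[X_{n-1}] + 4V[X_n])$
$= \frac{1}{(n+2)^2}(4\sigma^2 + \sigma^2 + \cdots + \sigma^2 + 4\sigma^2) = \frac{n+6}{(n+2)^2}\sigma^2$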
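The two variance formulas can be checked for particular sample sizes. The sketch below assumes independent observations with a common variance σ² (as in a random sample) and returns the factor multiplying σ²; the function names are hypothetical.

```python
from fractions import Fraction

def variance_factor(weights):
    """Return c such that V[sum(w_i * X_i)] = c * sigma^2 for independent X_i
    that share a common variance sigma^2."""
    return sum(Fraction(w) ** 2 for w in weights)

def factor_mu1(n):
    """Weights 1/n on every observation (the sample mean)."""
    return variance_factor([Fraction(1, n)] * n)

def factor_mu2(n):
    """Weights 2/(n+2) on the first and last observations, 1/(n+2) on the others."""
    w = [Fraction(2, n + 2)] + [Fraction(1, n + 2)] * (n - 2) + [Fraction(2, n + 2)]
    return variance_factor(w)

n = 10
print(factor_mu1(n), factor_mu2(n))  # 1/10 1/9
```

For n = 10 the factors are 1/10 and 16/144 = 1/9, so the sample mean has the smaller variance of the two unbiased estimators in this case.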
STANDARD ERROR
When the numerical value or point estimate of a parameter is reported, it is usually
desirable to give some idea of the precision of estimation. The measure of precision usually
employed is the Standard Error of the estimator that has been used.
The standard error of an estimator $\hat{\theta}$ is its standard deviation, given by $se(\hat{\theta}) = \sigma_{\hat{\theta}} = \sqrt{V[\hat{\theta}]}$.