EDA 101
College of Engineering
President Ramon Magsaysay State University
January 4, 2021
Sampling Distribution and Point Estimates; Test of Hypothesis for a Single Population
Sampling Distribution
Definition (Variance)
The variance of a discrete random variable X, denoted by σ_X² or V[X], is
$$\sigma_X^2 = V[X] = \sum_{\text{all } x} (x - \mu_X)^2 f(x)$$
Sampling Distribution
For a binomial random variable X with n trials and success probability p:
$$\mu = E[X] = np, \qquad \sigma^2 = V[X] = np(1-p)$$
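As a quick check of the variance definition and the binomial formulas, the sketch below computes µ and σ² directly from a probability mass function and compares them with np and np(1 − p). The parameters n = 10 and p = 0.3 are illustrative choices, not values from these notes.

```python
from math import comb

def mean_and_variance(pmf):
    """Mean and variance of a discrete random variable; pmf maps x -> f(x)."""
    mu = sum(x * f for x, f in pmf.items())
    var = sum((x - mu) ** 2 * f for x, f in pmf.items())  # sum of (x - mu)^2 f(x)
    return mu, var

def binomial_pmf(n, p):
    """f(x) = C(n, x) p^x (1 - p)^(n - x) for x = 0, ..., n."""
    return {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

# Illustrative parameters (hypothetical): n = 10 trials, p = 0.3.
n, p = 10, 0.3
mu, var = mean_and_variance(binomial_pmf(n, p))
print(f"from the pmf: mu = {mu:.4f}, var = {var:.4f}")
print(f"closed form:  np = {n * p:.4f}, np(1 - p) = {n * p * (1 - p):.4f}")
```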
Sampling Distribution
For a continuous uniform random variable X on the interval [a, b]:
$$\mu = \frac{1}{2}(a+b), \qquad \sigma^2 = \frac{1}{12}(b-a)^2$$
Sampling Distribution
Example
An electronics company manufactures resistors that have a mean resistance of 100 ohms and a
standard deviation of 10 ohms. The resistance follows a normal distribution. Find the
probability that a random sample of 25 resistors will have an average resistance of less than
95 ohms.
The distribution of X̄ is normal with E[X̄] = µ = 100 and V[X̄] = σ²/n = 10²/25 = 4. Hence
$$P(\bar{X} < 95) = P\left(\frac{\bar{X} - E[\bar{X}]}{\sqrt{V[\bar{X}]}} < \frac{95 - 100}{\sqrt{4}}\right) = P(Z < -2.5) \approx 0.0062$$
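The resistor probability can be checked with the standard-library `statistics.NormalDist`; a minimal sketch, assuming the sampling distribution derived above (mean 100, variance 4, so standard deviation 2).

```python
from statistics import NormalDist

mu, sigma, n = 100, 10, 25
# Sampling distribution of the mean: normal, mean mu, standard deviation sigma / sqrt(n) = 2.
xbar_dist = NormalDist(mu, sigma / n ** 0.5)
prob = xbar_dist.cdf(95)  # P(Xbar < 95) = P(Z < -2.5)
print(round(prob, 4))
```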
Sampling Distribution
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{as } n \to \infty$$
or equivalently,
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
Sampling Distribution
Example
Suppose that a random variable X has a continuous uniform distribution on the interval [4, 6].
Find the distribution of the sample mean of a random sample of size 40.
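The distribution of X̄ is approximately normal with mean and variance
$$\mu_{\bar{X}} = E[X] = \frac{a+b}{2} = \frac{4+6}{2} = 5, \qquad \sigma_{\bar{X}}^2 = \frac{V[X]}{n} = \frac{(b-a)^2/12}{40} = \frac{1}{120}$$
or
$$\bar{X} \sim N\left(5, \frac{1}{120}\right)$$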
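A small simulation can illustrate this result. The sketch below, assuming independent draws from the uniform distribution on [4, 6], averages 40 observations many times and compares the mean and variance of those averages with 5 and 1/120. The seed and the number of repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible
a, b, n, reps = 4, 6, 40, 20_000

# Draw `reps` samples of size n from Uniform(a, b) and keep each sample mean.
means = [statistics.fmean(random.uniform(a, b) for _ in range(n)) for _ in range(reps)]

sim_mean = statistics.fmean(means)     # should be close to (a + b) / 2 = 5
sim_var = statistics.pvariance(means)  # should be close to 1/120, about 0.00833
print(round(sim_mean, 4), round(sim_var, 5))
```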
Sampling Distribution
Example
What is the probability that a random sample of size 40 has a mean value between 4.8 and 5.3?
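$$P(4.8 < \bar{X} < 5.3) = P\left(\frac{4.8 - 5}{\sqrt{1/120}} < \frac{\bar{X} - E[\bar{X}]}{\sqrt{V[\bar{X}]}} < \frac{5.3 - 5}{\sqrt{1/120}}\right) = P(-2.19 < Z < 3.29) \approx 0.985$$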
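The same probability can be computed numerically; a sketch assuming X̄ ∼ N(5, 1/120) as derived above.

```python
from math import sqrt
from statistics import NormalDist

xbar_dist = NormalDist(mu=5, sigma=sqrt(1 / 120))  # standard deviation sqrt(1/120)
prob = xbar_dist.cdf(5.3) - xbar_dist.cdf(4.8)      # P(4.8 < Xbar < 5.3)
print(round(prob, 4))
```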
Note: If X is normally distributed, the sampling distribution is exactly normal regardless of the sample size n.
Sampling Distribution
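$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$
or
$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$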
Note: If both populations are normally distributed, no condition on the sample sizes n1 and n2 is needed; the result holds exactly for any sample sizes.
Sampling Distribution
Example
Two independent experiments are run in which two different types of paint are compared.
Thirty-six specimens are painted using type A, and the drying time, in hours, is recorded for
each. The same is done with type B. The population standard deviations are both known to be
1.0.
Assuming that the mean drying time is equal for the two types of paint, find P(X̄_A − X̄_B > 0.5).
Since n1 = n2 = 36 > 30, we can apply the Central Limit Theorem.
$$\bar{X}_A - \bar{X}_B \sim N\left(\mu_A - \mu_B, \frac{1.0^2}{36} + \frac{1.0^2}{36}\right)$$
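Also, µA = µB (the mean drying time is equal for the two types of paint). Thus,
$$P(\bar{X}_A - \bar{X}_B > 0.5) = P\left(Z > \frac{0.5 - 0}{\sqrt{\dfrac{1.0^2}{36} + \dfrac{1.0^2}{36}}}\right) = P\left(Z > \frac{0.5}{\sqrt{1/18}}\right) \approx P(Z > 2.12) = 0.0170$$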
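A numerical check of the paint example, assuming µA = µB as stated. The code uses the unrounded z = 0.5/√(1/18) ≈ 2.1213, so it gives about 0.0169; the value 0.0170 corresponds to z rounded to 2.12, as when reading a standard normal table.

```python
from math import sqrt
from statistics import NormalDist

sigma_a = sigma_b = 1.0
n_a = n_b = 36
# Under mu_A = mu_B, the difference of sample means has mean 0 and this standard deviation.
sd_diff = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)  # sqrt(1/18)
z = 0.5 / sd_diff                                    # about 2.1213
prob = 1 - NormalDist().cdf(z)                       # P(Z > z)
print(round(z, 4), round(prob, 4))
```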
Sampling Distribution
Bernoulli trial (success probability p, q = 1 − p): µ = p, σ² = p(1 − p) = pq
Binomial (n independent trials): µ = np, σ² = np(1 − p) = npq
Sampling Distribution
Results
If n ≥ 30,
$$\frac{\bar{X} - p}{\sqrt{pq/n}} \sim N(0, 1)$$
If n < 30 but np > 5 and nq > 5, the distribution of X̄ is approximately normal.
If n1 ≥ 30 and n2 ≥ 30,
$$\frac{(\bar{X}_1 - \bar{X}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}} \sim N(0, 1)$$
If n1 < 30 and n2 < 30 but n1p1 > 5, n1q1 > 5, n2p2 > 5 and n2q2 > 5, the distributions of X̄1 and X̄2 are approximately normal.
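The single-proportion result can be sketched as a small helper that standardizes a sample proportion and reports whether the rule of thumb above (n ≥ 30, or np > 5 and nq > 5) is met. The function name `proportion_z` is hypothetical, and the usage values are borrowed from the seed-germination example later in these notes.

```python
from math import sqrt

def proportion_z(phat, p, n):
    """Hypothetical helper: standardize a sample proportion, (phat - p) / sqrt(pq/n)."""
    q = 1 - p
    # Rule of thumb from the notes: n >= 30, or else np > 5 and nq > 5.
    approx_ok = n >= 30 or (n * p > 5 and n * q > 5)
    return (phat - p) / sqrt(p * q / n), approx_ok

# Usage with the seed-germination numbers: 180 of 200 germinated, hypothesized p = 0.94.
z, ok = proportion_z(180 / 200, 0.94, 200)
print(round(z, 3), ok)
```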
Sampling Distribution
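Sample Computation: n = 5
i | x_i | x_i²
1 | 1.3 | 1.69
2 | 1.8 | 3.24
3 | 1.4 | 1.96
4 | 1.1 | 1.21
5 | 1.8 | 3.24
Sum | 7.4 | 11.34
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right] = \frac{1}{4}\left[11.34 - \frac{7.4^2}{5}\right] = \frac{1}{4}(11.34 - 10.952) = 0.097$$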
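The shortcut formula can be checked against the standard library's `statistics.variance`, which also divides by n − 1:

```python
import statistics

x = [1.3, 1.8, 1.4, 1.1, 1.8]
n = len(x)
sum_x = sum(x)                  # 7.4
sum_x2 = sum(v * v for v in x)  # 11.34
# Computational (shortcut) formula for the sample variance.
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)
print(round(s2, 4), round(statistics.variance(x), 4))
```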
Point Estimation
Probability Theory
The time X until recharge for a battery in a laptop computer under common conditions is
normally distributed with µ = 260 minutes and σ = 50 minutes. Find the probability that a
fully charged laptop lasts anywhere from 3 to 4 hours.
$$P[180 < X < 240] = \int_{180}^{240} \frac{1}{50\sqrt{2\pi}}\, e^{-\frac{(x-260)^2}{2 \cdot 50^2}}\, dx = 0.2898$$
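The integral need not be evaluated by hand; a sketch using `statistics.NormalDist`, which provides the normal CDF:

```python
from statistics import NormalDist

battery = NormalDist(mu=260, sigma=50)
# 3 to 4 hours = 180 to 240 minutes
prob = battery.cdf(240) - battery.cdf(180)
print(round(prob, 4))
```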
Point Estimation
Statistical Inference
1. The distribution of the time X until recharge for a battery in a laptop is unknown.
2. The mean and variance of X are unknown.
⇒ We measure the time x until recharge of a sample of laptop computers, say 25 units.
⇒ We use some formula (like x̄) to estimate the mean life µ until recharge of a laptop battery.
Point Estimation
Terminologies
Definition (Parameter)
A parameter is a quantity, θ, that is a property of an unknown probability distribution.
Definition (Statistic)
A statistic is a function of observable random variables.
Our goal is to find out as much as possible about a parameter θ, using the information
contained within a sample.
Point Estimation
Terminologies
Definition (Point Estimator)
A statistic that is used to estimate an unknown parameter θ is a point estimator of θ, denoted
Θ̂.
When the observations are recorded, the statistic takes a value θ̂ called the point estimate.
Examples (Estimates of µ)
Let x1 = 1.5, x2 = 1.8, x3 = 1.4.
1. $\frac{1}{4}(5x_1 + 2x_2 - 3x_3) = \frac{6.9}{4} = 1.725$
2. $\frac{1}{3}(x_1 + x_2 + x_3) = \frac{4.7}{3} \approx 1.567$
3. $\frac{1}{5}(2x_1 + x_2 + 2x_3) = \frac{7.6}{5} = 1.52$
Point estimates can only be as good as the data set from which they are calculated.
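The three estimates can be reproduced directly; this is only arithmetic on the three observations given above:

```python
x1, x2, x3 = 1.5, 1.8, 1.4

est1 = (5 * x1 + 2 * x2 - 3 * x3) / 4  # 6.9 / 4
est2 = (x1 + x2 + x3) / 3              # 4.7 / 3
est3 = (2 * x1 + x2 + 2 * x3) / 5      # 7.6 / 5
print(f"{est1:.3f} {est2:.3f} {est3:.3f}")
```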
Examples
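1. $\hat{\mu}_1 = \dfrac{5X_1 + 2X_2 - 3X_3}{4}$: $E[\hat{\mu}_1] = \dfrac{5\mu + 2\mu - 3\mu}{4} = \mu$
2. $\hat{\mu}_2 = \dfrac{X_1 + X_2 + X_3}{3}$: $E[\hat{\mu}_2] = \dfrac{\mu + \mu + \mu}{3} = \mu$
3. $\hat{\mu}_3 = \dfrac{2X_1 + X_2 + 2X_3}{5}$: $E[\hat{\mu}_3] = \dfrac{2\mu + \mu + 2\mu}{5} = \mu$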
All three estimators are unbiased for µ.
Variance of an Estimator
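Assuming X1, X2, X3 are independent, each with variance σ²:
$$V\left[\frac{5X_1 + 2X_2 - 3X_3}{4}\right] = \frac{1}{16}(25 + 4 + 9)\sigma^2 = \frac{19}{8}\sigma^2$$
$$V\left[\frac{X_1 + X_2 + X_3}{3}\right] = \frac{1}{9}(1 + 1 + 1)\sigma^2 = \frac{1}{3}\sigma^2$$
$$V\left[\frac{2X_1 + X_2 + 2X_3}{5}\right] = \frac{1}{25}(4 + 1 + 4)\sigma^2 = \frac{9}{25}\sigma^2$$
Since 1/3 < 9/25 < 19/8, µ̂2 has the smallest variance of the three.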
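The expectations and variances above follow from two facts about a linear combination c1X1 + c2X2 + c3X3 of independent observations with mean µ and variance σ²: its mean is (c1 + c2 + c3)µ and its variance is (c1² + c2² + c3²)σ². The sketch below, assuming independence, computes both factors exactly with `fractions.Fraction`:

```python
from fractions import Fraction as F

# Coefficients (c1, c2, c3) of each estimator c1*X1 + c2*X2 + c3*X3.
estimators = {
    "mu_hat_1": (F(5, 4), F(2, 4), F(-3, 4)),
    "mu_hat_2": (F(1, 3), F(1, 3), F(1, 3)),
    "mu_hat_3": (F(2, 5), F(1, 5), F(2, 5)),
}

# Assuming independence: E = (c1 + c2 + c3) * mu, V = (c1^2 + c2^2 + c3^2) * sigma^2.
results = {name: (sum(c), sum(ci * ci for ci in c)) for name, c in estimators.items()}
for name, (mean_factor, var_factor) in results.items():
    print(f"{name}: E = {mean_factor} * mu, V = {var_factor} * sigma^2")
```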
If all unbiased estimators Θ̂ are considered, the one with the smallest variance is called the
minimum variance unbiased estimator.
The standard error of an estimator Θ̂ is its standard deviation, √V(Θ̂). If the standard error depends on an unknown parameter, the parameter is replaced by its estimate, and the resulting value is called the standard error estimate.
Example (Standard Error of a Population Proportion)
$$\operatorname{se}(\hat{P}) = \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
In general, the estimator Θ̂ can be:
1. X̄
2. X̄1 − X̄2
3. P̂
4. P̂1 − P̂2
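A minimal sketch of the standard error estimate for a proportion, se(P̂) ≈ √(p̂(1 − p̂)/n). The helper name `se_proportion` is hypothetical, and the usage borrows p̂ = 180/200 from the seed-germination example in these notes.

```python
from math import sqrt

def se_proportion(phat, n):
    """Hypothetical helper: estimated standard error sqrt(phat (1 - phat) / n)."""
    return sqrt(phat * (1 - phat) / n)

# Usage with the seed-germination data (180 of 200 germinated).
se = se_proportion(180 / 200, 200)
print(round(se, 4))
```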
Hypothesis Testing
Test of Hypothesis
Examples
1. A supplier claims that its products made from a graphite-epoxy composite material have a
tensile strength of 40. An experimenter may test this claim by collecting a random sample
of products and measuring their tensile strengths.
2. Immediately below the asphalt surface of a roadway is a layer of base material composed
of a crushed stone or gravel aggregate. The resilient modulus of this aggregate is a
measure of how the aggregate deforms when subjected to stress, and it is an important
property affecting the manner in which the roadway responds to loads. A construction
engineer has four different suppliers of this aggregate material who obtain their raw
materials from four different locations. The engineer would like to assess whether the
aggregates from the four different locations have different values of resilient modulus.
Hypothesis Testing
A Simple Case
From a packet of 200 seeds that I planted, only 180 germinated.
It is reasonable to conclude that this evidence does not refute p = 0.93.
It also does not refute p = 0.90 or perhaps even p = 0.94.
Rejection of a hypothesis implies that the sample evidence refutes the hypothesis.
What is the risk of rejecting a hypothesis, say p = 0.94, when in fact the hypothesis is true?
In the light of the evidence, the risk of incorrectly rejecting a true hypothesis p = 0.94 is
0.018. Thus, the decision is to reject the hypothesis.
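The notes do not show how 0.018 was obtained. Assuming it is the exact binomial probability of 180 or fewer germinations out of 200 when p = 0.94, it can be reproduced as follows:

```python
from math import comb

n, observed, p0 = 200, 180, 0.94
# Exact binomial lower tail: P(X <= 180) when n = 200 and p = 0.94.
risk = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(observed + 1))
print(round(risk, 3))
```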
SirMedz CoE - PRMSU
Hypothesis Testing
H0 : µ ≥ 25
H1 : µ < 25
2. A textile fiber manufacturer is investigating a new drapery yarn, which the company
claims has a mean thread elongation of (at least) 12 kilograms.
H0 : µ ≤ 12
H1 : µ > 12
Hypothesis Testing
H0 : p ≤ 30%
H1 : p > 30%
Hypothesis Testing
H0 : µ = 50
H1 : µ ≠ 50
2. A random sample of 400 voters in a certain city is asked if they favor an additional 4%
gasoline sales tax to provide badly needed revenues for street repairs. If more than 220 but
fewer than 260 favor the sales tax, we shall conclude that 60% of the voters are for it.
H0 : p = 0.60
H1 : p ≠ 0.60
Hypothesis Testing
Decision Errors
State of H0 | Decision: reject H0 | Decision: do not reject H0
H0 is true | Type I error (probability α) | Correct decision
H0 is false | Correct decision | Type II error (probability β)
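The meaning of α in this table can be illustrated by simulation. The sketch below assumes a hypothetical normal population in which H0 is true (µ = µ0 = 0, σ = 1, n = 25), runs a two-sided z test many times at α = 0.05, and counts how often H0 is wrongly rejected; the long-run rate should be close to α.

```python
import random
from statistics import NormalDist, fmean

random.seed(2)                                # fixed seed so the run is reproducible
alpha, n, reps = 0.05, 25, 10_000
mu0, sigma = 0.0, 1.0                         # hypothetical population where H0: mu = mu0 holds
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

rejections = 0
for _ in range(reps):
    xbar = fmean(random.gauss(mu0, sigma) for _ in range(n))
    z = (xbar - mu0) / (sigma / n ** 0.5)
    if abs(z) > z_crit:                       # rejecting a true H0 is a type I error
        rejections += 1

type_i_rate = rejections / reps
print(type_i_rate)                            # should be close to alpha
```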
Hypothesis Testing
Test Statistic
The “discrepancy” between the data set and the null hypothesis is measured through a test
statistic.
Example
A supplier claims that its products made from a graphite-epoxy composite material have a
tensile strength of 40. When the tensile strengths of 30 randomly selected products are
measured, a sample mean of x̄ = 38.518 and a sample standard deviation of s = 2.299 are
obtained.
The statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
measures the discrepancy between x̄ = 38.518 and µ = 40.
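For these tensile-strength data the statistic evaluates to about −3.53; a sketch of the arithmetic:

```python
from math import sqrt

xbar, mu0, s, n = 38.518, 40, 2.299, 30
t = (xbar - mu0) / (s / sqrt(n))  # discrepancy measured in estimated standard errors
print(round(t, 4))
```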
Hypothesis Testing
p-value
The p-value is a measure of the plausibility or credibility of the null hypothesis.
It is the observed level of significance.
It is the smallest level of significance that would lead to the rejection of H0.
It is a measure of the risk of rejecting H0 in favor of H1 when H0 is true.
It is the probability, computed assuming H0 is true, of obtaining a result at least as extreme as the one observed.
p = P[H0 is rejected|H0 is true]
Hypothesis Testing
Example
A supplier claims that its products made from a graphite-epoxy composite material have a
tensile strength of 40. A random sample of size 30 yields x̄ = 38.518 and s = 2.299.
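With ν = n − 1 = 29 degrees of freedom, the p-value is the area to the left of x̄ = 38.518 under the sampling distribution of X̄ centered at µ = 40:
$$p = P(\bar{X} \le 38.518) = P\left(T \le \frac{38.518 - 40}{2.299/\sqrt{30}}\right) = P(T \le -3.53)$$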
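The notes do not state the numerical p-value here. The Python standard library has no t-distribution CDF, so this sketch integrates the t density numerically with Simpson's rule (an assumption of this example, not the method used in the notes; a library routine such as SciPy's `scipy.stats.t` is the usual choice). As a check, the same helper reproduces the p = 0.0132 reported for the x̄ = 39.018 example later in these notes.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_c - (nu + 1) / 2 * log(1 + x * x / nu))

def t_cdf(t, nu, upper=200.0, steps=20_000):
    """P(T <= t) via symmetry and Simpson's rule on the tail [|t|, upper].

    The tail beyond `upper` is ignored (negligible for moderate nu); `steps` must be even.
    """
    a = abs(t)
    h = (upper - a) / steps
    total = t_pdf(a, nu) + t_pdf(upper, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, nu)
    tail = total * h / 3  # approximately P(T >= |t|)
    return tail if t <= 0 else 1 - tail

t = (38.518 - 40) / (2.299 / sqrt(30))  # about -3.53
p = t_cdf(t, nu=29)                     # lower-tailed p-value
print(round(t, 4), round(p, 5))
print(round(t_cdf(-2.33955, nu=29), 4))  # compare with p = 0.0132 in the later example
```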
Hypothesis Testing
Decision
Decision Rule
If the observed level of significance p is less than the specified level of significance α, the
decision is to reject H0 in favor of H1 , otherwise, the null hypothesis is not rejected.
Hypothesis Testing
Population Mean
A manufacturer of sports equipment has developed a new synthetic fishing line that the company
claims has a mean breaking strength of 8 kilograms with a standard deviation of 0.5 kilogram. Test the
hypothesis that µ = 8 kilograms against the alternative that µ ≠ 8 kilograms if a random sample of 50
lines is tested and found to have a mean breaking strength of 7.8 kilograms. Use a 0.01 level of
significance.
A. H0 : µ = 8, H1 : µ ≠ 8
B. α = 0.01
C. Test statistic:
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
D. Computation:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{7.8 - 8}{0.5/\sqrt{50}} = -2.8284, \qquad p = 2P[Z > 2.8284] \approx 0.0047$$
E. Reject H0 since p < 0.01.
F. Based on a random sample of size 50, there is sufficient evidence that the mean breaking strength
of the new synthetic fishing line is different from 8 kg (p = 0.0047).
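Steps C to E can be collected into one function; a minimal sketch for the known-σ, two-sided case. The name `two_sided_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(xbar, mu0, sigma, n, alpha):
    """Hypothetical helper: z statistic, two-sided p-value, and reject/do-not-reject decision."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p, p < alpha  # True in the last slot means "reject H0"

z, p, reject = two_sided_z_test(xbar=7.8, mu0=8, sigma=0.5, n=50, alpha=0.01)
print(round(z, 4), round(p, 4), reject)
```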
Population Mean
A supplier claims that its products made from a graphite-epoxy composite material have a
tensile strength of 40. When the tensile strengths of 30 randomly selected products are
measured, a sample mean of x̄ = 39.018 and a sample standard deviation of s = 2.299 are
obtained. Test the hypothesis against H1 : µ < 40 at the 5% level of significance.
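A. H0 : µ ≥ 40, H1 : µ < 40
B. α = 0.05
C. Test statistic:
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
D. Computation:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{39.018 - 40}{2.299/\sqrt{30}} = -2.33955, \qquad p = P[T(29) < -2.33955] = 0.0132$$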
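The same decision can be reached with the critical-value approach instead of the p-value: reject H0 when t falls below −t0.05,29. The critical value 1.699 below is the standard table value for 29 degrees of freedom, typed in by hand; this is a sketch of an alternative check, not the procedure used in the notes.

```python
from math import sqrt

xbar, mu0, s, n = 39.018, 40, 2.299, 30
t = (xbar - mu0) / (s / sqrt(n))  # observed t statistic, 29 degrees of freedom
t_crit = -1.699                   # lower 5% point of t(29), from a standard t table
reject = t < t_crit               # lower-tailed test: reject H0 if t lies in the left tail
print(round(t, 5), reject)
```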
Population Mean
The sodium content of twenty 300-gram boxes of organic cornflakes was determined. The data
(in milligrams) are as follows: 131.15, 130.69, 130.91, 129.54, 129.64, 128.77, 130.72, 128.33,
128.24, 129.65, 130.14, 129.29, 128.71, 129.00, 129.39, 130.42, 129.53, 130.12, 129.78,
130.92. Can you support a claim that mean sodium content of this brand of cornflakes differs
from 130 milligrams? Use α = 0.05.
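A sketch of the test statistic for the cornflakes data, assuming a two-sided one-sample t test (H0 : µ = 130, H1 : µ ≠ 130) at α = 0.05. The critical value 2.093 is the standard table value t0.025,19, typed in by hand rather than computed.

```python
import statistics
from math import sqrt

sodium = [131.15, 130.69, 130.91, 129.54, 129.64, 128.77, 130.72, 128.33,
          128.24, 129.65, 130.14, 129.29, 128.71, 129.00, 129.39, 130.42,
          129.53, 130.12, 129.78, 130.92]
mu0 = 130
n = len(sodium)
xbar = statistics.fmean(sodium)
s = statistics.stdev(sodium)      # sample standard deviation (n - 1 divisor)
t = (xbar - mu0) / (s / sqrt(n))  # t statistic with n - 1 = 19 degrees of freedom

t_crit = 2.093                    # upper 2.5% point of t(19), from a standard t table
print(f"n = {n}, xbar = {xbar:.4f}, s = {s:.4f}, t = {t:.4f}")
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```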