$\bar{x}_1$ and $\bar{x}_2$ are the sample means of the two samples of sizes $n_1$ and $n_2$.
The point estimator of the difference between the two population means is $\bar{x}_1 - \bar{x}_2$.
The standard error is
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Interval estimate of µ1 - µ2
$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
The null and alternative hypotheses are:
1. H0: µ1 - µ2 ≥ D0   Ha: µ1 - µ2 < D0
2. H0: µ1 - µ2 ≤ D0   Ha: µ1 - µ2 > D0
3. H0: µ1 - µ2 = D0   Ha: µ1 - µ2 ≠ D0
Test statistic for hypothesis test:
$$z = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
p = P(z ≤ k) = norm.dist(k,0,1,true)
p = P(z ≥ k) = 1 - norm.dist(k,0,1,true)
For a two-tailed test, where k is the computed value of the test statistic:
If the test statistic z < 0 => p-value = 2P(z ≤ k)
If the test statistic z > 0 => p-value = 2P(z ≥ k)
Reject H0 if p-value ≤ α. If p-value > α, do not reject H0.
σ1, σ2 unknown:
In this case, use the sample standard deviations, s1 and s2, to estimate the unknown population standard deviations.
Interval estimate of µ1 - µ2:
$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Degrees of freedom (round down):
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
p = P(t ≤ k) = T.dist(k,df,true)
p = P(t ≥ k) = 1 - T.dist(k,df,true)
Inferences about µ1 - µ2: matched samples
Suppose employees at a manufacturing company can use two different methods to perform a production task. To maximize production output, the company wants to identify the method with the smaller population mean completion time.
The null and alternative hypotheses are:
H0: µ1 - µ2 = D0   Ha: µ1 - µ2 ≠ D0
If H0 is rejected, the method providing the smaller mean completion time would be recommended.
With matched samples, $d_i$ is the difference between the two observations in pair i and µd is the population mean of the differences:
H0: µd = 0   Ha: µd ≠ 0
Sample mean and sample standard deviation are:
$$\bar{d} = \frac{\sum d_i}{n}, \qquad s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$$
Test stat:
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$
p = P(t ≤ k) = T.dist(k,n-1,true)
p = P(t ≥ k) = 1 - T.dist(k,n-1,true)
Inferences about p1 – p2:
Let $\bar{p}_1$ and $\bar{p}_2$ be the sample proportions for random samples from populations 1 and 2.
Mean and standard error
$$E(\bar{p}_1 - \bar{p}_2) = p_1 - p_2$$
$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
Interval estimate for p1 – p2
$$\bar{p}_1 - \bar{p}_2 \pm z_{\alpha/2} \sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$$
1. H0: p1 - p2 ≥ 0   Ha: p1 - p2 < 0
2. H0: p1 - p2 ≤ 0   Ha: p1 - p2 > 0
3. H0: p1 - p2 = 0   Ha: p1 - p2 ≠ 0
Under the assumption H0 is true as an equality, p1 = p2 = p. In this case,
$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
With p unknown, we estimate p by the pooled estimator
$$\bar{p} = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2}$$
Test statistic:
$$z = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
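As a rough illustration of the pooled two-proportion test, the sketch below computes the pooled estimate, the z statistic, and a two-tailed p-value using only Python's standard library. The function name `two_proportion_z_test` and the sample counts are hypothetical and chosen only for illustration; `NormalDist().cdf` plays the role of norm.dist(k,0,1,true).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z test of H0: p1 - p2 = 0 (hypothetical helper, not from the notes)."""
    p1_bar = x1 / n1
    p2_bar = x2 / n2
    # Pooled estimate of the common proportion p under H0.
    pooled = (n1 * p1_bar + n2 * p2_bar) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_bar - p2_bar) / std_error
    # Two-tailed p-value: 2 * P(Z >= |z|).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative counts only: 35 successes out of 250 versus 27 out of 300.
z, p_value = two_proportion_z_test(35, 250, 27, 300)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

For a one-tailed test, the p-value would instead be the single tail area in the direction of Ha.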
CALCULATING CONFIDENCE INTERVAL
Estimating a population mean (σ known)
The margin of error (E) is defined by
$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
Use NORM.S.INV(1 − α/2) to calculate $z_{\alpha/2}$, or CONFIDENCE.NORM(α,σ,n) to calculate the margin of error directly.
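The margin-of-error formula for a known σ can be sketched in Python as follows. This is only an approximation of what the spreadsheet functions do, with `NormalDist().inv_cdf` standing in for NORM.S.INV; the function name `margin_of_error_known_sigma` and the input values are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error_known_sigma(alpha, sigma, n):
    """E = z_{alpha/2} * sigma / sqrt(n), for a known population sigma."""
    # Upper-tail critical value, comparable to NORM.S.INV(1 - alpha/2).
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return z_half_alpha * sigma / sqrt(n)


# Illustrative values only: 95% confidence, sigma = 20, n = 100.
E = margin_of_error_known_sigma(alpha=0.05, sigma=20, n=100)
print(f"margin of error = {E:.2f}")
```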
Estimating a population mean (σ unknown)
$$E = t_{\alpha/2} \frac{s}{\sqrt{n}}$$
Use T.INV(1 − α/2, n − 1) to calculate $t_{\alpha/2}$, or CONFIDENCE.T(α,s,n) to calculate E directly.
Estimating a population proportion
$$E = z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$
Use NORM.S.INV(1 − α/2) to calculate $z_{\alpha/2}$.
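Confidence interval for Means
σ known
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
σ unknown
$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$
Confidence interval for Proportions
$$\bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$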
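A minimal sketch of the confidence interval for a proportion, assuming the sample is large enough for the normal approximation. The helper name `proportion_confidence_interval` and the counts (26 successes in a sample of 100) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for p_bar +/- z_{alpha/2} * sqrt(p_bar(1 - p_bar)/n)."""
    p_bar = successes / n
    alpha = 1 - confidence
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_half_alpha * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin


# Illustrative counts only: 26 successes in a sample of 100.
low, high = proportion_confidence_interval(26, 100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The interval for a mean follows the same pattern, with the sample mean in place of the sample proportion and σ/√n (or s/√n with a t critical value) as the standard error.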
Confidence level    α       zα/2
90%                 0.10    1.645
95%                 0.05    1.960
98%                 0.02    2.326
99%                 0.01    2.576
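Critical values like those tabulated above can be reproduced with the standard normal inverse CDF. The snippet below is a small check using Python's `statistics.NormalDist`; the dictionary name `critical_z` is an assumption made for this sketch.

```python
from statistics import NormalDist

# Map each confidence level to its two-sided critical value z_{alpha/2}.
critical_z = {}
for confidence in (0.90, 0.95, 0.98, 0.99):
    alpha = 1 - confidence
    critical_z[confidence] = NormalDist().inv_cdf(1 - alpha / 2)

for confidence, z in critical_z.items():
    print(f"{confidence:.0%}  alpha = {1 - confidence:.2f}  z = {z:.3f}")
```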
$d_i$: the difference between the rankings in each set of data
2. Simple linear regression
Correlation
The correlation coefficient is a measure of the linear correlation between two variables X and Y, giving a value in [-1, 1], where 1 is perfect positive correlation, 0 is no linear correlation and -1 is perfect negative correlation.
$$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left(n \sum x_i^2 - \left(\sum x_i\right)^2\right)\left(n \sum y_i^2 - \left(\sum y_i\right)^2\right)}}$$
where the $(x_i, y_i)$ are pairs of data for two variables X and Y and n is the number of pairs of data used in the analysis.
Linear regression using the least squares method
$$y = b_0 + b_1 x$$
$$b_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad b_0 = \frac{\sum y_i}{n} - b_1 \frac{\sum x_i}{n}$$
Simple linear regression model
The equation that describes how y is related to x and an error term is called the regression model:
$$y = \beta_0 + \beta_1 x + \epsilon$$
where ε is a random variable referred to as the error term.
$$E(y) = \beta_0 + \beta_1 x$$
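The least squares estimates and the correlation coefficient can be computed directly from the sums in the formulas above. The sketch below is one way to do this in plain Python; the function name `least_squares_fit` and the five data pairs are made up for illustration.

```python
from math import sqrt


def least_squares_fit(xs, ys):
    """Return (b0, b1, r) from the sum-based formulas for paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    r = (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return b0, b1, r


# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1, r = least_squares_fit(xs, ys)
print(f"y = {b0:.2f} + {b1:.2f}x, r = {r:.4f}, r^2 = {r * r:.2f}")
```

Note that b1 and r share the same numerator, so they always have the same sign.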
iv. $n \ge \left(\frac{2 \times 1.96}{0.1}\right)^2 (0.1)(0.9) \approx 138.3$, so n = 139
a) y = 21.39 + 1.002x
b) The slope b1 = 1.002 means that when the room rate increases by 1 USD, entertainment also increases by about 1 USD.
c) r = 0.84 means room rate and entertainment are strongly and positively correlated.
r^2 = 0.7 means about 70% of the variation in entertainment is explained by room rate.
d) x = 128 => y = 149.4 USD
Final review 1:
Section 1:
a) 95% CI = $\frac{26}{100} \pm 1.96 \sqrt{\frac{0.26(1 - 0.26)}{100}}$ = (0.174; 0.346)
b) 0.174 < p < 0.346. The upper bound would need to be less than 0.15; since 0.346 > 0.15, the goggles cannot be sold.
The interval contains µ with probability 1 – α, so the confidence level of the interval is (1 – α) × 100%.
The margin of error is $z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$. The width of the CI is $2 \times z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$.
The sample mean is calculated = AVERAGE(Input range)
The margin of error is calculated = CONFIDENCE.NORM(α,σ,n).
σ unknown:
$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$
where s is the sample standard deviation, (1 – α) is the confidence coefficient, and $t_{\alpha/2}$ is the t value determined by the t distribution with n – 1 degrees of freedom.
In most applications, a sample size n ≥ 30 is adequate to use this expression.
To be 95% confident within an interval of width w, the sample size required is
$$n = \left(\frac{2 \times 1.96\,\sigma}{w}\right)^2$$
Always round up.
σ: population standard deviation
For a population proportion, the required sample size is
$$n = \left(\frac{2 \times 1.96}{w}\right)^2 p^*(1 - p^*)$$
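The two sample-size formulas can be sketched as small helper functions, with `math.ceil` doing the "always round up" step. The function names and the inputs for the mean case (σ = 20, w = 4) are hypothetical; z = 1.96 is kept as the default to match the 95% level used in these notes.

```python
from math import ceil


def sample_size_for_mean(width, sigma, z=1.96):
    """n = (2 * z * sigma / w)^2, rounded up to the next integer."""
    return ceil((2 * z * sigma / width) ** 2)


def sample_size_for_proportion(width, p_star, z=1.96):
    """n = (2 * z / w)^2 * p*(1 - p*), rounded up to the next integer."""
    return ceil((2 * z / width) ** 2 * p_star * (1 - p_star))


# Illustrative inputs only.
n_mean = sample_size_for_mean(width=4, sigma=20)
n_prop = sample_size_for_proportion(width=0.1, p_star=0.1)
print(n_mean, n_prop)
```

With w = 0.1 and p* = 0.1, the proportion formula gives about 138.3, which rounds up to 139.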
If we have no preliminary value for p, then we assume the worst-case scenario and choose the value $p^*$ which maximizes the value of