Probability Statistics Review

1. Inference about means and proportions with two populations

• σ1, σ2 known:
$\bar{x}_1$ and $\bar{x}_2$ are the sample means of the two samples, of sizes $n_1$ and $n_2$.
The point estimator of the difference between the two population means is $\bar{x}_1 - \bar{x}_2$.
The standard error is
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Interval estimate of µ1 - µ2:
$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
The null and alternative hypotheses are:
1. H0: µ1 - µ2 ≥ D0   Ha: µ1 - µ2 < D0
2. H0: µ1 - µ2 ≤ D0   Ha: µ1 - µ2 > D0
3. H0: µ1 - µ2 = D0   Ha: µ1 - µ2 ≠ D0
Test statistic for the hypothesis test:
$z = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
With k the computed value of the test statistic:
Lower tail: p-value = P(z ≤ k) = NORM.DIST(k,0,1,TRUE)
Upper tail: p-value = P(z ≥ k) = 1 - NORM.DIST(k,0,1,TRUE)
Two tail: if k < 0 => p-value = 2P(z ≤ k); if k > 0 => p-value = 2P(z ≥ k)
Reject H0 if p-value ≤ α. If p-value > α, do not reject H0.

• σ1, σ2 unknown:
In this case, use the sample standard deviations s1 and s2 to estimate the unknown population standard deviations.
Interval estimate of µ1 - µ2:
$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Degrees of freedom (round down):
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$
Test statistic:
$t = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
Lower tail: p-value = P(t ≤ k) = T.DIST(k,df,TRUE)
Upper tail: p-value = P(t ≥ k) = 1 - T.DIST(k,df,TRUE)

• Inferences about µ1 - µ2: matched samples
Suppose employees at a manufacturing company can use two different methods to perform a production task. To maximize production output, the company wants to identify the method with the smaller population mean completion time.
H0: µ1 - µ2 = D0   Ha: µ1 - µ2 ≠ D0
If H0 is rejected, the method providing the smaller mean completion time would be recommended.
In terms of the paired differences di: H0: µd = 0   Ha: µd ≠ 0
Sample mean and sample standard deviation of the differences:
$\bar{d} = \frac{\sum d_i}{n}$,  $s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$
Test statistic:
$t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$
Lower tail: p-value = P(t ≤ k) = T.DIST(k,n-1,TRUE)
Upper tail: p-value = P(t ≥ k) = 1 - T.DIST(k,n-1,TRUE)

• Inferences about p1 – p2:
Let $\bar{p}_1$ and $\bar{p}_2$ be the sample proportions for random samples from populations 1 and 2.
Mean and standard error:
$E(\bar{p}_1 - \bar{p}_2) = p_1 - p_2$
$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
Interval estimate for p1 – p2:
$\bar{p}_1 - \bar{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$
1. H0: p1 - p2 ≥ 0   Ha: p1 - p2 < 0
2. H0: p1 - p2 ≤ 0   Ha: p1 - p2 > 0
3. H0: p1 - p2 = 0   Ha: p1 - p2 ≠ 0
Under the assumption that H0 is true as an equality, p1 = p2 = p. In this case,
$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
With p unknown, we estimate p by the pooled estimator
$\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$
Test statistic:
$z = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

CALCULATING CONFIDENCE INTERVAL
Estimating a population mean (σ known)
Margin of error (E) defined by
$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Use NORM.S.INV(1 - α/2) to calculate $z_{\alpha/2}$, or CONFIDENCE.NORM to calculate the margin of error.
Estimating a population mean (σ unknown)
$E = t_{\alpha/2}\frac{s}{\sqrt{n}}$
Use T.INV(1 - α/2, n - 1) to calculate $t_{\alpha/2}$, or CONFIDENCE.T to calculate E.
Estimating a population proportion
$E = z_{\alpha/2}\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
Use NORM.S.INV(1 - α/2) to calculate $z_{\alpha/2}$.
Confidence interval for means
• σ known: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
• σ unknown: $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$
Confidence interval for proportions
$\bar{p} \pm z_{\alpha/2}\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$

Confidence level   α      zα/2
90%                0.10   1.645
95%                0.05   1.960
98%                0.02   2.326
99%                0.01   2.576
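A minimal Python sketch of the two-population test statistics above, using only the standard library. The function names and the sample inputs are hypothetical, chosen for illustration; the sketch computes the statistics and the rounded-down df, while t-based p-values would still come from T.DIST in Excel (or a t-distribution routine such as `scipy.stats.t.cdf`, which is not used here).

```python
import math
from statistics import NormalDist, mean, stdev


def two_sample_z(x1, x2, sigma1, sigma2, n1, n2, d0=0.0):
    """sigma1, sigma2 known: z = (x1 - x2 - D0) / sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    return (x1 - x2 - d0) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)


def unequal_var_t(x1, x2, s1, s2, n1, n2, d0=0.0):
    """sigma1, sigma2 unknown: t statistic and degrees of freedom rounded down."""
    a, b = s1**2 / n1, s2**2 / n2
    t = (x1 - x2 - d0) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, math.floor(df)


def matched_t(d, mu_d=0.0):
    """Matched samples: t = (d_bar - mu_d) / (s_d / sqrt(n)); stdev uses n - 1."""
    return (mean(d) - mu_d) / (stdev(d) / math.sqrt(len(d)))


def pooled_two_proportion_z(p1, p2, n1, n2):
    """H0: p1 - p2 = 0, with pooled p = (n1*p1 + n2*p2) / (n1 + n2)."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)
    return (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


def two_tail_p_value(z):
    """2P(z <= k) if k < 0, 2P(z >= k) if k > 0, i.e. 2*(1 - NORM.DIST(|k|,0,1,TRUE))."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical inputs, for illustration only
z = pooled_two_proportion_z(0.6, 0.4, 100, 100)
print(f"z = {z:.4f}, two-tail p-value = {two_tail_p_value(z):.4f}")
t, df = unequal_var_t(21.0, 20.0, 2.0, 3.0, 10, 12)
print(f"t = {t:.4f}, df = {df}")
```

`math.floor` implements the "round down" rule for df; the matched-samples function simply applies the one-sample t statistic to the differences di.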
2. Simple linear regression
• Correlation
The correlation coefficient is a measure of the linear correlation between two variables X and Y, giving a value in [-1; 1], where 1 is perfect positive correlation, 0 is no correlation and -1 is perfect negative correlation.
$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$
where the (xi, yi) are pairs of data for the two variables X and Y, and n is the number of pairs of data used in the analysis.

Linear regression using the least squares method:
$y = b_0 + b_1x$
$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$,  $b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n}$

• Simple linear regression model
The equation that describes how y is related to x and an error term is called the regression model:
$y = \beta_0 + \beta_1x + \varepsilon$
where ε is a random variable referred to as the error term.
The equation that describes how the expected value of y, denoted E(y), is related to x is called the regression equation:
$E(y) = \beta_0 + \beta_1x$
In practice, β0 and β1 are not known and must be estimated using sample data. Sample statistics, denoted by b0 and b1, are computed as estimates of the population parameters β0 and β1.

• r² – the coefficient of determination. In simple linear regression it equals the square of the correlation coefficient, and it measures the proportion of the variation in y that is explained by the regression line.
The coefficient of determination can be thought of as a percent: if r² = 0.80, then 80% of the variation in y is explained by its linear relationship with x. A value of 1 means the line fits every data point exactly; a value of 0 means the line explains none of the variation in y. A higher coefficient indicates a better goodness of fit for the observations.

• Spearman's rank correlation coefficient
R is used to measure the correlation between the orders or ranks of two variables.
$R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$
di – the difference between the ranks in each pair of data
n – the number of pairs of data

3. Time series
The additive model:
Y = T + S (+C)(+R)
• S = Y – T
Y: actual time series   T: trend series
S: seasonal component   C: cyclical component   R: random component
Notice that:
S > 0 -> existing seasonal influences had a positive impact on sales
C < 0 -> the business cycle is currently in a downswing
Find the trend:
- inspection
- regression analysis
- moving average (MA)
The multiplicative model:
Y = T*S*R
• S = Y/T
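The summation formulas for b0, b1, r and Spearman's R above translate directly into code. This is a minimal sketch; the function names and the five-point data set are hypothetical, and the Spearman function assumes the inputs are already ranks with no ties.

```python
import math


def sums(x, y):
    """Return n, Σx, Σy, Σxy, Σx², Σy² for paired data."""
    return (len(x), sum(x), sum(y),
            sum(a * b for a, b in zip(x, y)),
            sum(a * a for a in x), sum(b * b for b in y))


def least_squares(x, y):
    """b1 = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²), b0 = Σy/n - b1·Σx/n."""
    n, sx, sy, sxy, sxx, _ = sums(x, y)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx**2)
    b0 = sy / n - b1 * sx / n
    return b0, b1


def pearson_r(x, y):
    """Correlation coefficient r from the same sums."""
    n, sx, sy, sxy, sxx, syy = sums(x, y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))


def spearman_rank_corr(rank_x, rank_y):
    """R = 1 - 6Σd² / (n(n² - 1)); assumes ranks without ties."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n**2 - 1))


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
r = pearson_r(x, y)
print(f"y = {b0:.2f} + {b1:.2f}x")            # y = 2.20 + 0.60x
print(f"r = {r:.4f}, r^2 = {r * r:.2f}")       # r = 0.7746, r^2 = 0.60
print(f"R = {spearman_rank_corr([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]):.2f}")  # R = 0.80
```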
Forecasting:
From given data: Y = T + S (or Y = T*S)
- Calculate T by moving averages
- S = Y – T or S = Y/T
- Once we are done with the analysis, we calculate the predicted T
=> Y = Predicted T + S (additive)
   Y = Predicted T x S (multiplicative)
Adjustment of the average seasonal values (n = number of seasons):
- additive (S = Y – T): adjust = sum/n
- multiplicative (S = Y/T): adjust = (sum – n)/n

How to calculate in Excel:
Step 1: enter the data: Period, series Y
Step 2: - if odd => center 4-quarter: calculate the average and place it in the middle
        - if even => T by 4-quarter: calculate the average => center 4-quarter: calculate the average of the T by 4-quarter values
Step 3: - additive: S = Y – T
        - multiplicative: S = Y/T
Step 4: calculate the sum and the adjustment
Step 5: adjusted S = average – adjust
Forecasting
Step 6: calculate the increase in T = (Tmax – Tmin)/n (using the data in the center column)
Step 7: calculate the forecast for next year (center 4-quarter) = previous value + increase in T
Step 8: calculate next year's periods
        = next year's forecast x adjusted S (multiplicative, S = Y/T)
        = next year's forecast + adjusted S (additive, S = Y – T)

Final review 2:
Section 1:
a) 95% CI = 17.5 ± 1.96 x 2.5/√90 = (16.98; 18.02)
b) Because 18.1 > 18.02 => the manufacturer can claim with 95% confidence that ...
c) 17.5 + zα/2 x 2.5/√90 < 17.8
   => zα/2 < 1.138
   => P(z < 1.138) = 0.87
   => α/2 = 0.13
   => α = 0.26
   => confidence level = 0.74, i.e. at 74%...
d) 1.96 x 2.5/√n = 0.3
   => n = (1.96 x 2.5/0.3)² = 266.8 => n = 267
Section 2:
a) H0: µ1 - µ2 = D0   Ha: µ1 - µ2 ≠ D0
b) z = (79 − 72)/√… = 2.34 > 0 -> upper
   p-value = 2P(z ≥ 2.34) = 0.0193 < α = 0.05
   => reject H0
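The additive forecasting procedure from the Excel steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: quarterly data starting at quarter 1, the additive model only, and a per-quarter trend increase taken as (last centered T − first centered T) divided by the number of quarters between them, which is this sketch's reading of Step 6. The function names and the data are hypothetical. For the multiplicative model, replace Y − T with Y/T, use adjust = (sum − n)/n, and multiply instead of add.

```python
SEASONS = 4  # quarterly data, as in the Excel steps


def centered_moving_average(y):
    """'T by 4-quarter' averages, then 'center 4-quarter' = mean of consecutive pairs."""
    ma4 = [sum(y[i:i + SEASONS]) / SEASONS for i in range(len(y) - SEASONS + 1)]
    t = [None] * len(y)  # no centered value for the first and last 2 quarters
    for i in range(len(ma4) - 1):
        t[i + 2] = (ma4[i] + ma4[i + 1]) / 2
    return t


def adjusted_additive_indices(y, t):
    """Average S = Y - T per quarter, then subtract adjust = sum / n."""
    totals = [0.0] * SEASONS
    counts = [0] * SEASONS
    for i, (yi, ti) in enumerate(zip(y, t)):
        if ti is not None:
            totals[i % SEASONS] += yi - ti
            counts[i % SEASONS] += 1
    averages = [s / c for s, c in zip(totals, counts)]
    adjust = sum(averages) / SEASONS
    return [a - adjust for a in averages]


def forecast_next_year(y):
    """Predicted T (extended linearly) + adjusted S for the next 4 quarters."""
    t = centered_moving_average(y)
    s = adjusted_additive_indices(y, t)
    known = [(i, ti) for i, ti in enumerate(t) if ti is not None]
    (i_first, t_first), (i_last, t_last) = known[0], known[-1]
    increase = (t_last - t_first) / (i_last - i_first)  # trend increase per quarter
    return [t_last + increase * (i - i_last) + s[i % SEASONS]
            for i in range(len(y), len(y) + SEASONS)]


# Hypothetical series: trend 10 + i plus seasonal effects (2, -1, -3, 2)
y = [12, 10, 9, 15, 16, 14, 13, 19, 20, 18, 17, 23]
print([round(v, 2) for v in forecast_next_year(y)])  # [24.0, 22.0, 21.0, 27.0]
```

Because the hypothetical series is an exact linear trend plus seasonal effects summing to zero, the centered moving average recovers the trend exactly, which makes the forecast easy to check by hand.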
Section 3:
a) y = 21.39 + 1.002x
b) The slope b1 = 1.002 means that when the room rate increases by 1 USD, the entertainment expense increases by about 1 USD (1.002 USD).
c) r = 0.84 means room rate and entertainment expense are strongly and positively correlated.
   r² = 0.7 means about 70% of the variation in entertainment expense is explained by the room rate.
d) x = 128 => y = 21.39 + 1.002(128) ≈ 149.65 USD

Final review 1:
Section 1:
a) 95% CI = $\frac{26}{100} \pm 1.96\sqrt{\frac{0.26(1 - 0.26)}{100}}$ = (0.174; 0.346)
b) 0.174 < p < 0.346. The upper boundary needs to be less than 0.15; as 0.346 > 0.15, the goggles cannot be sold.
c) i. CI 2019 = $0.1 \pm 1.96\sqrt{\frac{0.1(1 - 0.1)}{110}}$ = 0.1 ± 0.056 = (0.044; 0.156)
   Claim is supported as 0.156 < 0.17.
   ii. 2019 goggles cannot be sold, as the upper boundary 0.156 is still bigger than 0.15.
   iii. 2019 goggles could be sold when the upper limit < 0.15:
   Upper limit = p̄ + margin of error = p̄ + width/2 < 0.15; with p̄ = 0.1 this requires width < 0.1 (0.1 + 0.1/2 = 0.15).
   iv. $n \ge \left(\frac{2 \times 1.96}{0.1}\right)^2(0.1)(0.9) = 138.3$ => n = 139

The interval contains µ with probability 1 – α (over repeated sampling), so the confidence level of the interval is (1 – α) x 100%.
The margin of error is $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. The width of the CI is $2 \times z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.
The sample mean is calculated with =AVERAGE(input range).
The margin of error is calculated with =CONFIDENCE.NORM(α, σ, n).
σ unknown: $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$
where s is the sample standard deviation, (1 – α) is the confidence coefficient, and tα/2 is the t value determined by the t distribution with n – 1 degrees of freedom.
In most applications, a sample size n ≥ 30 is adequate to use this expression.
tα/2 can be calculated with the T.INV function: =T.INV(1 – α/2, n – 1).
The margin of error is calculated with the CONFIDENCE.T function: =CONFIDENCE.T(α, s, n).
For a given confidence level, we can decrease the width of the confidence interval by increasing the sample size.
When we wish to estimate the population mean to lie, with 95% confidence, within an interval of width w, the required sample size is
$n = \left(\frac{2 \times 1.96\,\sigma}{w}\right)^2$ (always round up)
σ: population standard deviation
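The confidence-interval and sample-size formulas can be checked against the worked answers in this review (the 17.5 ± 1.96 x 2.5/√90 interval and n = 267 from Final review 2, Section 1, and the goggles answers from Final review 1, Section 1). A minimal standard-library sketch; the function names are hypothetical, and `NormalDist().inv_cdf` plays the role of NORM.S.INV.

```python
import math
from statistics import NormalDist


def z_half_alpha(confidence):
    """z_{alpha/2}, the same value as NORM.S.INV(1 - alpha/2)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci_sigma_known(x_bar, sigma, n, confidence=0.95):
    """x_bar ± z_{alpha/2}·sigma/sqrt(n); the margin matches CONFIDENCE.NORM(alpha, sigma, n)."""
    e = z_half_alpha(confidence) * sigma / math.sqrt(n)
    return x_bar - e, x_bar + e


def sample_size_mean(width, sigma, z=1.96):
    """n = (2·z·sigma / w)², always rounded up."""
    return math.ceil((2 * z * sigma / width) ** 2)


def proportion_ci(p_hat, n, z=1.96):
    """p_hat ± z·sqrt(p_hat(1 - p_hat)/n)."""
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e


def sample_size_proportion(width, p_star=0.5, z=1.96):
    """n = (2·z / w)²·p*(1 - p*), rounded up; p* = 0.5 is the worst case."""
    return math.ceil((2 * z / width) ** 2 * p_star * (1 - p_star))


lo, hi = mean_ci_sigma_known(17.5, 2.5, 90)
print(f"mean CI: ({lo:.2f}; {hi:.2f})")          # mean CI: (16.98; 18.02)
print("n for margin 0.3:", sample_size_mean(0.6, 2.5))  # n for margin 0.3: 267
lo, hi = proportion_ci(0.26, 100)
print(f"goggles a): ({lo:.3f}; {hi:.3f})")       # goggles a): (0.174; 0.346)
lo, hi = proportion_ci(0.1, 110)
print(f"goggles c) i.: ({lo:.3f}; {hi:.3f})")    # goggles c) i.: (0.044; 0.156)
print("goggles iv. n =", sample_size_proportion(0.1, 0.1))  # goggles iv. n = 139
```

A margin of error of 0.3 corresponds to a width w = 0.6, which is why `sample_size_mean` is called with 0.6.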
Confidence intervals for proportions
For a sample of size n with sample proportion p̄, the 95% CI for p is
$\bar{p} \pm 1.96\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
Test statistic about a population proportion:
$z = \frac{\bar{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$
When we estimate a proportion, it is desirable to make the width of the confidence interval as small as possible. We can make the width smaller by selecting a larger sample size n.
For a 95% confidence interval of width w and preliminary proportion p̄ = p*, the sample size needed is
$n = \left(\frac{2 \times 1.96}{w}\right)^2 p^*(1 - p^*)$
If we have no preliminary value for p, then we assume the worst-case scenario and choose the value p* which maximizes the value of p*(1 - p*). This value is p* = 0.5.

Developing Null and Alternative Hypotheses
Null hypothesis H0: the assumption to be challenged.   Alternative hypothesis Ha: the research hypothesis.
H0: µ ≥ µ0   Ha: µ < µ0   lower tail test
H0: µ ≤ µ0   Ha: µ > µ0   upper tail test
H0: µ = µ0   Ha: µ ≠ µ0   two tail test

Let x̄ be the sample mean of a sample of size n. Then $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
Test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
p-value for a test about the population mean (k = computed value of the test statistic):
Lower tail: p-value = P(z ≤ k) = NORM.DIST(k,0,1,TRUE)
Upper tail: p-value = P(z ≥ k) = 1 - NORM.DIST(k,0,1,TRUE)
Two tail:
If the test statistic z = k < 0 => p-value = 2P(z ≤ k)
If the test statistic z = k > 0 => p-value = 2P(z ≥ k)
Reject H0 if p-value ≤ α. If p-value > α, do not reject H0.
If the CI at the 1 – α confidence level is (a, b), then a < µ < b.
Two tail: if µ0 does not belong to the interval (a, b), reject H0.
Lower tail: a < µ < b < µ0, reject H0.
Upper tail: µ0 < a < µ < b, reject H0.
σ unknown: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$; the p-value uses the t distribution, e.g. P(t ≤ k) = T.DIST(k, n-1, TRUE).
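The p-value rules and the decision rule above can be sketched in a few lines of standard-library Python. The function names and the `tail` argument values are hypothetical; `NormalDist().cdf(k)` gives the same value as NORM.DIST(k,0,1,TRUE). The usage line re-checks the two-tailed p-value from Final review 2, Section 2 (z = 2.34).

```python
import math
from statistics import NormalDist


def z_statistic_mean(x_bar, mu0, sigma, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))


def z_test_p_value(k, tail):
    """p-value for an observed z statistic k; tail is 'lower', 'upper' or 'two'."""
    lower = NormalDist().cdf(k)   # P(z <= k) = NORM.DIST(k,0,1,TRUE)
    upper = 1 - lower             # P(z >= k)
    if tail == "lower":
        return lower
    if tail == "upper":
        return upper
    if tail == "two":
        # 2P(z <= k) when k < 0, 2P(z >= k) when k > 0
        return 2 * min(lower, upper)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


def decide(p_value, alpha=0.05):
    """Reject H0 if p-value <= alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


p = z_test_p_value(2.34, "two")
print(f"{p:.4f} {decide(p)}")  # 0.0193 reject H0
```

Taking `2 * min(lower, upper)` covers both two-tail cases at once, since the smaller tail is always the one beyond the observed statistic.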
