
DATA ANALYTICS FOR BUSINESS

ESTIMATION

By: Dr. Uka Wikarya

MAGISTER AKUNTANSI
FAKULTAS EKONOMI DAN BISNIS
UNIVERSITAS INDONESIA
OUTLINE OF DISCUSSION
◼ Binomial Distribution
◼ Normal Distribution
◼ Sampling Distribution for:
  ❑ Sample Mean
  ❑ Sample Proportion
◼ Point and Interval Estimation
  ❑ Confidence Interval Estimate of Mean
  ❑ Confidence Interval Estimate of Proportion


BINOMIAL PROBABILITY DISTRIBUTIONS

Bernoulli Experiments
• A random experiment with only two outcomes is a Bernoulli experiment:

Bernoulli Experiment                Possible Outcomes                                       Probability of "Success"
Toss a coin                         1 = heads, 0 = tails                                    p = .50
Answer a multiple-choice problem    1 = right, 0 = wrong                                    p = .25
Inspect a jet turbine blade         1 = crack found, 0 = no crack found                     p = .001
Purchase a tank of gas              1 = pay by credit card, 0 = do not pay by credit card   p = .78

Binomial Distribution
Properties of the Binomial Distribution
• The binomial distribution arises when a Bernoulli experiment is repeated n times.
• Each Bernoulli trial is independent, so the probability of success p remains constant on each trial.
• In a binomial experiment, we are interested in X = number of successes in n Bernoulli trials. So,

  $X = x_1 + x_2 + \dots + x_n$, where $x_i \in \{0, 1\}$ (1 = success, 0 = failure)

• The probability of a particular number of successes P(X) is determined by parameters n and p.
Binomial Distribution
Parameters   n = number of trials, p = probability of success
PDF          $P(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$
Range        X = 0, 1, 2, . . ., n
Mean         $np$
Std. Dev.    $\sqrt{np(1-p)}$
Comments     Skewed right if p < .50, skewed left if p > .50, and symmetric if p = .50.
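As a rough check of the mean and standard deviation formulas in the table, the sketch below computes both directly from the probability function and compares them with np and the square root of np(1 − p). The values n = 8 and p = .25 and all names in the code are illustrative assumptions, not part of the slides.

```python
from math import comb, sqrt

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical parameters, chosen only for illustration.
n, p = 8, 0.25

# Probability of every possible number of successes 0..n.
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and standard deviation computed directly from the distribution.
mean_direct = sum(x * px for x, px in enumerate(pmf))
sd_direct = sqrt(sum((x - mean_direct) ** 2 * px for x, px in enumerate(pmf)))

# The same quantities from the closed-form expressions in the table.
mean_formula = n * p
sd_formula = sqrt(n * p * (1 - p))

print(mean_direct, mean_formula)
print(sd_direct, sd_formula)
```

Both pairs should agree up to floating-point rounding, which is a quick way to sanity-check a hand calculation.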

Binomial Distribution
Example: The Multiple Choice Problem
A statistical methods test consists of 10 problems, each with 4 multiple-choice options. Assume every answer is a random guess.
• What is the probability that exactly 2 of the 10 answers are correct, P(X = 2)?
• What is the probability that at least 6 of the 10 answers are correct, P(X ≥ 6)?
P(an answer is correct) = p = .25
P(an answer is not correct) = 1 − p = .75

Probability formula: $P(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$

$P(X = 2) = \frac{10!}{2!\,(10-2)!}\,(.25)^{2}(1-.25)^{10-2} = .2816$

Example: The Multiple Choice Problem
$P(X \ge 6) = P(X = 6) + P(X = 7) + \dots + P(X = 10)$

$P(X = 6) = \binom{10}{6}\, 0.25^{6}\, 0.75^{4} = 0.0162$
$P(X = 7) = \binom{10}{7}\, 0.25^{7}\, 0.75^{3} = 0.0031$
$P(X = 8) = \binom{10}{8}\, 0.25^{8}\, 0.75^{2} = 0.0004$
$P(X = 9) \approx P(X = 10) \approx 0$

$P(X \ge 6) = 0.0162 + 0.0031 + 0.0004 + 0 + 0 = 0.0197$

The probability of 6 or more correct answers is 0.0197, or 1.97%.
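The two questions in the multiple-choice example can be recomputed with a few lines of Python. This is a minimal sketch: the function and variable names are illustrative assumptions, and only n = 10 and p = .25 come from the slides.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.25  # 10 problems, 4 choices each, pure guessing

# Exactly 2 correct answers.
p_exactly_2 = binomial_pmf(2, n, p)

# At least 6 correct answers: add the probabilities for x = 6, 7, ..., 10.
p_at_least_6 = sum(binomial_pmf(x, n, p) for x in range(6, n + 1))

print(round(p_exactly_2, 4))   # 0.2816
print(round(p_at_least_6, 4))  # 0.0197
```

Summing the unrounded terms avoids the small discrepancies that appear when each term is rounded to four decimals before adding.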

NORMAL DISTRIBUTION

Normal Distribution
1. ‘Bell-shaped’ & symmetrical
2. Mean, median, and mode are equal
3. Random variable has infinite range
4. The curve is asymptotic: it gets closer and closer to the X-axis but never actually touches it
5. The total area under the curve is 1.00
[Figure: bell curve f(x) against x, with the mean, median, and mode at the center]

Probability Density Function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$

◼ f(x) = density function of random variable x
◼ σ = population standard deviation
◼ π = 3.14159; e = 2.71828
◼ x = value of random variable (−∞ < x < ∞)
◼ μ = population mean
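The density formula translates directly into code. The sketch below is illustrative only: the function name and the test values μ = 50, σ = 10 are assumptions, and the result is cross-checked against `statistics.NormalDist` from the Python standard library.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve at x for mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

# Hypothetical parameters used only to exercise the formula.
mu, sigma = 50.0, 10.0

height_at_mean = normal_pdf(mu, mu, sigma)           # peak of the curve
height_library = NormalDist(mu, sigma).pdf(mu)       # same value from the stdlib

# Symmetry: points equally far from the mean have the same density.
left, right = normal_pdf(mu - 7, mu, sigma), normal_pdf(mu + 7, mu, sigma)

print(height_at_mean, height_library)
print(left, right)
```

Note that f(x) is a height, not a probability; probabilities come from areas under the curve.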

Effect of Varying Parameters (μ & σ)
[Figure: three normal curves labeled A, B, and C, plotted as f(X), illustrating different values of μ and σ]

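Finding Area below a Normal Curve
Observed variable, X:
$P(a \le x \le b) = \int_{a}^{b} f(x)\,dx$
[Figure: normal curve f(X) centered at μ, with the area between a and b shaded]

Standardized normal variable, Z:
$Z = \frac{X - \mu}{\sigma}$
$P(a \le x \le b) = P(z_a \le Z \le z_b)$
[Figure: standard normal curve with mean 0 and σ_z = 1, with the area between z_a and z_b shaded]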

The Standard Normal Table:
P(0 < z < 1.96)
Standardized Normal Probability Table (Portion), P(0 < Z < Zo)

Zo     .04      .05      .06
1.8    .4671    .4678    .4686
1.9    .4738    .4744    .4750
2.0    .4793    .4798    .4803
2.1    .4838    .4842    .4846

P(0 < z < 1.96) = .4750
[Figure: standard normal curve (μ = 0, σ = 1) with the area between 0 and 1.96 shaded; shaded area exaggerated]

The Standard Normal Table:
P(–1.26 ≤ z ≤ 1.26)
Standardized Normal Distribution (μ = 0, σ = 1)

P(–1.26 ≤ z ≤ 1.26) = .3962 + .3962 = .7924
[Figure: standard normal curve with the area between –1.26 and 1.26 shaded, .3962 on each side of 0; shaded area exaggerated]

The Standard Normal Table:
P(z > 1.26)
Standardized Normal Distribution (μ = 0, σ = 1)

P(z > 1.26) = .5000 – .3962 = .1038
[Figure: standard normal curve with the upper tail beyond z = 1.26 shaded]
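The three table look-ups above can be reproduced without a printed table. The following sketch uses `statistics.NormalDist` from the Python standard library; the helper name `area_0_to_z` is an assumption introduced here to mimic the table's P(0 < Z < Zo) layout.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_0_to_z(z: float) -> float:
    """Area between 0 and z, the quantity tabulated as P(0 < Z < Zo)."""
    return Z.cdf(z) - 0.5

# P(0 < z < 1.96): about .4750
p_0_to_196 = area_0_to_z(1.96)

# P(-1.26 <= z <= 1.26): twice the area from 0 to 1.26, by symmetry
p_between = 2 * area_0_to_z(1.26)

# P(z > 1.26): half the total area minus the area from 0 to 1.26
p_upper_tail = 0.5 - area_0_to_z(1.26)

print(p_0_to_196, p_between, p_upper_tail)
```

Results can differ from the table in the fourth decimal place because the table entries are rounded before they are added or subtracted.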
Sampling Distribution for Sample Mean (X̄)

Sampling from Normal Populations
◼ Central tendency: $\mu_{\bar{x}} = \mu$
◼ Dispersion (sampling with replacement): $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Population: normal distribution with μ = 50 and σ = 10
Sampling distribution of X̄ (centered at μ_x̄ = 50):
  n = 4:  σ_x̄ = 5
  n = 16: σ_x̄ = 2.5

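Standardizing the Sampling Distribution of X̄

$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

[Figure: the sampling distribution of X̄ (mean μ_x̄, standard deviation σ_x̄) mapped onto the standardized normal distribution (mean 0, σ_z = 1)]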
Thinking Challenge
You’re an operations analyst for AT&T.
Long-distance telephone calls are normally
distributed with μ = 8 min. and σ = 2 min. If
you select random samples of 25 calls, what
percentage of the sample means would be
between 7.8 & 8.2 minutes?

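Sampling Distribution Solution*
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{7.8 - 8}{2/\sqrt{25}} = -.50$
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{8.2 - 8}{2/\sqrt{25}} = .50$

Sampling distribution: σ_x̄ = .4, centered at μ = 8, with limits 7.8 and 8.2
Standardized normal distribution: σ_z = 1, with limits –.50 and .50

P(7.8 ≤ X̄ ≤ 8.2) = P(–.50 ≤ Z ≤ .50) = .1915 + .1915 = .3830, so about 38.3% of the sample means fall between 7.8 and 8.2 minutes.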
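The telephone-call calculation can be checked numerically. This is a sketch under the slide's stated values (μ = 8, σ = 2, n = 25); the variable names are illustrative assumptions, and `statistics.NormalDist` stands in for the printed table.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.0, 2.0, 25

# Standard error of the sample mean: sigma / sqrt(n) = 0.4
standard_error = sigma / sqrt(n)

# Z scores of the two limits.
z_low = (7.8 - mu) / standard_error
z_high = (8.2 - mu) / standard_error

# Share of sample means between the limits, from the sampling distribution.
sampling_dist = NormalDist(mu, standard_error)
share_between = sampling_dist.cdf(8.2) - sampling_dist.cdf(7.8)

print(z_low, z_high)     # roughly -0.5 and 0.5
print(share_between)     # roughly 0.383
```

Working with `NormalDist(mu, standard_error)` directly is equivalent to standardizing first and then using the standard normal curve.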
Sampling from
Non-Normal Populations

◼ Central tendency: $\mu_{\bar{x}} = \mu$
◼ Dispersion (sampling with replacement): $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Population distribution: non-normal, with μ = 50 and σ = 10
Sampling distribution of X̄ (centered at μ_x̄ = 50):
  n = 4:  σ_x̄ = 5
  n = 30: σ_x̄ = 1.8
Central Limit Theorem

As the sample size gets large enough (n ≥ 30), the sampling distribution of X̄ becomes almost normal, with

$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
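The theorem can be illustrated with a small simulation. The sketch below draws repeated samples of size 30 from a strongly skewed population and looks at the resulting sample means. The exponential population (mean 1, standard deviation 1), the seed, and the number of repetitions are all assumptions chosen for illustration; they do not come from the slides.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable

n = 30         # sample size
reps = 5000    # number of repeated samples

# Skewed, non-normal population: exponential with mean 1 and std. dev. 1.
pop_mean, pop_sd = 1.0, 1.0

# One sample mean per repetition.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

mean_of_means = mean(sample_means)    # should be close to pop_mean
sd_of_means = stdev(sample_means)     # should be close to pop_sd / sqrt(n)
expected_se = pop_sd / sqrt(n)

# If the sample means are roughly normal, about 95% of them should fall
# within 1.96 standard errors of the population mean.
share_within = sum(
    abs(m - pop_mean) <= 1.96 * expected_se for m in sample_means
) / reps

print(mean_of_means, sd_of_means, expected_se, share_within)
```

Rerunning with a smaller n (for example 4) typically shows a visibly skewed set of sample means, which is why the n ≥ 30 guideline matters for non-normal populations.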
Central Limit Theorem Example

The amount of soda in cans of a particular brand has a mean of 12 oz and a standard deviation of .2 oz. If you select random samples of 50 cans, what percentage of the sample means would be less than 11.95 oz?

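Central Limit Theorem Solution*
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{11.95 - 12}{.2/\sqrt{50}} = -1.77$

Sampling distribution: σ_x̄ = .03, centered at μ = 12
Standardized normal distribution: σ_z = 1

P(X̄ < 11.95) = P(Z < –1.77) = .5000 – .4616 = .0384, so about 3.8% of the sample means would be less than 11.95 oz.
[Figure: lower tail below 11.95 (Z = –1.77) shaded; shaded area exaggerated]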
Sampling Distribution for Proportion

Theoretical Distribution of Sample Proportion

• A proportion is the mean of data whose only values are 0 or 1.
• The Central Limit Theorem (CLT) states that the distribution of a sample proportion p = x/n approaches a normal distribution with mean π (the population proportion) and standard deviation $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$.
• The sample proportion p = x/n is a consistent estimator of π.

McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.


Sampling Distribution for a Proportion (p)

• The distribution of a sample proportion p = x/n is symmetric if the population proportion π = .50 and, regardless of π, approaches symmetry as n increases.


Sampling Distribution for a Proportion (p)
• As n increases, the statistic p = x/n more closely resembles a continuous random variable.
• As n increases, the distribution becomes more symmetric and bell shaped.
• As n increases, the range of the sample proportion p = x/n narrows.
• The sampling variation can be reduced by increasing the sample size n.


Sampling Distribution for a Proportion (p)
• Rule of Thumb: The sample proportion p = x/n may be assumed to be normal if both nπ ≥ 10 and n(1 − π) ≥ 10, where π is the population proportion.
• Jokowi won the 2014 presidential election with about 53% of the vote. Assume the proportion is unchanged for the 2019 presidential election. A political consultant plans to draw a sample of 1,200 voters. Find the probability that the sample proportion voting for Jokowi is in the range 51–55%.
• Will p be normally distributed?
• n = 1200, π = 0.53 → nπ = 636 and n(1 − π) = 564
• Both are ≥ 10, so we may conclude that p will be approximately normally distributed.

Sampling Distribution for a Proportion (p)
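Observed variable, p:
$\sigma_p = \sqrt{\frac{0.53(1 - .53)}{1200}} = 0.0144$
$P(.51 \le p \le .55) = \int_{.51}^{.55} f(p)\,dp$
[Figure: sampling distribution f(p) centered at .53, with the area between .51 and .55 shaded]

Standardizing the variable p into Z:
$Z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$, where π = .53 is the population proportion
$P(.51 \le p \le .55) = P(-1.389 \le Z \le 1.389)$
$P(.51 \le p \le .55) = .418 + .418 = 0.836$
[Figure: standard normal curve (σ_z = 1) with area .418 on each side of 0, between –1.389 and 1.389]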
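The election example can be reproduced in a few lines. This is a sketch only: the variable names are assumptions, the population proportion .53 and sample size 1200 come from the slides, and the normal approximation is applied through `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

pop_proportion = 0.53   # assumed share of the vote in the population
n = 1200                # planned sample size

# Rule-of-thumb check for the normal approximation.
normal_ok = n * pop_proportion >= 10 and n * (1 - pop_proportion) >= 10

# Standard error of the sample proportion.
standard_error = sqrt(pop_proportion * (1 - pop_proportion) / n)

# Probability that the sample proportion falls between 51% and 55%.
sampling_dist = NormalDist(pop_proportion, standard_error)
prob_51_to_55 = sampling_dist.cdf(0.55) - sampling_dist.cdf(0.51)

print(normal_ok)
print(round(standard_error, 4))   # 0.0144
print(prob_51_to_55)              # close to the slide's 0.836
```

The result differs slightly from the slide value because the slide rounds the standard error and the table area before combining them.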


RATIONALE FOR ESTIMATION

Estimation Process

Population (mean μ is unknown) → Random sample (mean X̄ = 50) → "I am 95% confident that μ is between 40 & 60."

Unknown Population Parameters Are Estimated

Estimate Population Parameter...     with Sample Statistic
Mean          μ                      x̄
Proportion    p                      p̂
Variance      σ²                     s²
Differences   μ1 − μ2                x̄1 − x̄2
Point and Interval Estimates

◼ A point estimate is a single value (point) derived from a sample and used to estimate a population value.
◼ A confidence interval estimate is a range of values constructed from sample data so that the population parameter is likely to occur within that range at a specified probability. The specified probability is called the level of confidence.

Interval Estimation

Key Elements of Interval Estimation
[Figure: the sample statistic (point estimate) at the center of the interval, with one margin of error on each side extending to the lower and upper confidence limits]
Length of confidence interval = 2 × margin of error

• The probability that a confidence interval captures the population parameter is called the confidence level, denoted (1 – α)100%. The α is the probability that the interval does not capture the parameter.
• Typical values of (1 – α)100% are 99%, 95%, and 90%.

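Intervals at Different Confidence Levels

$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$
$P(\mu - z_{\alpha/2}\,\sigma_{\bar{x}} \le \bar{x} \le \mu + z_{\alpha/2}\,\sigma_{\bar{x}}) = 1 - \alpha$, with limits $\bar{X} = \mu \pm Z\sigma_{\bar{x}}$

Share of samples    Z limits    Limits for X̄
90%                 ±1.65       μ ± 1.65σ_x̄
95%                 ±1.96       μ ± 1.96σ_x̄
99%                 ±2.58       μ ± 2.58σ_x̄

[Figure: nested bands on the Z axis and on the X̄ axis containing 90%, 95%, and 99% of the sample means]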
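Constructing Confidence Interval for μ

From the probability statement for any sample mean, as stated in the previous slide:

$P(\mu - z_{\alpha/2}\,\sigma_{\bar{x}} \le \bar{x} \le \mu + z_{\alpha/2}\,\sigma_{\bar{x}}) = 1 - \alpha$

we derive the probability statement for μ:

$P(-z_{\alpha/2}\,\sigma_{\bar{x}} \le \bar{x} - \mu \le z_{\alpha/2}\,\sigma_{\bar{x}}) = 1 - \alpha$   (subtract μ)
$P(-\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}} \le -\mu \le -\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}) = 1 - \alpha$   (subtract x̄)
$P(\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}} \ge \mu \ge \bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}) = 1 - \alpha$   (multiply by −1, which reverses the inequalities)

The probability of finding μ in the interval is (1 – α), or

$P(\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}) = 1 - \alpha$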


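Intervals & Confidence Level

Sampling distribution of the sample mean: centered at μ_x̄ = μ with standard deviation σ_x̄; the central area is 1 – α, with α/2 in each tail.

For any value of the sample mean x̄, the interval runs from $\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}$ to $\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}$.

Over a very large number of intervals, (1 – α)100% of the intervals contain μ and α100% do not.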
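The long-run interpretation can be made concrete with a simulation: build many intervals from repeated samples and count how often they contain μ. The sketch below assumes a normal population with μ = 50 and σ = 10, samples of n = 25, a 95% confidence level, and a fixed seed; all of these values and names are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable

mu, sigma = 50.0, 10.0   # hypothetical population parameters
n, reps = 25, 4000       # sample size and number of intervals
confidence = 0.95

# Critical value z_{alpha/2} and standard error of the mean.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
standard_error = sigma / sqrt(n)

hits = 0
for _ in range(reps):
    # Draw one sample and compute its mean.
    x_bar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    # Does the interval built around this sample mean contain mu?
    if x_bar - z * standard_error <= mu <= x_bar + z * standard_error:
        hits += 1

coverage = hits / reps
print(coverage)  # should be near 0.95
```

Any single interval either contains μ or it does not; the 95% describes the procedure over many samples, not one particular interval.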


Factors Affecting Interval Width

Intervals extend from $\bar{X} - Z\sigma_{\bar{x}}$ to $\bar{X} + Z\sigma_{\bar{x}}$.

1. Data dispersion
   • Measured by σ
2. Sample size
   • $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
3. Level of confidence (1 – α)
   • Affects Z
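To see how each of the three factors moves the interval width, the sketch below computes the margin of error Z·σ/√n for a baseline case and then varies one factor at a time. The baseline values (σ = 10, n = 25, 95% confidence) and the function name are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma: float, n: int, confidence: float) -> float:
    """Half-width of the interval: z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)

baseline = margin_of_error(sigma=10, n=25, confidence=0.95)

# 1. More dispersion -> wider interval (doubling sigma doubles the margin).
more_dispersion = margin_of_error(sigma=20, n=25, confidence=0.95)

# 2. Larger sample -> narrower interval (quadrupling n halves the margin).
larger_sample = margin_of_error(sigma=10, n=100, confidence=0.95)

# 3. Higher confidence -> wider interval (larger Z).
higher_confidence = margin_of_error(sigma=10, n=25, confidence=0.99)

print(baseline, more_dispersion, larger_sample, higher_confidence)
```

Because n enters through a square root, halving the width requires four times as many observations.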


Confidence Interval Estimates

Confidence Intervals
  Mean
    σ Known
    σ Unknown
  Proportion

Confidence Interval Estimate of
Population Mean

Confidence Interval Mean (σ Known)
1. Assumptions
• Population standard deviation is known
• Population is normally distributed
• If not normal, the sampling distribution can be approximated by a normal distribution (n ≥ 30)

2. Confidence interval estimate

$\bar{X} - Z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$

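Estimation Example Mean (σ Known)

The mean of a random sample of n = 25 is X̄ = 50. Set up a 95% confidence interval estimate for μ if σ = 10.

$\bar{X} - Z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$
$50 - 1.96\cdot\frac{10}{\sqrt{25}} \le \mu \le 50 + 1.96\cdot\frac{10}{\sqrt{25}}$
$46.08 \le \mu \le 53.92$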

Thinking Challenge
You’re a Q/C inspector for Syrup BCA. The σ for 2-liter
bottles is .05 liters. A random sample of 100 bottles showed
X̄ = 1.99 liters. What is the 90% confidence interval
estimate of the true mean amount in 2-liter bottles?

$\bar{X} - Z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$
$1.99 - 1.645\cdot\frac{.05}{\sqrt{100}} \le \mu \le 1.99 + 1.645\cdot\frac{.05}{\sqrt{100}}$
$1.982 \le \mu \le 1.998$
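Both σ-known examples follow the same recipe, so they can share one small helper. The function below is a sketch, not a library routine: its name and signature are assumptions, and it obtains the critical value from `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar: float, sigma: float, n: int, confidence: float):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Example: n = 25, sample mean 50, sigma = 10, 95% confidence.
low_1, high_1 = z_interval(x_bar=50, sigma=10, n=25, confidence=0.95)

# Syrup bottles: n = 100, sample mean 1.99, sigma = .05, 90% confidence.
low_2, high_2 = z_interval(x_bar=1.99, sigma=0.05, n=100, confidence=0.90)

print(round(low_1, 2), round(high_1, 2))
print(round(low_2, 3), round(high_2, 3))
```

For the syrup bottles, the whole interval lies below 2 liters, which is the kind of finding a Q/C inspector would want to follow up.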

Confidence Interval Mean (σ Unknown)
1. Assumptions
• Population standard deviation is unknown
• Population must be normally distributed
2. Use Student’s t–distribution

Student’s t–distribution
• Bell-shaped and symmetric, with ‘fatter’ tails than the standard normal
[Figure: the standard normal curve, labeled t (df > 120), compared with t (df = 13) and t (df = 5), all centered at 0]

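Student’s t Table

v    t.10     t.05     t.025
1    3.078    6.314    12.706
2    1.886    2.920    4.303
3    1.638    2.353    3.182

Assume: n = 3, so df = n – 1 = 2; α = .10, so α/2 = .05.
The t value is 2.920.
[Figure: t curve centered at 0 with an area of α/2 in each tail; the upper tail begins at t = 2.920]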

Confidence Interval Mean (σ Unknown)

$\bar{X} - t_{\alpha/2}\cdot\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2}\cdot\frac{S}{\sqrt{n}}$

df = n − 1

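Estimation Example Mean (σ Unknown)

A random sample of n = 25 has X̄ = 50 and s = 8. Set up a 95% confidence interval estimate for μ.

$\bar{X} - t_{\alpha/2}\cdot\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2}\cdot\frac{S}{\sqrt{n}}$
$50 - 2.064\cdot\frac{8}{\sqrt{25}} \le \mu \le 50 + 2.064\cdot\frac{8}{\sqrt{25}}$
$46.70 \le \mu \le 53.30$

Here t.025 = 2.064 with df = 25 − 1 = 24.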

Thinking Challenge
You’re a time study analyst in manufacturing. You’ve
recorded the following task times (min.):
3.6, 4.2, 4.0, 3.5, 3.8, 3.1. What is the 90% confidence
interval estimate of the population mean task time?

From sample computation:

◼ X̄ = 3.7
◼ s = .38987
◼ n = 6, df = n – 1 = 6 – 1 = 5
◼ t.05 = 2.015

$3.7 - 2.015\cdot\frac{.38987}{\sqrt{6}} \le \mu \le 3.7 + 2.015\cdot\frac{.38987}{\sqrt{6}}$
$3.379 \le \mu \le 4.021$
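The task-time interval can be recomputed from the raw data, which is a useful guard against slips in the sample standard deviation. In this sketch the critical value 2.015 (t.05 with df = 5) is typed in from a t table, since the Python standard library has no t distribution; the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Recorded task times in minutes, from the slide.
times = [3.6, 4.2, 4.0, 3.5, 3.8, 3.1]

n = len(times)
df = n - 1                 # degrees of freedom
x_bar = mean(times)        # sample mean
s = stdev(times)           # sample standard deviation (n - 1 in the denominator)

# Critical value for 90% confidence with df = 5, read from a t table.
t_crit = 2.015

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(round(x_bar, 2), round(s, 5))
print(round(lower, 3), round(upper, 3))
```

A sample standard deviation larger than the whole range of the data (here 4.2 − 3.1 = 1.1) would be a sign of a calculation error.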

Confidence Interval Estimate of
Proportion

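Confidence Interval Proportion
1. Assumptions
• Random sample selected
• p = success proportion in a sample, p = Σxi/n, where xi = {0, 1}
• Normal approximation can be used if np ≥ 15 and n(1 − p) ≥ 15

2. Confidence interval estimate

$p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le \pi \le p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$

where π is the population proportion. (The worked examples write the sample proportion as p̂, with q̂ = 1 − p̂, and the population proportion as p.)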
Estimation Example Proportion

A random sample of 400 undergraduates showed 32 went to graduate school. Set up a 95% confidence interval estimate for p.

$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
$.08 - 1.96\sqrt{\frac{.08 \cdot .92}{400}} \le p \le .08 + 1.96\sqrt{\frac{.08 \cdot .92}{400}}$
$.053 \le p \le .107$

Thinking Challenge
You’re a production manager for a newspaper.
You want to find the % defective. Of 200
newspapers, 35 had defects. What is the 90%
confidence interval estimate of the population
proportion defective?

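Confidence Interval Solution*

$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
$.175 - 1.645\sqrt{\frac{.175 \cdot (.825)}{200}} \le p \le .175 + 1.645\sqrt{\frac{.175 \cdot (.825)}{200}}$
$.1308 \le p \le .2192$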
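The two proportion examples (32 of 400 undergraduates, 35 of 200 newspapers) can be verified with one helper. This is a sketch under the normal approximation; the function names and the check threshold of 15 follow the assumptions slide, and the critical value comes from `statistics.NormalDist` in place of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def normal_approx_ok(successes: int, n: int, threshold: int = 15) -> bool:
    """Check that both the successes and the failures reach the threshold."""
    return successes >= threshold and (n - successes) >= threshold

def proportion_interval(successes: int, n: int, confidence: float):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Undergraduates: 32 of 400 went to graduate school, 95% confidence.
ok_grad = normal_approx_ok(32, 400)
low_grad, high_grad = proportion_interval(32, 400, 0.95)

# Newspapers: 35 of 200 had defects, 90% confidence.
ok_news = normal_approx_ok(35, 200)
low_news, high_news = proportion_interval(35, 200, 0.90)

print(ok_grad, round(low_grad, 3), round(high_grad, 3))
print(ok_news, round(low_news, 4), round(high_news, 4))
```

Checking the sample-size condition first matters because the interval formula relies on the normal approximation to the sampling distribution of the sample proportion.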

Thank you
