
BUSINESS STATISTICS

BY: LAI VAN TAI


Decision making
 Yes or no / your decision?
 Set up a new business or put the money in the bank?
 Should we increase our employees' salaries?
 Should we increase the budget for marketing activities?
 Should we invest in a new machine for the assembly line?
 What should we do to increase the sales rate? Why?

Decision making

 How to convince people that your decision is the best one?

Decision making
 What is the best choice?

                    Market conditions
                    High demand   Average demand   Low demand
Small factory       200           100              -20
Average factory     350           120              -150
Large factory       600           150              -300
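One hedged way to approach the question is to apply a few standard decision rules to the payoff table. The sketch below is an illustration only: the slides do not say which rule to use, so the three rules (maximax, maximin, equally likely) and all function names are assumptions, and the payoff units are not stated in the source.

```python
# Payoffs copied from the table above (units are not stated in the source).
PAYOFFS = {
    "Small factory": {"High": 200, "Average": 100, "Low": -20},
    "Average factory": {"High": 350, "Average": 120, "Low": -150},
    "Large factory": {"High": 600, "Average": 150, "Low": -300},
}


def maximax(payoffs):
    """Optimistic rule: choose the option whose best payoff is largest."""
    return max(payoffs, key=lambda option: max(payoffs[option].values()))


def maximin(payoffs):
    """Pessimistic rule: choose the option whose worst payoff is largest."""
    return max(payoffs, key=lambda option: min(payoffs[option].values()))


def equally_likely(payoffs):
    """Laplace rule: choose the option with the highest average payoff."""
    return max(
        payoffs,
        key=lambda option: sum(payoffs[option].values()) / len(payoffs[option]),
    )


if __name__ == "__main__":
    print("Maximax:", maximax(PAYOFFS))
    print("Maximin:", maximin(PAYOFFS))
    print("Equally likely:", equally_likely(PAYOFFS))
```

With these payoffs the optimistic and equally likely rules pick the large factory, while the pessimistic rule picks the small factory, so the "best" choice depends on the decision maker's attitude to risk.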

Data Types

Data
 Qualitative (Categorical): defined categories
   Examples: Marital Status, Political Party, Eye Color
 Quantitative (Numerical)
   Discrete: counted items
     Examples: Number of Children, Defects per hour
   Continuous: measured characteristics
     Examples: Weight, Voltage
Data, Data Sets, Elements,
Variables, and Observations

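Company        Stock Exchange   Annual Sales ($M)   Earnings per Share ($)
Dataram        AMEX             73.10               0.86
EnergySouth    OTC              74.00               1.67
Keystone       NYSE             365.70              0.86
LandCare       NYSE             111.40              0.33
Psychemedics   AMEX             17.60               0.13

Elements: the five companies (the rows)
Variables: Stock Exchange, Annual Sales, Earnings per Share (the columns)
Datum: a single value in the table
Data Set: the whole table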

Data Types

Sales (in $1000’s)

            2003   2004   2005   2006
Atlanta     435    460    475    490
Boston      320    345    375    395
Cleveland   405    390    410    395
Denver      260    270    285    280

Time Series Data: one city's sales across the years (a row)
Cross Sectional Data: all cities' sales in a single year (a column)

DATA TYPE AND LEVELS

Data timing:   Time series, Cross sectional
Data type:     Qualitative, Quantitative
Data levels:   Nominal, Ordinal, Interval, Ratio

Populations and Samples

 A Population is the set of all items or individuals of interest
 Examples: All likely voters in the next election
All parts produced today
All sales receipts for November

 A Sample is a subset of the population
 Examples: 1000 voters selected at random for interview
A few parts selected for destructive testing
Every 100th receipt selected for audit

Population vs. Sample

[Figure: a population made up of the items a through z, and a sample drawn from it containing b, c, g, i, n, o, r, u, y]
Inferential Procedures
 Making statements about a population by examining sample results

Sample statistics (known) -> Inference -> Population parameters (unknown, but can be estimated from sample evidence)

Goal: Convert data into meaningful information!

Descriptive Procedures
 Collect data
 e.g., Survey, Observation, Experiments

 Present data
 e.g., Charts and graphs

 Characterize data
 e.g., Sample mean: $\bar{x} = \frac{\sum x_i}{n}$

Example: Hudson Auto Repair

The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed below.
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73

Example: Hudson Auto Repair
 Tabular Summary (Frequencies and Percent Frequencies)
Parts Cost ($)   Frequency   Percent Frequency
50-59            2           4
60-69            13          26
70-79            16          32
80-89            7           14
90-99            7           14
100-109          5           10
Total            50          100
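The tabular summary can be reproduced from the 50 invoice values. The sketch below is a minimal illustration, assuming classes of width 10 that start at multiples of 10; the function name `frequency_table` is hypothetical.

```python
from collections import Counter

# The 50 parts costs (dollars) from the Hudson Auto invoices.
COSTS = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def frequency_table(values, width=10):
    """Return (class label, frequency, percent frequency) rows.

    Each class starts at a multiple of `width`; empty classes are omitted.
    """
    counts = Counter((v // width) * width for v in values)
    total = len(values)
    return [
        (f"{low}-{low + width - 1}", counts[low], 100 * counts[low] / total)
        for low in sorted(counts)
    ]


if __name__ == "__main__":
    for label, freq, pct in frequency_table(COSTS):
        print(f"{label:>8} {freq:>4} {pct:>6.0f}")
```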

Example: Hudson Auto Repair
 Graphical Summary (Histogram)
[Figure: histogram of Frequency (axis from 2 to 18) against Parts Cost ($) (axis from 50 to 110)]

Shape of a Distribution
 Describes how data is distributed
 Symmetric or skewed
 The greater the difference between the mean and the median, the more skewed the distribution

Left-Skewed: Mean < Median (longer tail extends to left)
Symmetric: Mean = Median
Right-Skewed: Median < Mean (longer tail extends to right)
Copyright ©2011 Pearson
Education, Inc. publishing as
Empirical Rule

For data having a bell-shaped distribution:

 Approximately 68% of the data values will be within one standard deviation of the mean.
Example: Hudson Auto Repair
 Cumulative Distributions
Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59       2                      .04                             4
≤ 69       15                     .30                             30
≤ 79       31                     .62                             62
≤ 89       38                     .76                             76
≤ 99       45                     .90                             90
≤ 109      50                     1.00                            100
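The cumulative columns are running totals of the class frequencies. A minimal sketch, assuming the six class frequencies from the Hudson Auto tabular summary; the function name `cumulative_table` is hypothetical.

```python
from itertools import accumulate

# Class upper limits and frequencies from the Hudson Auto frequency table.
UPPER_LIMITS = [59, 69, 79, 89, 99, 109]
FREQUENCIES = [2, 13, 16, 7, 7, 5]


def cumulative_table(limits, freqs):
    """Return (upper limit, cumulative freq, relative, percent) rows."""
    total = sum(freqs)
    return [
        (limit, running, running / total, 100 * running / total)
        for limit, running in zip(limits, accumulate(freqs))
    ]


if __name__ == "__main__":
    for limit, cum, rel, pct in cumulative_table(UPPER_LIMITS, FREQUENCIES):
        print(f"<= {limit:>3} {cum:>3} {rel:>5.2f} {pct:>4.0f}")
```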

Example: Hudson Auto Repair
 Ogive with Cumulative Percent Frequencies
[Figure: ogive of Cumulative Percent Frequency (axis from 20 to 100) against Parts Cost ($) (axis from 50 to 110)]
Descriptive Statistics: Numerical Methods

 Measures of Location
 Measures of Variability
 Measures of Relative Location and Detecting Outliers

Measures of Location

 Mean
 Median
 Mode
 Percentiles
 Quartiles

Example: Apartment Rents

 Mean
$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Example: Apartment Rents
 Median
Median = 50th percentile
i = (p/100)n = (50/100)70 = 35
Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
Example: Apartment Rents
 Mode
450 occurred most frequently (7 times)
Mode = 450
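The mean, median and mode of the 70 rents can be checked with a few lines of code. This is a minimal sketch: the rents are stored as (rent, count) pairs to keep the listing short, and the helper names are hypothetical.

```python
from collections import Counter

# (rent, count) pairs summarising the 70 apartment rents listed earlier.
RENT_COUNTS = [
    (425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
    (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
    (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
    (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2),
]
RENTS = [rent for rent, count in RENT_COUNTS for _ in range(count)]


def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value, or the average of the two middle values."""
    data = sorted(values)
    mid = len(data) // 2
    if len(data) % 2:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def mode(values):
    """Most frequent value and how often it occurs."""
    return Counter(values).most_common(1)[0]


if __name__ == "__main__":
    print("n =", len(RENTS), "sum =", sum(RENTS))
    print("mean =", mean(RENTS))
    print("median =", median(RENTS))
    print("mode =", mode(RENTS))
```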
Example: Apartment Rents
 90th Percentile
i = (p/100)n = (90/100)70 = 63
Averaging the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585
Example: Apartment Rents

 Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, rounded up to 53
Third quartile = 53rd data value = 525
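The percentile calculations in these examples follow one position rule: compute i = (p/100)n; if i is an integer, average the values in positions i and i + 1; otherwise round i up and take that position. The sketch below implements that rule for whole-number percentiles; the function name `percentile` is hypothetical, and other software may use different interpolation rules and give slightly different answers.

```python
# (rent, count) pairs summarising the 70 apartment rents listed earlier.
RENT_COUNTS = [
    (425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
    (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
    (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
    (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2),
]
RENTS = sorted(rent for rent, count in RENT_COUNTS for _ in range(count))


def percentile(sorted_values, p):
    """p-th percentile (p an integer from 1 to 99) by the position rule."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    n = len(sorted_values)
    # i = (p/100)n, kept in integer arithmetic to avoid rounding surprises.
    i, remainder = divmod(p * n, 100)
    if remainder:
        # i is not an integer: round up to position i + 1 (0-based index i).
        return sorted_values[i]
    # i is an integer: average the values in positions i and i + 1.
    return (sorted_values[i - 1] + sorted_values[i]) / 2


if __name__ == "__main__":
    for p in (50, 75, 90):
        print(f"{p}th percentile =", percentile(RENTS, p))
```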
Measures of Variability

 Range
 Interquartile Range
 Variance
 Standard Deviation
 Coefficient of Variation
Variance
 The variance is the average of the squared differences between each data value and the mean.
 If the data set is a sample, the variance is denoted by $s^2$.

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

 If the data set is a population, the variance is denoted by $\sigma^2$.

$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Example: Apartment Rents
 Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$
 Standard Deviation
$s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$
 Coefficient of Variation
$\frac{s}{\bar{x}} \times 100 = \frac{54.74}{490.80} \times 100 = 11.15$
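The sample variance, standard deviation and coefficient of variation can be recomputed directly from the 70 rents. A minimal sketch with hypothetical helper names; on the rents as listed, the variance works out to about 2,996.16.

```python
from math import sqrt

# (rent, count) pairs summarising the 70 apartment rents listed earlier.
RENT_COUNTS = [
    (425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
    (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
    (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
    (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2),
]
RENTS = [rent for rent, count in RENT_COUNTS for _ in range(count)]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


def sample_std_dev(values):
    """Square root of the sample variance."""
    return sqrt(sample_variance(values))


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return sample_std_dev(values) / (sum(values) / len(values)) * 100


if __name__ == "__main__":
    print("variance =", round(sample_variance(RENTS), 2))
    print("std dev  =", round(sample_std_dev(RENTS), 2))
    print("CV       =", round(coefficient_of_variation(RENTS), 2))
```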
Variation

 Measures of variation give information on the spread or variability of the data values.

 Smaller value
 Less variation
 Larger value
 More variation

[Figure: distributions with the same center but different variation]

Constructing the Box and Whisker Plot

[Figure: box and whisker plot marking, from left to right: outliers (*), lower limit, 1st quartile, median, 3rd quartile, upper limit, outliers (*)]

The lower limit is Q1 – 1.5 (Q3 – Q1)
The upper limit is Q3 + 1.5 (Q3 – Q1)

 The center box extends from Q1 to Q3
 The line within the box is the median
 The whiskers extend to the smallest and largest values within the calculated limits
 Outliers are plotted outside the calculated limits
Example: Apartment Rents
 Box Plot
Q1 = 445 (18th data value), Q3 = 525 (53rd data value), IQR = Q3 - Q1 = 80
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers.

[Figure: box plot of the rents on an axis from 375 to 625]
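The box plot limits can be checked in code. This sketch assumes the quartiles are located with the same position rule used for the third quartile in the apartment rents example, i = (p/100)n rounded up when it is not an integer, which places Q1 at the 18th value (445) and Q3 at the 53rd value (525). The helper names are hypothetical.

```python
# (rent, count) pairs summarising the 70 apartment rents listed earlier.
RENT_COUNTS = [
    (425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
    (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
    (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
    (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2),
]
RENTS = sorted(rent for rent, count in RENT_COUNTS for _ in range(count))


def percentile(sorted_values, p):
    """p-th percentile (integer p, 1 to 99) by the position rule."""
    i, remainder = divmod(p * len(sorted_values), 100)
    if remainder:
        return sorted_values[i]  # round up to position i + 1
    return (sorted_values[i - 1] + sorted_values[i]) / 2


def box_plot_limits(sorted_values):
    """Return (Q1, Q3, lower limit, upper limit, list of outliers)."""
    q1 = percentile(sorted_values, 25)
    q3 = percentile(sorted_values, 75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in sorted_values if v < lower or v > upper]
    return q1, q3, lower, upper, outliers


if __name__ == "__main__":
    print(box_plot_limits(RENTS))
```

Since the smallest rent is 425 and the largest is 615, every value falls inside the limits and the whiskers end at those two values.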
Introduction to Probability
 Experiments, Counting Rules, and Assigning Probabilities
 Events and Their Probability
 Some Basic Relationships of Probability
 Conditional Probability
Assigning Probabilities
 Classical Method
Assigning probabilities based on the assumption of
equally likely outcomes.
 Relative Frequency Method
Assigning probabilities based on experimentation or
historical data.
 Subjective Method
Assigning probabilities based on the assignor’s
judgment.
Classical Method

If an experiment has n possible outcomes, this method would assign a probability of 1/n to each outcome.

 Example
Experiment: Rolling a die
Sample Space: S = {1, 2, 3, 4, 5, 6}
Probabilities: Each sample point has a 1/6 chance of occurring.
Relative Frequency Method

The probability assignments are given by dividing the number-of-days frequencies by the total frequency (total number of days).

Number of Polishers Rented   Number of Days   Probability
0                            4                .10 = 4/40
1                            6                .15 = 6/40
2                            18               .45 etc.
3                            10               .25
4                            2                .05
Total                        40               1.00
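The relative frequency method is a single division per outcome. A minimal sketch using the polisher rental counts from the table; the function name is hypothetical.

```python
# Number of days on which 0, 1, 2, 3 or 4 polishers were rented.
DAYS_BY_POLISHERS = {0: 4, 1: 6, 2: 18, 3: 10, 4: 2}


def relative_frequencies(counts):
    """Divide each frequency by the total frequency."""
    total = sum(counts.values())
    return {outcome: days / total for outcome, days in counts.items()}


if __name__ == "__main__":
    for outcome, prob in relative_frequencies(DAYS_BY_POLISHERS).items():
        print(f"{outcome} polishers: {prob:.2f}")
```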
Counting Rule for Combinations
Another useful counting rule enables us to count the number of experimental outcomes when n objects are to be selected from a set of N objects.
 Number of combinations of N objects taken n at a time

$C_n^N = \binom{N}{n} = \frac{N!}{n!(N - n)!}$

where N! = N(N - 1)(N - 2) . . . (2)(1)
n! = n(n - 1)(n - 2) . . . (2)(1)
0! = 1
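As a quick check of the counting rule, the factorial formula can be compared with Python's built-in `math.comb`. The values N = 5 and n = 2 are hypothetical, chosen only for illustration.

```python
from math import comb, factorial


def combinations(N, n):
    """Number of ways to select n objects from N, via the factorial formula."""
    return factorial(N) // (factorial(n) * factorial(N - n))


if __name__ == "__main__":
    # Hypothetical example: choose 2 objects from a set of 5.
    print(combinations(5, 2), comb(5, 2))
```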
Example: Bradley Investments
 Tree Diagram
Markley Oil (Stage 1)   Collins Mining (Stage 2)   Experimental Outcomes
Gain 10                 Gain 8                     (10, 8)    Gain $18,000
Gain 10                 Lose 2                     (10, -2)   Gain $8,000
Gain 5                  Gain 8                     (5, 8)     Gain $13,000
Gain 5                  Lose 2                     (5, -2)    Gain $3,000
Even                    Gain 8                     (0, 8)     Gain $8,000
Even                    Lose 2                     (0, -2)    Lose $2,000
Lose 20                 Gain 8                     (-20, 8)   Lose $12,000
Lose 20                 Lose 2                     (-20, -2)  Lose $22,000
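The tree diagram is a two-stage experiment, so its outcomes are the Cartesian product of the stage results. A minimal sketch, with stage results expressed in thousands of dollars as in the diagram; the function name is hypothetical.

```python
from itertools import product

# Stage results in $1000s: gains are positive, losses negative.
MARKLEY_OIL = [10, 5, 0, -20]   # Stage 1
COLLINS_MINING = [8, -2]        # Stage 2


def experimental_outcomes():
    """Map each (stage 1, stage 2) pair to its net result in dollars."""
    return {
        (markley, collins): (markley + collins) * 1000
        for markley, collins in product(MARKLEY_OIL, COLLINS_MINING)
    }


if __name__ == "__main__":
    for outcome, net in experimental_outcomes().items():
        verb = "Gain" if net >= 0 else "Lose"
        print(outcome, f"{verb} ${abs(net):,}")
```

Four stage 1 results times two stage 2 results give the eight outcomes shown in the diagram.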
Hypergeometric Distribution Formula

(Two possible outcomes per trial: success or failure)

$P(x) = \frac{C_{n-x}^{N-X} \cdot C_x^X}{C_n^N}$

Where
N = population size
X = number of successes in the population
n = sample size
x = number of successes in the sample
n – x = number of failures in the sample
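The hypergeometric formula can be evaluated with `math.comb`. The numbers in the example (a population of 10 with 4 successes, sample of 3) are hypothetical and only illustrate the calculation.

```python
from math import comb


def hypergeometric_pmf(x, N, X, n):
    """P(x successes in a sample of n) from a population of N with X successes."""
    if x < 0 or x > n or x > X or n - x > N - X:
        return 0.0  # impossible number of successes
    return comb(N - X, n - x) * comb(X, x) / comb(N, n)


if __name__ == "__main__":
    # Hypothetical example: N = 10, X = 4, n = 3, x = 2.
    print(hypergeometric_pmf(2, N=10, X=4, n=3))
```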

Sampling and Sampling Distributions

 Simple Random Sampling
 Point Estimation
 Sampling Distribution of $\bar{x}$
 Central Limit Theorem

[Figure: curves labeled n = 30 and n = 100]
Properties of a Sampling Distribution

 For any population,
 the average value of all possible sample means computed from all possible random samples of a given size from the population is equal to the population mean (Theorem 1):

$\mu_{\bar{x}} = \mu$

The sample mean is therefore considered an “unbiased” estimator.

 The standard deviation of the possible sample means computed from all random samples of size n is equal to the population standard deviation divided by the square root of the sample size (Theorem 2):

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

This is also called the standard error.
z-value for Sampling Distribution of $\bar{x}$

 z-value for the sampling distribution of $\bar{x}$:

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

where: $\bar{x}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
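The standard error and the z-value translate directly into code. All numbers in the example (μ = 100, σ = 15, n = 36, sample mean 104) are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def z_value(x_bar, mu, sigma, n):
    """How many standard errors the sample mean lies from the population mean."""
    return (x_bar - mu) / standard_error(sigma, n)


if __name__ == "__main__":
    # Hypothetical values: mu = 100, sigma = 15, n = 36, x_bar = 104.
    print("standard error =", standard_error(15, 36))
    print("z =", z_value(104, 100, 15, 36))
```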
Interval Estimation
 Interval Estimation of a Population Mean:
Large-Sample Case
 Interval Estimation of a Population Mean:
Small-Sample Case
 Determining the Sample Size
 Interval Estimation of a Population Proportion

[Figure: three interval estimates, each drawn as a bracketed range centered on a sample mean $\bar{x}$]
Interval Estimate of a Population Mean:

 With σ Known

$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

where: 1 - α is the confidence coefficient

 With σ Unknown
The sample standard deviation, s, is used as the point estimate of the population standard deviation.

$\bar{x} \pm t_{\alpha/2;\,df} \frac{s}{\sqrt{n}}$
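For the case with σ known, the interval can be computed with the standard library's `statistics.NormalDist`. This is a sketch under stated assumptions: the inputs (sample mean 50, σ = 10, n = 100, 95% confidence) are hypothetical, and the t-based interval is not shown because the standard library has no t quantile function.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Interval estimate of a population mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


if __name__ == "__main__":
    # Hypothetical values: x_bar = 50, sigma = 10, n = 100.
    lower, upper = z_interval(50, 10, 100)
    print(f"95% interval: {lower:.2f} to {upper:.2f}")
```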
Summary of Test Statistics to be Used in a
Hypothesis Test about a Population Mean
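 n > 30?
 Yes: σ known?
   Yes: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
   No: use s to estimate σ, then $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
 No: population approximately normal?
   Yes: σ known?
     Yes: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
     No: use s to estimate σ, then $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
   No: increase n to > 30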
Hypothesis Testing
 Developing Null and Alternative Hypotheses
 Type I and Type II Errors
 One-Tailed Tests About a Population Mean:
Large-Sample Case
 Two-Tailed Tests About a Population Mean:
Large-Sample Case
 Tests About a Population Mean:
Small-Sample Case
Developing Null and Alternative Hypotheses

 Hypothesis testing is similar to a criminal trial. The hypotheses are:
H0: The defendant is innocent
Ha: The defendant is guilty
A Summary of Forms for Null and Alternative
Hypotheses about a Population Mean
 The equality part of the hypotheses always appears in the null hypothesis.
 In general, a hypothesis test about the value of a population mean μ must take one of the following three forms (where μ0 is the hypothesized value of the population mean).
H0: μ ≥ μ0     H0: μ ≤ μ0     H0: μ = μ0
Ha: μ < μ0     Ha: μ > μ0     Ha: μ ≠ μ0
Level of Significance and the Rejection Region

Level of significance = α

                 Lower tail test        Upper tail test        Two tailed test
Example:         H0: μ ≥ 3              H0: μ ≤ 3              H0: μ = 3
                 HA: μ < 3              HA: μ > 3              HA: μ ≠ 3
Rejection area:  α in the lower tail    α in the upper tail    α/2 in each tail
Reject H0 if:    z < -zα                z > zα                 z < -zα/2 or z > zα/2
Otherwise, do not reject H0.
Example: Metro EMS
 Type I and Type II Errors

                       Population Condition
Conclusion             H0 True               Ha True
Fail to reject H0      Correct Conclusion    Type II Error
Reject H0              Type I Error          Correct Conclusion
Confidence Interval Approach to a
Two-Tailed Test about a Population Mean
 Select a simple random sample from the population and use the value of the sample mean $\bar{x}$ to develop the confidence interval for the population mean μ.
 If the confidence interval contains the hypothesized value μ0, do not reject H0. Otherwise, reject H0.

The Use of p-Values

 Reject H0 if the p-value ≤ α.
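A p-value for a z test about a population mean can be computed with `statistics.NormalDist`. The sketch below assumes σ is known and that H0 is rejected when the p-value is at most α; the hypothesized mean of 3 echoes the rejection-region example, while the sample mean, σ and n are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(x_bar, mu_0, sigma, n):
    """Test statistic for a mean when sigma is known."""
    return (x_bar - mu_0) / (sigma / sqrt(n))


def p_value(z, tail):
    """p-value for a 'lower', 'upper' or 'two' tailed z test."""
    cdf = NormalDist().cdf
    if tail == "lower":
        return cdf(z)
    if tail == "upper":
        return 1 - cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


def reject_h0(p, alpha):
    """Decision rule assumed here: reject H0 when p is at most alpha."""
    return p <= alpha


if __name__ == "__main__":
    # Hypothetical sample: x_bar = 3.1, sigma = 0.5, n = 100; H0: mu = 3.
    z = z_statistic(3.1, 3, 0.5, 100)
    p = p_value(z, "two")
    print(f"z = {z:.2f}, p = {p:.4f}, reject at 0.05: {reject_h0(p, 0.05)}")
```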


Introduction to Linear Regression and
Correlation Analysis

Calculating the Correlation Coefficient
Sample correlation coefficient:

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{[\sum (x - \bar{x})^2][\sum (y - \bar{y})^2]}}$

or the algebraic equivalent:

$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$

where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
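Both forms of the correlation formula can be coded and compared. The five (x, y) pairs below are hypothetical data used only to show that the two forms agree.

```python
from math import sqrt


def r_deviation_form(xs, ys):
    """r from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_algebraic_form(xs, ys):
    """r from raw sums, the algebraic equivalent."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))


if __name__ == "__main__":
    # Hypothetical data for illustration.
    xs = [1, 2, 3, 4, 5]
    ys = [2, 4, 5, 4, 5]
    print(r_deviation_form(xs, ys), r_algebraic_form(xs, ys))
```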
Population Linear Regression

The population regression model:

$y = \beta_0 + \beta_1 x + \varepsilon$

where:
y = Dependent variable
β0 = Population y intercept
β1 = Population slope coefficient
x = Independent variable
ε = Random error term, or residual

β0 + β1x is the linear component; ε is the random error component.
Regression Using Excel
 Data / Data Analysis / Regression
Excel Output
Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)
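Several figures in the Excel output are linked by simple identities, which the sketch below recomputes from the ANOVA sums of squares and the coefficient table. The function names and the 2,000 square foot input to `predict` are hypothetical; results match the printed output only up to the rounding of the inputs, and the units of house price are not stated in the source.

```python
from math import sqrt

# Values copied from the Excel output above.
SSR, SSE, SST = 18934.9348, 13665.5652, 32600.5000
DF_REGRESSION, DF_RESIDUAL = 1, 8
INTERCEPT, SLOPE, SLOPE_STD_ERROR = 98.24833, 0.10977, 0.03297


def summary():
    """Recompute the regression statistics from the ANOVA table."""
    r_square = SSR / SST
    mse = SSE / DF_RESIDUAL
    df_total = DF_REGRESSION + DF_RESIDUAL
    return {
        "multiple_r": sqrt(r_square),
        "r_square": r_square,
        "adjusted_r_square": 1 - (1 - r_square) * df_total / DF_RESIDUAL,
        "standard_error": sqrt(mse),
        "f": (SSR / DF_REGRESSION) / mse,
        "t_slope": SLOPE / SLOPE_STD_ERROR,
    }


def predict(square_feet):
    """Point estimate of house price from the fitted equation."""
    return INTERCEPT + SLOPE * square_feet


if __name__ == "__main__":
    for name, value in summary().items():
        print(f"{name}: {value:.5f}")
    print("predicted price at 2000 sq ft:", round(predict(2000), 5))
```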
