Pre-Tutor MSM 2018 PDF

BUSINESS STATISTICS
BY: LAI VAN TAI

Decision making
• Yes or no / your decision?
• Set up a new business or put the money in the bank?
• Should we increase our employees' salaries?
• Should we increase the budget for marketing activities?
• Should we invest in a new machine for the assembly line?
• What should we do to increase the sales rate? Why?
Decision making
• How to convince people that your decision is the best one?
Decision making
• What is the best choice?

                   Market conditions
                   High demand   Average demand   Low demand
Small factory          200            100             -20
Average factory        350            120            -150
Large factory          600            150            -300
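One way to explore "the best choice" is to apply a few common decision rules to the payoff table. The sketch below is only an illustration: the three rules (maximax, maximin, equal likelihood) are standard textbook criteria assumed here, not something the slide prescribes, and the function names are hypothetical.

```python
# Payoff table from the slide (rows: factory size; columns: market demand).
payoffs = {
    "Small factory":   {"High": 200, "Average": 100, "Low": -20},
    "Average factory": {"High": 350, "Average": 120, "Low": -150},
    "Large factory":   {"High": 600, "Average": 150, "Low": -300},
}

# Assumption: these criteria are illustrative and not taken from the slides.
def maximax(table):
    """Optimistic rule: choose the option with the best best-case payoff."""
    return max(table, key=lambda option: max(table[option].values()))

def maximin(table):
    """Pessimistic rule: choose the option with the best worst-case payoff."""
    return max(table, key=lambda option: min(table[option].values()))

def equal_likelihood(table):
    """Treat every demand level as equally likely and compare average payoffs."""
    return max(
        table,
        key=lambda option: sum(table[option].values()) / len(table[option]),
    )

print(maximax(payoffs))           # Large factory
print(maximin(payoffs))           # Small factory
print(equal_likelihood(payoffs))  # Large factory
```

The rules disagree, which is the point of the slide: the "best" choice depends on the decision maker's attitude to risk and on what is known about demand.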
Data Types
Data
  Qualitative (Categorical): defined categories
    Examples: Marital Status, Political Party, Eye Color
  Quantitative (Numerical)
    Discrete (counted items)
      Examples: Number of Children, Defects per hour
    Continuous (measured characteristics)
      Examples: Weight, Voltage
Data, Data Sets, Elements, Variables, and Observations

Company        Stock Exchange   Annual Sales ($M)   Earn/Sh. ($)
Dataram        AMEX                  73.10              0.86
EnergySouth    OTC                   74.00              1.67
Keystone       NYSE                 365.70              0.86
LandCare       NYSE                 111.40              0.33
Psychemedics   AMEX                  17.60              0.13

The columns are the variables, the companies (rows) are the elements, the whole table is the data set, and a single value is a datum.
Data Types
Sales (in $1000's)

            2003   2004   2005   2006
Atlanta      435    460    475    490
Boston       320    345    375    395
Cleveland    405    390    410    395
Denver       260    270    285    280

Time series data: one city's sales read across the years (a row).
Cross-sectional data: all cities' sales in a single year (a column).
DATA TYPE AND LEVELS
Data timing: Time series, Cross-sectional
Data type: Qualitative, Quantitative
Data levels: Nominal, Ordinal, Interval, Ratio
Populations and Samples
• A Population is the set of all items or individuals of interest
  Examples: All likely voters in the next election
            All parts produced today
            All sales receipts for November
• A Sample is a subset of the population
  Examples: 1000 voters selected at random for interview
            A few parts selected for destructive testing
            Every 100th receipt selected for audit
Population vs. Sample
[Figure: a population of items labeled a through z, and a sample drawn from it containing b, c, g, i, n, o, r, u, y.]
Inferential Procedures
• Making statements about a population by examining sample results

Sample statistics (known) → Inference → Population parameters (unknown, but can be estimated from sample evidence)

Goal: Convert data into meaningful information!
Descriptive Procedures
• Collect data
  e.g., Survey, Observation, Experiments
• Present data
  e.g., Charts and graphs
• Characterize data
  e.g., Sample mean = $\bar{x} = \frac{\sum x_i}{n}$
Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed below.
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
• Tabular Summary (Frequencies and Percent Frequencies)

Parts Cost ($)   Frequency   Percent Frequency
50-59                2              4
60-69               13             26
70-79               16             32
80-89                7             14
90-99                7             14
100-109              5             10
Total               50            100
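The tabular summary can be reproduced from the 50 invoice values. This is a minimal sketch assuming ten-dollar classes starting at 50, as in the table; the variable names are illustrative.

```python
from collections import Counter

# Parts costs from the 50 Hudson Auto invoices listed on the slide.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Each class spans ten dollars (50-59, 60-69, ...), so the lower class
# limit is the cost rounded down to a multiple of 10.
frequency = Counter((cost // 10) * 10 for cost in parts_cost)
n = len(parts_cost)

for lower in sorted(frequency):
    count = frequency[lower]
    percent = 100 * count / n
    print(f"{lower}-{lower + 9}: frequency {count}, percent {percent:.0f}")
```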
• Graphical Summary (Histogram)
[Figure: histogram of the parts costs; x-axis: Parts Cost ($), 50 to 110; y-axis: Frequency, 2 to 18.]
Shape of a Distribution
• Describes how data are distributed
• Symmetric or skewed
• The greater the difference between the mean and the median, the more skewed the distribution

Left-Skewed (longer tail extends to left): Mean < Median
Symmetric: Mean = Median
Right-Skewed (longer tail extends to right): Median < Mean
Copyright ©2011 Pearson
Education, Inc. publishing as
Empirical Rule
For data having a bell-shaped distribution:
• Approximately 68% of the data values will be within one standard deviation of the mean.
 Cumulative Distributions
Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59                2                        .04                              4
≤ 69               15                        .30                             30
≤ 79               31                        .62                             62
≤ 89               38                        .76                             76
≤ 99               45                        .90                             90
≤ 109              50                       1.00                            100
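The cumulative columns are running totals of the class frequencies from the tabular summary. A short sketch, assuming the six class frequencies 2, 13, 16, 7, 7, 5 from that table:

```python
from itertools import accumulate

class_upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]  # class frequencies from the tabular summary

total = sum(frequencies)
cumulative = list(accumulate(frequencies))             # running totals
cumulative_relative = [c / total for c in cumulative]  # share of all invoices
cumulative_percent = [100 * c / total for c in cumulative]

for upper, c, rel, pct in zip(
    class_upper_limits, cumulative, cumulative_relative, cumulative_percent
):
    print(f"<= {upper}: {c:2d}  {rel:.2f}  {pct:.0f}")
```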
• Ogive with Cumulative Percent Frequencies
[Figure: ogive; x-axis: Parts Cost ($), 50 to 110; y-axis: Cumulative Percent Frequency, 20 to 100.]
Descriptive Statistics: Numerical Methods
 Measures of Location
 Measures of Variability
 Measures of Relative Location and Detecting Outliers
Measures of Location
 Mean
 Median
 Mode
 Percentiles
 Quartiles
Example: Apartment Rents
 Mean
$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
 Median
Median = 50th percentile
i = (p/100)n = (50/100)70 = 35
Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
 Mode
450 occurred most frequently (7 times)
Mode = 450
 90th Percentile
i = (p/100)n = (90/100)70 = 63
Averaging the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585
 Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, rounded up to 53, so the third quartile is the 53rd data value
Third quartile = 525
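The location measures above can be checked with a few lines of code. This is a sketch of the position rule used on these slides (compute i = (p/100)n; if i is an integer, average the values in positions i and i + 1, otherwise round i up); the function name `percentile` is illustrative and the rule differs from the interpolation used by some software.

```python
from statistics import mean, median, mode

# The 70 apartment rents from the slide, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def percentile(values, p):
    """p-th percentile (p a whole number, 0 < p < 100) by the slides' rule."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)  # i = (p/100) * n
    if remainder == 0:
        # i is an integer: average the values in positions i and i + 1
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not an integer: round up and take the value in that position
    return ordered[whole]

print(mean(rents))            # 490.8
print(median(rents))          # 475.0
print(mode(rents))            # 450
print(percentile(rents, 90))  # 585.0
print(percentile(rents, 75))  # 525
```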
Measures of Variability
 Range
 Interquartile Range
 Variance
 Standard Deviation
 Coefficient of Variation
Variance
• The variance is the average of the squared differences between each data value and the mean.
• If the data set is a sample, the variance is denoted by $s^2$.
  $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
• If the data set is a population, the variance is denoted by $\sigma^2$.
  $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$
• Standard Deviation
$s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$
• Coefficient of Variation
$\frac{s}{\bar{x}} \times 100\% = \frac{54.74}{490.80} \times 100\% = 11.15\%$
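The variability measures for the apartment rents can be recomputed directly from the 70 values listed earlier. A minimal sketch using Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) formulas:

```python
from statistics import mean, stdev, variance

# The 70 apartment rents from the slide.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

sample_variance = variance(rents)   # sum of squared deviations / (n - 1)
sample_std_dev = stdev(rents)       # square root of the sample variance
coefficient_of_variation = sample_std_dev / mean(rents) * 100  # in percent

print(round(sample_variance, 2))           # 2996.16
print(round(sample_std_dev, 2))            # 54.74
print(round(coefficient_of_variation, 2))  # 11.15
```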
Variation
• Measures of variation give information on the spread or variability of the data values.
  • Smaller value: less variation
  • Larger value: more variation
[Figure: two distributions with the same center but different variation.]
Constructing the Box and Whisker Plot
[Figure: box and whisker plot marking, from left to right, outliers (*), lower limit, 1st quartile, median, 3rd quartile, and upper limit.]
The lower limit is Q1 - 1.5 (Q3 - Q1)
The upper limit is Q3 + 1.5 (Q3 - Q1)
• The center box extends from Q1 to Q3
• The line within the box is the median
• The whiskers extend to the smallest and largest values within the calculated limits
• Outliers are plotted outside the calculated limits
 Box Plot
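First quartile by the same percentile rule: i = (25/100)70 = 17.5, rounded up to 18, so Q1 = 445 and IQR = Q3 - Q1 = 525 - 445 = 80
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers.
[Figure: box plot of the apartment rents; axis from 375 to 625.]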
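The outlier limits can be sketched as a small helper. The function names are illustrative. The quartile values passed in are assumptions taken from the rounding-up percentile rule used for the third quartile earlier: position 18 of the sorted rents gives Q1 = 445 and position 53 gives Q3 = 525.

```python
def box_plot_limits(q1, q3):
    """Return (lower limit, upper limit) at 1.5 IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values falling outside the box plot limits."""
    lower, upper = box_plot_limits(q1, q3)
    return [v for v in values if v < lower or v > upper]

lower_limit, upper_limit = box_plot_limits(q1=445, q3=525)
print(lower_limit, upper_limit)  # 325.0 645.0

# The smallest and largest rents are 425 and 615, so nothing is flagged.
print(find_outliers([425, 615], q1=445, q3=525))       # []
# A hypothetical rent of 700 would be flagged as an outlier.
print(find_outliers([425, 615, 700], q1=445, q3=525))  # [700]
```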
Introduction to Probability
• Experiments, Counting Rules, and Assigning Probabilities
• Events and Their Probability
• Some Basic Relationships of Probability
• Conditional Probability
Assigning Probabilities
• Classical Method
  Assigning probabilities based on the assumption of equally likely outcomes.
• Relative Frequency Method
  Assigning probabilities based on experimentation or historical data.
• Subjective Method
  Assigning probabilities based on the assignor's judgment.
Classical Method
If an experiment has n possible outcomes, this method would assign a probability of 1/n to each outcome.
• Example
  Experiment: Rolling a die
  Sample Space: S = {1, 2, 3, 4, 5, 6}
  Probabilities: Each sample point has a 1/6 chance of occurring.
Relative Frequency Method
The probability assignments are given by dividing the number-of-days frequencies by the total frequency (total number of days).

Number of Polishers Rented   Number of Days   Probability
0                                  4           .10 (= 4/40)
1                                  6           .15 (= 6/40)
2                                 18           .45
3                                 10           .25
4                                  2           .05
Total                             40          1.00
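The relative frequency method amounts to one division per row. A minimal sketch using the polisher rental counts from the table; the dictionary name is illustrative.

```python
# Number of days on which 0, 1, 2, 3, or 4 polishers were rented.
days_by_polishers_rented = {0: 4, 1: 6, 2: 18, 3: 10, 4: 2}

total_days = sum(days_by_polishers_rented.values())

# Probability = frequency for the outcome / total frequency.
probabilities = {
    polishers: days / total_days
    for polishers, days in days_by_polishers_rented.items()
}

for polishers, probability in probabilities.items():
    print(f"{polishers} polishers: {probability:.2f}")
```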
Counting Rule for Combinations
Another useful counting rule enables us to count the number of experimental outcomes when n objects are to be selected from a set of N objects.
• Number of combinations of N objects taken n at a time
  $C_n^N = \binom{N}{n} = \frac{N!}{n!(N - n)!}$
where N! = N(N - 1)(N - 2) . . . (2)(1)
      n! = n(n - 1)(n - 2) . . . (2)(1)
      0! = 1
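The counting rule translates directly into factorials. The sketch below uses illustrative numbers (choosing 2 objects from 5) that are not from the slides, and checks the result against Python's built-in `math.comb`.

```python
import math

def combinations_count(N, n):
    """Number of combinations of N objects taken n at a time: N! / (n!(N - n)!)."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))

# Hypothetical example: select n = 2 objects from N = 5.
print(combinations_count(5, 2))  # 10
print(math.comb(5, 2))           # 10, the standard-library equivalent
```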
Example: Bradley Investments
• Tree Diagram (stage results in $1000s)

Markley Oil (Stage 1)   Collins Mining (Stage 2)   Experimental Outcome   Result
Gain 10                 Gain 8                     (10, 8)                Gain $18,000
Gain 10                 Lose 2                     (10, -2)               Gain $8,000
Gain 5                  Gain 8                     (5, 8)                 Gain $13,000
Gain 5                  Lose 2                     (5, -2)                Gain $3,000
Even                    Gain 8                     (0, 8)                 Gain $8,000
Even                    Lose 2                     (0, -2)                Lose $2,000
Lose 20                 Gain 8                     (-20, 8)               Lose $12,000
Lose 20                 Lose 2                     (-20, -2)              Lose $22,000
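The tree diagram lists every pairing of a Stage 1 result with a Stage 2 result, which is a Cartesian product. A small sketch, assuming the stage results are in thousands of dollars as the totals on the slide suggest:

```python
from itertools import product

# Possible results for each investment, in $1000s (from the tree diagram).
markley_oil = [10, 5, 0, -20]
collins_mining = [8, -2]

# Every experimental outcome is one (Markley, Collins) pair: 4 x 2 = 8 outcomes.
outcomes = list(product(markley_oil, collins_mining))

for markley, collins in outcomes:
    net = (markley + collins) * 1000
    label = "Gain" if net >= 0 else "Lose"
    print(f"({markley}, {collins}): {label} ${abs(net):,}")
```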
Hypergeometric Distribution Formula
(Two possible outcomes per trial: success or failure)
$P(x) = \frac{C_{n-x}^{N-X} \cdot C_x^X}{C_n^N}$
Where
N = population size
X = number of successes in the population
n = sample size
x = number of successes in the sample
n - x = number of failures in the sample
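The hypergeometric formula can be evaluated with `math.comb`. The numbers in this sketch (a population of 10 with 4 successes, a sample of 3) are hypothetical and only illustrate the calculation.

```python
from math import comb

def hypergeometric_pmf(x, N, X, n):
    """P(x successes in a sample of n) for a population of N with X successes."""
    return comb(N - X, n - x) * comb(X, x) / comb(N, n)

# Hypothetical example: N = 10, X = 4, n = 3, x = 2.
probability = hypergeometric_pmf(x=2, N=10, X=4, n=3)
print(round(probability, 4))  # 0.3

# The probabilities over all possible x add up to 1.
total = sum(hypergeometric_pmf(x, N=10, X=4, n=3) for x in range(0, 4))
print(round(total, 4))  # 1.0
```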
Sampling and Sampling Distributions
• Simple Random Sampling
• Point Estimation
• Sampling Distribution of $\bar{x}$
• Central Limit Theorem
[Figure: sampling distributions for n = 30 and n = 100.]
Properties of a Sampling Distribution
• For any population, the average value of all possible sample means computed from all possible random samples of a given size from the population is equal to the population mean (Theorem 1):
  $\mu_{\bar{x}} = \mu$
  The sample mean is therefore considered an "unbiased" estimator.
• The standard deviation of the possible sample means computed from all random samples of size n is equal to the population standard deviation divided by the square root of the sample size (Theorem 2):
  $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
  This is also called the standard error.
z-value for Sampling Distribution of $\bar{x}$
• z-value for the sampling distribution of $\bar{x}$:
  $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
where: $\bar{x}$ = sample mean
       μ = population mean
       σ = population standard deviation
       n = sample size
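Both theorems and the z-value can be checked on a tiny example by listing every possible sample. The population below is hypothetical, and the sketch assumes sampling with replacement, which is the setting in which the standard error equals σ/√n exactly.

```python
import math
from itertools import product
from statistics import mean, pstdev

population = [2, 4, 6, 8]  # hypothetical population, not from the slides
n = 2                      # sample size

# Every possible sample of size n, drawn with replacement (16 samples).
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)
sigma = pstdev(population)  # population standard deviation (divides by N)

# Theorem 1: the mean of all sample means equals the population mean.
print(mean(sample_means), mu)
# Theorem 2: the standard deviation of the sample means equals sigma / sqrt(n).
print(pstdev(sample_means), sigma / math.sqrt(n))

def z_value(x_bar, mu, sigma, n):
    """z-value of a sample mean within the sampling distribution."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

# Hypothetical inputs: x_bar = 5.5, mu = 5, sigma = 2, n = 16.
print(z_value(5.5, 5, 2, 16))  # 1.0
```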
Interval Estimation
• Interval Estimation of a Population Mean: Large-Sample Case
• Interval Estimation of a Population Mean: Small-Sample Case
• Determining the Sample Size
• Interval Estimation of a Population Proportion
[Figure: three interval estimates, each centered on a sample mean $\bar{x}$.]
Interval Estimate of a Population Mean:
• With σ Known
  $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
  where: 1 - α is the confidence coefficient
• With σ Unknown
  The sample standard deviation, s, is used as the point estimate of the population standard deviation.
  $\bar{x} \pm t_{\alpha/2;\,df} \frac{s}{\sqrt{n}}$
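Both interval formulas can be sketched in a few lines. The inputs are hypothetical. The z critical value comes from the standard library's `statistics.NormalDist`; the standard library has no t distribution, so the t critical value is assumed to be looked up in a table and passed in by the caller.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Interval estimate of the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def t_interval(x_bar, s, n, t_critical):
    """Interval estimate of the mean when sigma is unknown.

    t_critical is t_{alpha/2} with n - 1 degrees of freedom, taken from a table.
    """
    margin = t_critical * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical example: x_bar = 50, sigma = 10, n = 100, 95% confidence.
low, high = z_interval(x_bar=50, sigma=10, n=100)
print(round(low, 2), round(high, 2))  # 48.04 51.96

# Hypothetical example: x_bar = 50, s = 10, n = 25; t_{.025; 24} = 2.064.
t_low, t_high = t_interval(x_bar=50, s=10, n=25, t_critical=2.064)
print(round(t_low, 3), round(t_high, 3))  # 45.872 54.128
```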
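Summary of Test Statistics to be Used in a Hypothesis Test about a Population Mean
• n > 30?
  • Yes: σ known?
    • Yes: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
    • No: use s to estimate σ: $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
  • No: population approximately normal?
    • Yes: σ known?
      • Yes: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
      • No: use s to estimate σ: $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
    • No: increase n to > 30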
Hypothesis Testing
• Developing Null and Alternative Hypotheses
• Type I and Type II Errors
• One-Tailed Tests About a Population Mean: Large-Sample Case
• Two-Tailed Tests About a Population Mean: Large-Sample Case
• Tests About a Population Mean: Small-Sample Case
Developing Null and Alternative Hypotheses
 Hypothesis testing is similar to a criminal trial.

The hypotheses are:
H0: The defendant is innocent
Ha: The defendant is guilty
A Summary of Forms for Null and Alternative Hypotheses about a Population Mean
• The equality part of the hypotheses always appears in the null hypothesis.
• In general, a hypothesis test about the value of a population mean μ must take one of the following three forms (where μ0 is the hypothesized value of the population mean).

H0: μ ≥ μ0      H0: μ ≤ μ0      H0: μ = μ0
Ha: μ < μ0      Ha: μ > μ0      Ha: μ ≠ μ0
Level of Significance and the Rejection Region
Level of significance = α

                    Lower tail test   Upper tail test   Two tailed test
Example:            H0: μ ≥ 3         H0: μ ≤ 3         H0: μ = 3
                    HA: μ < 3         HA: μ > 3         HA: μ ≠ 3
Reject H0 when:     z < -zα           z > zα            z < -zα/2 or z > zα/2
Rejection area:     α                 α                 α/2 in each tail
Otherwise:          Do not reject H0  Do not reject H0  Do not reject H0
Example: Metro EMS
• Type I and Type II Errors

                     Population Condition
Conclusion           H0 True               Ha True
Fail to reject H0    Correct Conclusion    Type II Error
Reject H0            Type I Error          Correct Conclusion
Confidence Interval Approach to a Two-Tailed Test about a Population Mean
• Select a simple random sample from the population and use the value of the sample mean $\bar{x}$ to develop the confidence interval for the population mean μ.
• If the confidence interval contains the hypothesized value μ0, do not reject H0. Otherwise, reject H0.
The Use of p-Values
• Reject H0 if the p-value ≤ α.
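The p-value rule can be sketched for a large-sample z test. The sample figures below are hypothetical; the hypothesized mean of 3 only echoes the example hypotheses shown for the rejection regions. `NormalDist().cdf` from the standard library supplies the normal probabilities.

```python
import math
from statistics import NormalDist

def z_test_p_value(x_bar, mu0, sigma, n, tail):
    """Return (z, p-value) for a z test about a population mean.

    tail is "lower", "upper", or "two", matching the form of Ha.
    """
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    cdf = NormalDist().cdf(z)
    if tail == "lower":
        p_value = cdf
    elif tail == "upper":
        p_value = 1 - cdf
    elif tail == "two":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'lower', 'upper', or 'two'")
    return z, p_value

alpha = 0.05
# Hypothetical sample: x_bar = 3.3, sigma = 1, n = 36; H0: mu <= 3, Ha: mu > 3.
z, p_value = z_test_p_value(x_bar=3.3, mu0=3, sigma=1, n=36, tail="upper")
print(round(z, 2), round(p_value, 4))  # 1.8 0.0359
print("Reject H0" if p_value <= alpha else "Do not reject H0")  # Reject H0
```

With the same data a two-tailed test gives a p-value of about 0.0719, so H0: μ = 3 would not be rejected at α = 0.05.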

Introduction to Linear Regression and
Correlation Analysis
Calculating the Correlation Coefficient
Sample correlation coefficient:
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{[\sum (x - \bar{x})^2][\sum (y - \bar{y})^2]}}$
or the algebraic equivalent:
$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
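The two forms of the formula give the same value, which a short sketch can confirm. The five data pairs are hypothetical and the function names are illustrative.

```python
import math

def correlation_deviation_form(xs, ys):
    """r from deviations about the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - x_bar) ** 2 for x in xs)
    sum_sq_y = sum((y - y_bar) ** 2 for y in ys)
    return numerator / math.sqrt(sum_sq_x * sum_sq_y)

def correlation_algebraic_form(xs, ys):
    """r from the raw sums (the algebraic equivalent)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data, not from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(round(correlation_deviation_form(xs, ys), 4))  # 0.7746
print(round(correlation_algebraic_form(xs, ys), 4))  # 0.7746
```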
Population Linear Regression
The population regression model:
$y = \beta_0 + \beta_1 x + \varepsilon$
where:
y = Dependent variable
β0 = Population y intercept
β1 = Population slope coefficient
x = Independent variable
ε = Random error term, or residual
β0 + β1x is the linear component; ε is the random error component.
Regression Using Excel
 Data / Data Analysis / Regression
Excel Output
The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R            0.76211
R Square              0.58082
Adjusted R Square     0.52842
Standard Error       41.33032
Observations         10

ANOVA
             df   SS           MS           F         Significance F
Regression    1   18934.9348   18934.9348   11.0848   0.01039
Residual      8   13665.5652    1708.1957
Total         9   32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept       98.24833       58.03348       1.69296   0.12892   -35.57720   232.07386
Square Feet      0.10977        0.03297       3.32938   0.01039     0.03374     0.18580
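Several figures in the Excel output follow from one another, and the fitted equation can be used for prediction. The sketch below recomputes R Square, F, and the slope's t statistic from the reported sums of squares and coefficients. The 2,000 square foot input is a hypothetical value, and the price is in whatever units the original data used.

```python
import math

# Values copied from the Excel output.
intercept = 98.24833
slope = 0.10977
slope_std_error = 0.03297
ss_regression = 18934.9348
ss_residual = 13665.5652
ss_total = 32600.5000
df_regression, df_residual = 1, 8

def predict_house_price(square_feet):
    """Fitted regression equation: house price = intercept + slope * square feet."""
    return intercept + slope * square_feet

r_square = ss_regression / ss_total                                      # about 0.58082
multiple_r = math.sqrt(r_square)                                         # about 0.76211
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)   # about 11.0848
t_stat_slope = slope / slope_std_error                                   # about 3.329

print(round(r_square, 5), round(multiple_r, 5))
print(round(f_stat, 4), round(t_stat_slope, 3))
print(round(predict_house_price(2000), 2))  # hypothetical 2,000 sq ft house
```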
