
Business Statistics

Class Discussions
(Numerical Questions)
Examples of Arithmetic Mean
(Ungrouped)
The following table shows monthly sales of 10 stores of a retail chain. Calculate the average monthly sales.

Store   Sales ($1000s)
A       22
B       25
C       27
D       29
E       30
F       31
G       32
H       33
I       35
J       36

Average monthly sales = ΣX/n
= (22+25+27+29+30+31+32+33+35+36)/10
= 300/10
= 30
The average monthly sales is $30,000 per store.

(Grouped, discrete)
The number of LED lamps used in households is given below. Calculate the average number of LED lamps used.

No. of LED lamps (X)   No. of households (f)   f*X
1                      2                       2*1=2
2                      4                       4*2=8
3                      6                       6*3=18
4                      8                       8*4=32
Total                  Σf = 20                 ΣfX = 60

Average no. of LED lamps used = ΣfX/Σf
= 60/20
= 3
The average usage of LED is 3 lamps per household.
The following table provides data on the distribution of salaries ($1000s)
of 50 employees of an organization. Calculate the average salary.

(Grouped, continuous)
Salary ($1000s)   No. of employees (f)   Mid-value (X)   f*X
10-20             2                      15              2*15=30
20-30             4                      25              4*25=100
30-40             6                      35              6*35=210
40-50             8                      45              8*45=360
50-60             10                     55              10*55=550
60-70             8                      65              8*65=520
70-80             6                      75              6*75=450
80-90             4                      85              4*85=340
90-100            2                      95              2*95=190
Total             Σf = 50                                ΣfX = 2750

Average salary of employees = ΣfX/Σf
= 2750/50
= 55
The average salary of the employees is $55,000.
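The grouped-mean calculation above can be sketched in Python. This is a minimal illustration, not part of the original slides; the function name grouped_mean is an assumption chosen for this sketch.

```python
# Class intervals (salary in $1000s) and their frequencies, from the table above.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60),
           (60, 70), (70, 80), (80, 90), (90, 100)]
freqs = [2, 4, 6, 8, 10, 8, 6, 4, 2]


def grouped_mean(classes, freqs):
    """Mean of a continuous frequency distribution, using class mid-values as X."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    total_fx = sum(f * x for f, x in zip(freqs, mids))  # ΣfX
    return total_fx / sum(freqs)                        # ΣfX / Σf


mean_salary = grouped_mean(classes, freqs)
print(mean_salary)  # 55.0
```

The same function handles the discrete LED example if each value is passed as a degenerate class, e.g. (1, 1), (2, 2), and so on.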
Practice Problems:

1. The mean weight of five computer stations is 167.2 lbs. The weights of four of
them are 158.4 lbs, 162.8 lbs, 165.0 lbs and 178.2 lbs respectively. What is the
weight of the fifth computer station?
2. The following table gives the weights of wooden items being sold by a timber
merchant. Calculate mean weight of the items sold.
Weight (lbs) 1-3 4-6 7-9 10-12 13-15
No. of items 8 25 45 18 4

3. An ice-cream parlor sells six varieties of ice-creams which have generated the
following revenue. Find the mean price of an ice-cream sold.
Ice-cream Butter scotch Chocolate Lychee Choco chips Tooty fruity Vanilla
Price (Rs.) 40 90 65 55 75 45
Sales (Rs.) 5,00,000 4,50,000 3,38,000 3,01,180 4,93,800 3,14,415
Weighted Arithmetic Mean
The weighted mean is a type of mean that is calculated by multiplying the weight (or
probability) associated with a particular event or outcome with its associated quantitative
outcome and then summing all the products together.
Weighted Mean = ΣWiXi/ΣWi; i = 1,2,3,……,n.
Examination   Weight (W)   Score (X)   W*X
Test-1        0.15         65%         0.15*65=9.75
Test-2        0.15         85%         0.15*85=12.75
CP-1          0.05         90%         0.05*90=4.50
CP-2          0.05         90%         0.05*90=4.50
Mid-term      0.20         75%         0.20*75=15.00
End-term      0.40         65%         0.40*65=26.00
Total         ΣW=1.00                  ΣWX=72.50

Weighted Arithmetic Mean = ΣWX/ΣW
= 72.50/1.00
= 72.50
The weighted average score is 72.50%.


Decisions based on Weighted Mean:
Example: Sam wants to buy a new camera, and decides on the following rating system:
Image Quality 50%
Battery Life 30%
Zoom Range 20%
The brand ‘X’ camera gets 8 for Image Quality, 6 for Battery Life and 7 for Zoom Range, all out
of 10.
The brand ‘Y’ camera gets 9 for Image Quality, 4 for Battery Life and 6 for Zoom Range, all out
of 10.

Which camera will Sam buy?


Weighted score of X : (0.5 × 8) + (0.3 × 6) + (0.2 × 7) = 4 + 1.8 + 1.4 = 7.2
Weighted score of Y: (0.5 × 9) + (0.3 × 4) + (0.2 × 6) = 4.5 + 1.2 + 1.2 = 6.9

Sam decides to buy the brand ‘X’.
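Both weighted-mean examples (the examination scores and the camera ratings) follow the same formula, ΣWX/ΣW. A minimal Python sketch, with weighted_mean as an illustrative helper name:

```python
import math


def weighted_mean(values, weights):
    """Weighted arithmetic mean: ΣWX / ΣW."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Examination example: weights and percentage scores from the table.
exam_weights = [0.15, 0.15, 0.05, 0.05, 0.20, 0.40]
exam_scores = [65, 85, 90, 90, 75, 65]
exam_mean = weighted_mean(exam_scores, exam_weights)

# Camera example: Image Quality 50%, Battery Life 30%, Zoom Range 20%.
camera_weights = [0.5, 0.3, 0.2]
score_x = weighted_mean([8, 6, 7], camera_weights)
score_y = weighted_mean([9, 4, 6], camera_weights)

print(round(exam_mean, 2))                   # 72.5
print(round(score_x, 2), round(score_y, 2))  # 7.2 6.9
```

Dividing by ΣW keeps the function correct even when the weights do not add up to 1.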


Examples of Median (Ungrouped data):
1. The class size of five sections of first year students are 32, 56, 42, 46, 48 respectively.
Find the median no. of students.
Key: Arrange the nos. in ascending order: 32, 42, 46, 48, 56
No. of observations n = 5 (odd)
Median value = [(n+1)/2]th observation
= [(5+1)/2]th observation
= 3rd observation = 46
The median no. of students is 46.

2. A batsman scored 1, 113, 148, 22, 24, 27, 15, 16, 16 & 28 runs in the last 10 innings. Using an
appropriate measure, find his average score.
Key: Since there are 2 extreme scores 113 & 148, hence mean would be affected by these values.
Here, median would be an appropriate measure.
Arrangement: 1, 15, 16, 16, 22, 24, 27, 28, 113, 148.
No. of observations, n = 10 (even)
Median = Mean of (n/2)th and (n/2+1)th observations
= Mean of 5th and 6th observations
= (22+24)/2 = 23
The average score of the batsman is 23 runs.
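The odd/even rule for the median of ungrouped data can be sketched as follows (an illustrative helper, not part of the original slides):

```python
def median(values):
    """Median of ungrouped data: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # [(n+1)/2]th observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of (n/2)th and (n/2+1)th


print(median([32, 56, 42, 46, 48]))                       # 46
print(median([1, 113, 148, 22, 24, 27, 15, 16, 16, 28]))  # 23.0
```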
Example of Median (Grouped data, discrete series):
Calculate median marks obtained by the students:
Marks : 45 55 25 35 5 15
No. of students : 40 30 30 50 10 20
Key: Arrange the marks in ascending/descending order, then find CF.
Marks (X)   No. of students (f)   Cumulative frequency (CF)
5           10                    10
15          20                    30
25          30                    60
35          50                    110
45          40                    150
55          30                    180 (N)

Finding the median value: N/2 = 180/2 = 90. Since 90 is not available in the CF column, we take the next higher value, 110. Hence the median marks is 35.
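The cumulative-frequency lookup can be sketched in Python. The function name discrete_median is an assumption for this sketch; it returns the first value whose cumulative frequency reaches N/2, which is the rule used in the example.

```python
def discrete_median(value_freq_pairs):
    """Median of a discrete frequency distribution via cumulative frequency."""
    ordered = sorted(value_freq_pairs)        # arrange values in ascending order
    half = sum(f for _, f in ordered) / 2     # N/2
    cumulative = 0
    for x, f in ordered:
        cumulative += f
        if cumulative >= half:                # first CF that reaches N/2
            return x


marks = [(45, 40), (55, 30), (25, 30), (35, 50), (5, 10), (15, 20)]
print(discrete_median(marks))  # 35
```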
Example: Grouped data, continuous series
Mode
• Mode is the value which occurs most frequently in a distribution.
• A distribution can have one or more than one modes.
• Mode is widely used while compiling the results of surveys. The options with
maximum frequencies are considered and decisions are taken accordingly.
• Mode can be calculated for grouped, ungrouped, discrete and continuous data.
• Mode for grouped and continuous series is calculated using the formula:
Discrete series Continuous series
Practice Problems
Range
• Range is defined as the difference between largest observation (L) and smallest
observation (S) in a distribution. Range = L-S
• If the averages of two distributions are almost the same, then the distribution with
the smaller range is said to have less dispersion.
• Lesser value of range indicates more consistency in the distribution.
• Coefficient of range = (L-S)/(L+S)
• Range is widely used for statistical quality control. If the dimensions of products
are beyond a defined range, they are discarded.
• It facilitates the study of variations in the prices of shares, agricultural products
and other commodities.
• It also helps in weather forecasts by indicating minimum and maximum
temperature.
Quartile Deviation
• Quartile deviation or semi inter quartile range is half of the difference between upper
quartile (Q3) and lower quartile (Q1).
• Quartile deviation (QD) = (Q3 - Q1)/2, and Coefficient of QD = (Q3 - Q1)/(Q3 + Q1)
• The Quartile Deviation doesn’t take into consideration the extreme points of the
distribution. Thus, the dispersion or the spread of only the central 50% data is considered.
• It is the best measure of dispersion for open-ended systems (which have open-ended
extreme ranges).
• Q1 = [(n+1)/4]th observation, and Q3 = [3(n+1)/4]th observation
• Ex: 15, 16, 17, 18, 19, 20, 22, 26, 27, 30, 31, 34, 41, 45, 47, 52, 53, 56, 67, 68, 69, 72, 74,
76 (n = 24)
Q1 = {(24+1)/4}th obs. = 6.25th obs., taken as the 7th obs. = 22
Q3 = {3(24+1)/4}th obs. = 18.75th obs., taken as the 19th obs. = 67
QD = (67 – 22)/2 = 22.5 and
Coeff. of QD = (67 – 22)/(67 + 22) = 45/89 = 0.506
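A short Python sketch of the quartile deviation. It assumes the convention the worked example appears to use, namely rounding a fractional position up to the next observation; other textbooks and software interpolate instead, so their quartiles can differ slightly.

```python
import math


def quartile_deviation(values):
    """Return (Q1, Q3, QD, coefficient of QD) using round-up positional quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    # Positions are 1-based; a fractional position is rounded up (assumed convention).
    q1 = ordered[math.ceil((n + 1) / 4) - 1]
    q3 = ordered[math.ceil(3 * (n + 1) / 4) - 1]
    qd = (q3 - q1) / 2
    coeff = (q3 - q1) / (q3 + q1)
    return q1, q3, qd, coeff


data = [15, 16, 17, 18, 19, 20, 22, 26, 27, 30, 31, 34,
        41, 45, 47, 52, 53, 56, 67, 68, 69, 72, 74, 76]
q1, q3, qd, coeff = quartile_deviation(data)
print(q1, q3, qd, round(coeff, 3))  # 22 67 22.5 0.506
```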
Mean Deviation
• Mean deviation is arithmetic mean of the absolute deviations of all items from a
measure of central tendency.
• Mean deviation (MD) = Σ|X-m(X)|/n
where ‘m(X)’ is any central tendency value and ‘n’ is no. of observations.
Example: Calculate mean deviation about mean:

X      |X-m(X)|
5      |5-9|= 4
7      |7-9|= 2
8      |8-9|= 1
9      |9-9|= 0
10     |10-9|= 1
11     |11-9|= 2
13     |13-9|= 4
Mean m(X) = 9; Σ|X-m(X)|= 14

MD about mean = Σ|X-m(X)|/n
= 14/7
= 2
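The mean deviation about the mean can be checked with a few lines of Python (the helper name is illustrative):

```python
def mean_deviation(values):
    """Mean of absolute deviations from the arithmetic mean: Σ|X - m(X)| / n."""
    center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)


print(mean_deviation([5, 7, 8, 9, 10, 11, 13]))  # 2.0
```

To measure deviation about the median instead, replace the computation of center with the median of the values.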
Variance, Standard Deviation
Ungrouped data:

x      (x - x̄)     (x - x̄)²
8      8-7=1       1
4      4-7=-3      9
9      9-7=2       4
11     11-7=4      16
3      3-7=-4      16
Σx=35; x̄ = 7; Σ(x - x̄)² = 46

Sample variance (s²) = Σ(x - x̄)²/(n-1)
= 46/4
= 11.5
Sample Std. Deviation (s) = Sqrt (Sample variance)
= Sqrt (11.5)
= 3.391

Grouped data (discrete series):

x      f     fx     (x - x̄)     (x - x̄)²    f(x - x̄)²
8      3     24     8-8=0       0           3*0=0
4      4     16     4-8=-4      16          4*16=64
10     6     60     10-8=2      4           6*4=24
12     4     48     12-8=4      16          4*16=64
4      3     12     4-8=-4      16          3*16=48
Σfx = 160; Σf = 20; x̄ = 160/20 = 8; Σf(x - x̄)² = 200

Sample Variance (s²) = Σf(x - x̄)²/(Σf-1)
= 200/19
= 10.526
Std Deviation (s) = Sqrt (Sample variance)
= Sqrt (10.526)
= 3.244
Example: Grouped, continuous series
A showroom of cars displays its sales figures for the last 30 days. Calculate the mean no. of
cars sold per day and std. deviation.

No. of cars sold   Days (f)   x     fx     (x - x̄)     (x - x̄)²    f(x - x̄)²
0-2                14         1     14     1-3=-2      4           14*4=56
2-4                7          3     21     3-3=0       0           7*0=0
4-6                5          5     25     5-3=2       4           5*4=20
6-8                3          7     21     7-3=4       16          3*16=48
8-10               1          9     9      9-3=6       36          1*36=36
Σfx = 90; Σf = 30; x̄ = 90/30 = 3; Σf(x - x̄)² = 160

Sample Variance (s²) = Σf(x - x̄)²/(Σf-1)
= 160/29
= 5.517
Sample Standard Deviation (s) = Sqrt (Variance)
= Sqrt (5.517)
= 2.349
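The grouped sample variance and standard deviation can be sketched in Python. The function name grouped_sample_stats is an assumption for this sketch; it takes the class mid-values as x and uses Σf - 1 in the denominator, as in the worked examples.

```python
import math


def grouped_sample_stats(mids, freqs):
    """Return (mean, sample variance, sample SD) of a frequency distribution."""
    n = sum(freqs)                                              # Σf
    mean = sum(f * x for f, x in zip(freqs, mids)) / n          # Σfx / Σf
    ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids))  # Σf(x - x̄)²
    variance = ss / (n - 1)
    return mean, variance, math.sqrt(variance)


# Car showroom example: mid-values of 0-2, 2-4, ..., 8-10 and days (f).
mean, variance, sd = grouped_sample_stats([1, 3, 5, 7, 9], [14, 7, 5, 3, 1])
print(mean, round(variance, 3), round(sd, 3))  # 3.0 5.517 2.349
```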
Coefficient of Variation
CV is used to study consistency whenever there is a comparison between two or
more datasets.
Two companies Dawson Suppliers and Clark Distributors deliver construction materials. The following data shows
days of delivery for both the companies on 8 occasions. Which company is more consistent in deliveries?

Dawson (x)   Clark (y)   (x - x̄)   (x - x̄)²   (y - ȳ)   (y - ȳ)²
11           8           1         1          -2        4
10           10          0         0          0         0
9            17          -1        1          7         49
10           7           0         0          -3        9
8            10          -2        4          0         0
8            11          -2        4          1         1
10           10          0         0          0         0
14           7           4         16         -3        9
Mean x̄ = 10; Mean ȳ = 10; Σ(x - x̄)² = 26; Σ(y - ȳ)² = 72

Coefficient of Variation = (SD/Mean)*100
SD of x = Sqrt (26/7) = 1.93
CV of x = (1.93/10)*100 = 19.3%
SD of y = Sqrt (72/7) = 3.21
CV of y = (3.21/10)*100 = 32.1%
Dawson Suppliers has the lower CV, so it is more consistent in delivering the materials.
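The comparison of consistency can be sketched with Python's standard statistics module. The helper name coefficient_of_variation is illustrative; statistics.stdev computes the sample standard deviation (n - 1 denominator), matching the example. The last digit may differ from a hand calculation that rounds the SD before dividing.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: (sample SD / mean) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


dawson = [11, 10, 9, 10, 8, 8, 10, 14]
clark = [8, 10, 17, 7, 10, 11, 10, 7]

cv_dawson = coefficient_of_variation(dawson)
cv_clark = coefficient_of_variation(clark)

# The dataset with the smaller CV is the more consistent one.
more_consistent = "Dawson" if cv_dawson < cv_clark else "Clark"
print(round(cv_dawson, 1), round(cv_clark, 1), more_consistent)
```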
Practice Problems
1. Find quartile deviation from the following data:
109, 189, 167, 209, 309, 265, 189, 187, 165, 239, 308, 378, 367, 109, 198, 209, 218, 387
2. The share prices of two companies X and Y are given below for twelve days. Which
company’s share prices are more consistent?
Days 1 2 3 4 5 6 7 8 9 10 11 12
X 201 200 199 203 206 208 206 201 197 199 198 196
Y 291 293 293 287 292 298 298 299 302 302 302 304

3. Calculate standard deviation from the following distribution:


Class 10-20 20-30 30-40 40-50 50-60 60-70
Frequency 5 10 15 15 10 5
Outlier Detection
Outliers are extreme values that deviate from the other observations in the data. They may
indicate variability in a measurement, experimental errors or a genuine novelty. In other
words, an outlier is an observation that diverges from the overall pattern of a sample.
Most common causes of outliers in a data set:
• Data entry errors (human errors)
• Measurement errors (instrument errors)
• Experimental errors (data extraction or experiment planning/executing errors)
• Intentional (dummy outliers made to test detection methods)
• Data processing errors (data manipulation or data set unintended mutations)
• Sampling errors (extracting or mixing data from wrong or various sources)
• Natural (not an error, novelties in data)
‘Z-Score’ method:
The ‘z-score’ or ‘standard score’ of an observation is a metric that indicates how many standard
deviations a data point is from the sample mean. After making the appropriate transformations to the
selected feature space of the dataset, the z-score of any data point can be calculated with the following
expression:

Z = (x - mean)/s.d.

When computing the z-score for each observation in the data set, a threshold must be specified. A general
‘thumb-rule’ for detecting outliers is |Z| > 3.0, however it varies. Sometimes |Z| values of more than 2.5 are
also considered as outliers.

Meal price (x) ($)   Z-score = (x-mean)/s.d.
18                   -0.3545
19                   -0.3151
20                   -0.2757
17                   -0.3939
21                   -0.2364
22                   -0.1970
99                   2.8362 (Outlier at the |Z| > 2.5 threshold)
21                   -0.2364
15                   -0.4727
18                   -0.3545
Mean=27
S.D.=25.3859
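The z-score screening can be sketched in Python. The threshold of 2.5 is the looser of the two rules of thumb mentioned above and is passed in explicitly; the helper name z_scores is illustrative.

```python
import statistics

prices = [18, 19, 20, 17, 21, 22, 99, 21, 15, 18]


def z_scores(values):
    """z = (x - mean) / s.d., using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 in the denominator
    return [(x - mean) / sd for x in values]


threshold = 2.5
flagged = [x for x, z in zip(prices, z_scores(prices)) if abs(z) > threshold]
print(flagged)  # [99]
```

With the stricter |Z| > 3.0 rule nothing is flagged here, because the single extreme value inflates the standard deviation it is measured against.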
‘Box-plot’ or ‘IQR’ method:
Interquartile range (IQR) is a measure of variability and also referred to as ‘midspread’. It is calculated as:
IQR = Q3 – Q1
With the help of IQR, we determine lower limit and upper limit for the dataset. Any value less than lower limit or
more than upper limit is considered as an outlier.
Lower limit = Q1 – 1.5*IQR
Upper limit = Q3 + 1.5*IQR
Meal price ($), in ascending order: 15, 17, 18, 18, 19, 20, 21, 21, 22, 99

Calculations:
IQR = Q3 – Q1
Q1 = 18 and Q3 = 21
IQR = 3
Lower limit = 18 – 1.5*3 = 13.5
Upper limit = 21 + 1.5*3 = 25.5
Any value less than $13.5 or more than $25.5 will be considered as an outlier.
Hence the meal price of $99 is an outlier.
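The IQR limits can be sketched in Python. The quartiles are taken from the worked example (Q1 = 18, Q3 = 21) rather than recomputed, because quartile conventions differ between textbooks and software; iqr_limits is an illustrative helper name.

```python
def iqr_limits(q1, q3, k=1.5):
    """Lower and upper outlier limits: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


prices = [15, 17, 18, 18, 19, 20, 21, 21, 22, 99]
lower, upper = iqr_limits(18, 21)
outliers = [x for x in prices if x < lower or x > upper]
print(lower, upper, outliers)  # 13.5 25.5 [99]
```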
The data for ‘Stereo and Sound Equipment Store’ is given pertaining to the no. of television commercials shown and sales during
10 weeks. The manager of store wants to determine the association between television commercials and sales revenue. Calculate
sample covariance and interpret the result.
Week    No. of Commercials (x)   Sales ($100s) (y)   (x - x̄)   (y - ȳ)   (x - x̄)(y - ȳ)
1       2                        50                  -1        -1        1
2       5                        57                  2         6         12
3       1                        41                  -2        -10       20
4       3                        54                  0         3         0
5       4                        54                  1         3         3
6       1                        38                  -2        -13       26
7       5                        63                  2         12        24
8       3                        48                  0         -3        0
9       4                        59                  1         8         8
10      2                        46                  -1        -5        5
Σx=30; Σy=510; x̄ = 3; ȳ = 51; Σ(x - x̄)(y - ȳ) = 99

Cov(X,Y) = Σ(x - x̄)(y - ȳ)/(n-1)
= 99/(10-1)
= 11

From the data, it is evident that there exists a positive association between television commercials and sales of
the store, i.e., sales increase with the increase in television commercials. But this analysis only gives the direction
of the relationship, not the strength. Strength or degree of relationship can be understood with correlation
analysis.
The sample data for ‘Stereo and Sound Equipment Store’ is given pertaining to the no. of television commercials shown and sales during 10
weeks. The manager of store wants to determine the association between television commercials and sales revenue. Calculate Karl
Pearson’s coefficient of correlation and interpret the result.

Week    No. of Commercials (x)   Sales ($100s) (y)   (x - x̄)   (y - ȳ)   (x - x̄)²   (y - ȳ)²   (x - x̄)(y - ȳ)
1       2                        50                  -1        -1        1          1          1
2       5                        57                  2         6         4          36         12
3       1                        41                  -2        -10       4          100        20
4       3                        54                  0         3         0          9          0
5       4                        54                  1         3         1          9          3
6       1                        38                  -2        -13       4          169        26
7       5                        63                  2         12        4          144        24
8       3                        48                  0         -3        0          9          0
9       4                        59                  1         8         1          64         8
10      2                        46                  -1        -5        1          25         5
Σx=30; Σy=510; x̄ = 3; ȳ = 51; Σ(x - x̄)² = 20; Σ(y - ȳ)² = 566; Σ(x - x̄)(y - ȳ) = 99

Cov(X,Y) = Σ(x - x̄)(y - ȳ)/(n-1)
= 99/(10-1)
= 11
Sx = Sqrt [Σ(x - x̄)²/(n-1)]
= Sqrt (20/9)
= Sqrt (2.22) = 1.49
Sy = Sqrt [Σ(y - ȳ)²/(n-1)]
= Sqrt (566/9)
= Sqrt (62.89) = 7.93

rxy = Cov(X,Y)/SxSy ……. 1
= 11/(1.49)(7.93)
= 0.931 (using the rounded values of Sx and Sy)

rxy = Σ(x - x̄)(y - ȳ)/Sqrt [Σ(x - x̄)² Σ(y - ȳ)²] ……. 2
= 99/Sqrt (20*566)
= 99/Sqrt (11320)
= 99/106.40 = 0.930

Both forms give r ≈ 0.93. There exists a very strong positive correlation between television commercials and
sales of the store.
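The sample covariance and Karl Pearson's coefficient for the commercials data can be sketched in Python. The function name covariance_and_r is an assumption for this sketch; it uses the deviation form (formula 2), which avoids rounding Sx and Sy along the way.

```python
import math


def covariance_and_r(xs, ys):
    """Return (sample covariance, Pearson correlation coefficient)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # Σ(x - x̄)(y - ȳ)
    sxx = sum((x - mx) ** 2 for x in xs)                    # Σ(x - x̄)²
    syy = sum((y - my) ** 2 for y in ys)                    # Σ(y - ȳ)²
    cov = sxy / (n - 1)
    r = sxy / math.sqrt(sxx * syy)
    return cov, r


commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]
cov, r = covariance_and_r(commercials, sales)
print(cov, round(r, 2))  # 11.0 0.93
```

Covariance depends on the units of x and y, while r is unit-free and always lies between -1 and +1, which is why r is used to judge strength.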
Karl Pearson’s Coefficient of Correlation (r)
r = [nΣdx.dy - (Σdx)(Σdy)] / Sqrt {[nΣ(dx)² - (Σdx)²][nΣ(dy)² - (Σdy)²]} ………. 3

where dx = X-A & dy = Y-B; A & B are assumed means from X & Y series respectively.

Ex: Find the Karl Pearson’s Coefficient of Correlation between X and Y from the following data:
X          Y          dx = X-16   dy = Y-27   (dx)²          (dy)²          dx.dy
10         19         -6          -8          36             64             48
12         22         -4          -5          16             25             20
13         26         -3          -1          9              1              3
16         27         0           0           0              0              0
17         29         1           2           1              4              2
20         33         4           6           16             36             24
25         37         9           10          81             100            90
∑X = 113   ∑Y = 193   ∑dx = 1     ∑dy = 4     ∑(dx)² = 159   ∑(dy)² = 230   ∑dx.dy = 187

r = (7x187 - 1x4) / Sqrt {(7x159 - 1x1)(7x230 - 4x4)}
= (1309 - 4) / Sqrt {(1113 - 1)(1610 - 16)}
= 1305 / Sqrt {(1112)(1594)}
= 1305/1331.4
r = 0.980
There is very strong direct (or strong positive)
correlation between X and Y.
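The assumed-mean (shortcut) method can be sketched in Python. The function name pearson_assumed_mean is an assumption for this sketch. The result does not depend on which assumed means A and B are chosen, which the last two lines illustrate by comparing A = 16, B = 27 with A = B = 0.

```python
import math


def pearson_assumed_mean(xs, ys, a, b):
    """Pearson r from deviations dx = X - A and dy = Y - B about assumed means."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    num = n * sum(p * q for p, q in zip(dx, dy)) - sum(dx) * sum(dy)
    den = math.sqrt((n * sum(p * p for p in dx) - sum(dx) ** 2)
                    * (n * sum(q * q for q in dy) - sum(dy) ** 2))
    return num / den


xs = [10, 12, 13, 16, 17, 20, 25]
ys = [19, 22, 26, 27, 29, 33, 37]
r_assumed = pearson_assumed_mean(xs, ys, 16, 27)
r_origin = pearson_assumed_mean(xs, ys, 0, 0)
print(math.isclose(r_assumed, r_origin))  # True
```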
Practice Exercises
1. The sample below gives data collected by the department of transportation on driving speed (miles
per hour) and fuel economy (miles per gallon) of mid-sized automobiles. Is there any
correlation between the two variables? Interpret your result.
Speed 30 50 40 55 30 25 60 25 50 55
Economy 28 25 25 23 30 32 21 35 26 25

2. A leading TV manufacturing company was trying to establish if there was any correlation
between family size and TV screen size from the following data:
Family size 2 3 4 3 2 4 2 4
TV size 32 28 40 48 32 28 28 28

Calculate the Karl Pearson’s coefficient of correlation.


3. The sales and advt. expenses (both in $ millions) of XYZ company is given below. Compute
the Karl Pearson’s coefficient of correlation and interpret your results.
Sales 8 9 10 12 15 18 16 20 21 25 27 30 35 38 40
Advt. 2 2.5 3 3.2 4 4.5 5 6 7 8 8.5 9 9.2 9.5 10
Solutions

Q1. rxy = -0.9103

Speed (X)   Economy (Y)   x = X - X̄   y = Y - Ȳ   xy      x²      y²
30          28            -12         1           -12     144     1
50          25            8           -2          -16     64      4
40          25            -2          -2          4       4       4
55          23            13          -4          -52     169     16
30          30            -12         3           -36     144     9
25          32            -17         5           -85     289     25
60          21            18          -6          -108    324     36
25          35            -17         8           -136    289     64
50          26            8           -1          -8      64      1
55          25            13          -2          -26     169     4
Total                                             -475    1660    164

Q3. rxy = 0.9692

Sales (X)   Advt. (Y)   XY        X²      Y²
8           2           16        64      4
9           2.5         22.5      81      6.25
10          3           30        100     9
12          3.2         38.4      144     10.24
15          4           60        225     16
18          4.5         81        324     20.25
16          5           80        256     25
20          6           120       400     36
21          7           147       441     49
25          8           200       625     64
27          8.5         229.5     729     72.25
30          9           270       900     81
35          9.2         322       1225    84.64
38          9.5         361       1444    90.25
40          10          400       1600    100
324         91.4        2377.4    8558    667.88
The sample data for ‘Stereo and Sound Equipment Store’ is given pertaining to the no. of television commercials shown and sales during
10 weeks. The manager wants to estimate the sales if no. of TV commercials are increased up to 10 per week. Can you predict the sales
($ 100s) using regression analysis?
Week    No. of Commercials (x)   Sales ($100s) (y)   (x - x̄)   (y - ȳ)   (x - x̄)²   (x - x̄)(y - ȳ)
1       2                        50                  -1        -1        1          1
2       5                        57                  2         6         4          12
3       1                        41                  -2        -10       4          20
4       3                        54                  0         3         0          0
5       4                        54                  1         3         1          3
6       1                        38                  -2        -13       4          26
7       5                        63                  2         12        4          24
8       3                        48                  0         -3        0          0
9       4                        59                  1         8         1          8
10      2                        46                  -1        -5        1          5
Σx=30; Σy=510; x̄ = 3; ȳ = 51; Σ(x - x̄)² = 20; Σ(x - x̄)(y - ȳ) = 99

Let us assume the regression equation as
y = b0 + b1x
where y represents sales and x represents no. of TV commercials.
b1 = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)² &
b0 = ȳ - b1x̄

Calculations:
b1 = 99/20 = 4.95 (reg. coeff.)
b0 = 51 – (4.95)*3 = 36.15 (intercept)

Hence, the regression equation is
y = 36.15 + 4.95x
If the no. of TV commercials x = 10, then
y = 36.15 + 4.95*10 = 85.65
The estimated sales will be $8,565 per week.
Armand’s Pizza Parlor has 10 outlets outside ten different colleges. The student population of colleges (1000s) and quarterly sales of
outlets ($1000s) are given in the sample data. Estimate the quarterly sales of an outlet near a college having 18000 students.

Outlet   Student Population (1000s) (x)   Quarterly sales ($1000s) (y)   (x - x̄)   (y - ȳ)   (x - x̄)²   (x - x̄)(y - ȳ)
1        2                                58                             -12       -72       144        864
2        6                                105                            -8        -25       64         200
3        8                                88                             -6        -42       36         252
4        8                                118                            -6        -12       36         72
5        12                               117                            -2        -13       4          26
6        16                               137                            2         7         4          14
7        20                               157                            6         27        36         162
8        20                               169                            6         39        36         234
9        22                               149                            8         19        64         152
10       26                               202                            12        72        144        864
Σx=140; Σy=1300; x̄ = 14; ȳ = 130; Σ(x - x̄)² = 568; Σ(x - x̄)(y - ȳ) = 2840

Let us assume the regression equation as
y = b0 + b1x
where y represents sales and x represents student population.
b1 = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)² &
b0 = ȳ - b1x̄

Calculations:
b1 = 2840/568 = 5
b0 = 130 – 5*14 = 60

Hence, the regression equation is
y = 60 + 5x
If the no. of students in a college is 18000 then x = 18,
y = 60 + 5*18 = 150
The estimated quarterly sales will be $150,000.
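The least-squares calculation used in the regression examples can be sketched in Python (fit_line is an illustrative helper name, not from the slides):

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))   # slope (regression coefficient)
    b0 = my - b1 * mx                         # intercept
    return b0, b1


# Armand's Pizza Parlor: student population (1000s) and quarterly sales ($1000s).
students = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
quarterly_sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

b0, b1 = fit_line(students, quarterly_sales)
estimate = b0 + b1 * 18   # college with 18,000 students

print(b0, b1, estimate)  # 60.0 5.0 150.0
```

Predictions are most reliable inside the range of x values observed in the sample; 18 lies well within 2 to 26 here.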
The sales and advertisement expenditure (both in $ million) of XYZ Ltd. is given below. Assuming a linear relationship, estimate
the sales when advertisement expenditure is $7.5 million.
Sales ($ million) (y)   Advt Exp ($ million) (x)   xy       x²
8                       2                          16       4
9                       2.5                        22.5     6.25
10                      3                          30       9
12                      3.2                        38.4     10.24
15                      4                          60       16
18                      4.5                        81       20.25
16                      5                          80       25
20                      6                          120      36
21                      7                          147      49
25                      8                          200      64
27                      8.5                        229.5    72.25
30                      9                          270      81
35                      9.3                        325.5    86.49
38                      9.5                        361      90.25
40                      10                         400      100
Σy=324; Σx=91.5; Σxy = 2380.9; Σx² = 669.73; ȳ = 21.6; x̄ = 6.1

Let us assume the regression equation as
y = b0 + b1x
where
b1 = (nΣxy - ΣxΣy)/(nΣx² - (Σx)²) and b0 = ȳ - b1x̄

b1 = [(15x2380.9) - (91.5x324)]/[(15x669.73) - (91.5)²]
= 6067.5/1673.7
b1 = 3.63
b0 = 21.6 – (3.63x6.1) = 21.6 – 22.143
b0 = – 0.543

Regression equation is:
y = -0.543 + 3.63x
Estimated sales for advt. exp. x = 7.5:
y = -0.543 + (3.63x7.5) = 26.682
The estimated sales will be $26.7 million approximately.
Practice Exercises
Q1. The following data contains the number of years of college (YC) and the current annual income (AI) in thousand rupees
for a random sample of heavy equipment sales people:

YC 2 2 3 4 3 1 4 3 4 4
AI 20 23 25 26 28 29 27 30 33 35

a. Compute the correlation coefficient between two variables.


b. Develop an estimated regression line for AI dependent on YC.
c. Estimate the annual income of an employee having attended five years of college.

Q2. A company sells fitness equipment in various price range. Ratings of users have been provided in the following data
along with price of the equipment ($100s):
Model A B C D E F G H
Price 37 25 28 19 10 8 17 6
Rating 87 84 82 74 73 69 68 55

1. Comment on the ‘price-rating relationship’ of fitness equipment.


2. Develop a regression equation for rating based on price.
3. Predict the rating for an equipment whose price is $1500.
Classical Method of Assigning Probability
• Classical method of assigning probability is applicable when all the events are equally
likely. It is given by the formula:
Probability of an event = No. of favorable events/Total no. of possible events
If A is an event then P(A) = m/n where m = no. of favorable events, and n = total
no. of possible events.
• The sum of probabilities of all the events of an experiment is equal to 1.
• If P(A) is the probability of any event A, then P(Ā) is the probability of its
complementary event and P(A) + P(Ā) = 1.
Example-1: In tossing of two coins, find the probability of (i) one head, (ii) at least one
head, (iii) no heads.
Solution: Sample space = {HH, HT, TH, TT}, n=4
(i) Events for getting one head = {HT, TH}, m=2; P(A) = 2/4 or 0.5
(ii) Events of getting at least one head = {HH, HT, TH}, m=3; P(A) = 3/4 or 0.75
(iii) Events of getting no heads = {TT}, m=1; P(A) = 1/4 or 0.25
Example-2: In rolling of a die, find the probability of getting (i) an even number, (ii) a
prime number, (iii) a perfect square.
Solution: Sample space = {1, 2, 3, 4, 5, 6}, n=6
Events of even no. = {2, 4, 6}, m=3; P(A) = 3/6 or 0.5
Events of prime no. = {2, 3, 5}, m=3; P(A) = 3/6 or 0.5
Events of perfect square = {1, 4}, m=2; P(A) = 2/6 or 0.33
Example-3: In rolling of two dice, find the probability that sum of both the upper faces is
(i) equal to 10, (ii) at least 10, (iii) more than 10.
Solution: Sample space = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),………….., (6,6)}, n=36
(i) Events for getting sum of 10 = {(4,6), (6,4), (5,5)}, m=3; P(A) = 3/36 or 1/12
(ii) Events of getting at least 10 = {(4,6), (6,4), (5,5), (5,6), (6,5), (6,6)}, m=6; P(A) = 6/36 or
1/6
(iii) Events of getting more than 10 = {(5,6), (6,5), (6,6)}, m=3; P(A) = 3/36 or 1/12
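The classical method amounts to counting favorable outcomes in an equally likely sample space. A minimal Python sketch for the two-dice example, with probability as an illustrative helper name:

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two dice: 36 equally likely ordered pairs.
outcomes = list(product(range(1, 7), repeat=2))


def probability(event):
    """P(A) = m/n, where m counts the outcomes for which event(outcome) is true."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


print(probability(lambda o: sum(o) == 10))  # 1/12
print(probability(lambda o: sum(o) >= 10))  # 1/6
print(probability(lambda o: sum(o) > 10))   # 1/12
```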
Union and Intersection of events
Let us consider the sample space Ω={1,2,3,4,5,6,7,8} and its two subsets A={1,2,3,4} &
B={3,4,5,6}. Find A∪B and A∩B.
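Python's built-in set type can be used to work out the union and intersection for this sample space. This is a small illustrative sketch; the variable names are chosen for this example only.

```python
omega = set(range(1, 9))   # sample space {1, ..., 8}
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B              # outcomes in A or B (or both)
intersection = A & B       # outcomes in both A and B

print(sorted(union))         # [1, 2, 3, 4, 5, 6]
print(sorted(intersection))  # [3, 4]

# Counting check: n(A∪B) = n(A) + n(B) - n(A∩B)
print(len(union) == len(A) + len(B) - len(intersection))  # True
```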
Laws of probability
(i) Additive Law: If A and B are two events, then
P(A or B) OR P(A∪B) = P(A) + P(B), if they are mutually exclusive
P(A∪B) = P(A) + P(B) – P(A∩B), if they are not mutually exclusive

Example-1: A card is drawn from a pack of 52 cards, find the probability that it is either a
queen or a heart.
Solution: Let A be the event that the card is a queen and B be the event that it is a heart.
Total no. of queens = 4 and total no. of hearts = 13, so P(A) = 4/52 and P(B) = 13/52
There is one card, the ‘queen of hearts’, which belongs to both events A and B, so P(A and B) = 1/52
Hence, P(A∪B)= P(A) + P(B) – P(A∩B)
= 4/52 + 13/52 – 1/52
= 16/52 or 0.3077
The required probability is 0.3077.
Example-2: In a bank, 50% of the accounts are current accounts, 30% are savings accounts
and rest are loan accounts. What is the probability that a randomly selected customer is
having either a current account or a savings account?
Solution: Let A, B and C be the events that a customer is having current, savings and loan
accounts respectively.
P(A) = 0.5, P(B) = 0.3 and P(C) = 0.2
Since the events are mutually exclusive, hence the required probability is
P(A∪B) = P(A) + P(B) = 0.5 + 0.3 = 0.8
There is 80% chance that the randomly selected customer is having either a current or a
savings account.

Question: In a class of 90 students, 50 opted Maths, 25 opted Biology, 30 opted both
Maths and Biology. Find the probability that a randomly selected student has opted either
Maths or Biology.
(ii) Multiplicative Law: If A and B are two events, then
• P(A and B) OR P(A∩B) = P(A).P(B), if they are independent
• P(A∩B) = P(A).P(B|A) if they are not independent, or
• P(A∩B) = P(B).P(A|B) if they are not independent
Example-1: A store has Rayban, Fastrack and Armani sunglasses. It also has Van Heusen, Allen
Solly, Louis Philippe and UCB shirts. If you choose one pair of sunglasses and one shirt at random, what is the
probability that you choose the Rayban sunglasses and the UCB shirt?
Solution: Selecting a sunglass and a shirt are two independent events.
P(A) = 1/3 and P(B) = 1/4
P(A∩B) = P(A).P(B) = (1/3).(1/4) = 1/12
Example-2: A retail store has 40 packets of biscuits out of which 10 packets are of brand ‘X’. Two
customers come, one after the other and buy a packet of biscuit. What is the probability that both
of them will buy brand ‘X’ biscuits?
Solution: P(A) = 10/40 and P(B|A) = 9/39
P(A∩B) = P(A)*P(B|A) = (10/40)*(9/39) = 0.0577
Example-3: The credit card division of a bank is running a drive to make phone calls
to potential customers. The probability that an employee will answer the call is 0.40.
If two calls are made, what is the probability that:
a. At least one of the calls is answered
b. Exactly one call is answered
c. Both the calls are answered
Solution: Suppose A is the event that the first call is answered and B is the event that the
second call is answered. Given that P(A) = 0.40 and P(B) = 0.40, and that the two calls are
independent (not mutually exclusive),
a. P(A∪B) = P(A) + P(B) – P(A∩B) = 0.40 + 0.40 – 0.16 = 0.64
b. P(Exactly one call is answered) = P(A)*P(B̄) + P(Ā)*P(B)
= 0.4x0.6 + 0.6x0.4 = 0.48
c. P(A∩B) = P(A)*P(B) = 0.4x0.4 = 0.16
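The three probabilities for the two phone calls can be checked in Python. The sketch assumes the calls are independent, as part (c) of the solution does. Because independent events are not mutually exclusive, "at least one" is computed as the complement of "neither call is answered".

```python
p = 0.40  # probability that any single call is answered

p_both = p * p                             # P(A∩B) for independent calls
p_exactly_one = p * (1 - p) + (1 - p) * p  # first only, or second only
p_at_least_one = 1 - (1 - p) ** 2          # complement of "no call answered"

print(round(p_at_least_one, 2), round(p_exactly_one, 2), round(p_both, 2))
# 0.64 0.48 0.16
```

The check p_at_least_one = p_exactly_one + p_both (0.48 + 0.16 = 0.64) confirms that the three answers are consistent with each other.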
Practice Questions
1. In Newcastle, 70% of small businesses use the internet to advertise new
products; 50% of small businesses use flyers to advertise new products and a
quarter of small businesses use both flyers and the internet.
(a) What is the probability that a randomly chosen small business in Newcastle
uses either flyers or the internet to advertise new products?
(b) What is the proportion of small businesses in Newcastle that use neither the
internet nor flyers to advertise new products?

2. Sixty percent of employees at a department store are women. Government
research shows that on an average 12 percent of people cycle into work, a quarter
of them drive, 10 percent of people walk and the rest use public transport. What is
the probability that a randomly selected employee of the department store in
Newcastle commutes using public transport and is male?
Marginal, Joint and Conditional Probability
Marginal probability: P(A), the probability of an event A occurring. It may be thought of
as an unconditional probability, since it is not conditioned on another event. Example: the
probability that a card drawn is red [P(red) = 0.5]. Another example: the probability
that a card drawn is a 4 [P(four)=1/13].
Joint probability: P(A and B) or P(A∩B). The probability of event A and event B
occurring. It is the probability of the intersection of two or more events. The probability
of the intersection of A and B may be written P(A∩B).
Example: The probability that a card is a four and red, P(four∩red) = 2/52=1/26. (There
are two red fours in a deck of 52, the 4 of hearts and the 4 of diamonds).
Conditional probability: P(A|B) is the probability of event A occurring, given that event
B occurs. Example: given that you drew a red card, what’s the probability that it’s a
four; P(four|red)=2/26=1/13. So out of the 26 red cards (given a red card), there are
two fours, hence the probability is 2/26 or 1/13.

P(A | B) = P(A and B)/P(B)
P(B | A) = P(A and B)/P(A)
Example: 70% of households in New York city watch the news in the evening
and 50% of households watch the news both in the evening and in the morning.
What is the probability that a household, which watches the news in the evening,
will also watch the morning news?

Solution:
First we denote by E the event that the household watches the news in the
evening and by M the event that the household watches the morning news.
From the question we have P(E)=0.7 and P(E and M)=0.5.
We have to calculate P(M|E).
We can use the equation of the multiplicative law as follows:
P(E and M) = P(E)×P(M|E)
Hence, P(M|E) = P(E and M)/P(E)
P(M|E) = 0.5/0.7 = 0.714
Ex.1: A study of speed violations and drivers who use cell phones produced the following data:

                                              Speed violations (S)   No speed violations (NS)   Total
Uses cell phones while driving (C)            25 (0.0333)            280 (0.3733)               305 (0.4067)
Does not use cell phones while driving (NC)   45 (0.06)              400 (0.5333)               445 (0.5933)
Total                                         70 (0.0933)            680 (0.9067)               750 (1)

Marginal probabilities:
Prob. that the driver uses a cell phone P(C) = 305/750 = 0.4067
Prob. that the driver does not use a cell phone P(NC) = 445/750 = 0.5933
Prob. that the driver had speeding violation P(S) = 70/750 = 0.0933
Prob. that the driver did not have speeding violation P(NS) = 680/750 = 0.9067
Joint probabilities:
Prob. that the driver uses cell phones and had speed violation P(C∩S) = 25/750 = 0.0333
Prob. that the driver uses cell phones and had no speed violation P(C∩NS) = 280/750 = 0.3733
Prob. that the driver does not use cell phones and had speed violation P(NC∩S) = 45/750 = 0.06
Prob. that the driver does not use cell phones and had no speed violation P(NC∩NS) = 400/750 = 0.5333
                                              Speed violations (S)   No speed violations (NS)   Total
Uses cell phones while driving (C)            25/750 = 0.0333        280/750 = 0.3733           305/750 = 0.4067
Does not use cell phones while driving (NC)   45/750 = 0.06          400/750 = 0.5333           445/750 = 0.5933
Total                                         70/750 = 0.0933        680/750 = 0.9067           1
The four inner cells are the joint probabilities; the row and column totals are the marginal probabilities.

Conditional probabilities:
(i) Prob. that the driver uses a cell phone given that he had speed violation
P(C|S) = P(C∩S)/P(S) = 0.0333/0.0933 = 0.357
(ii) Prob. that the driver uses a cell phone given that he did not have speed violation
P(C|NS) = P(C∩NS)/P(NS) = 0.3733/0.9067 = 0.4117
(iii) Prob. that the driver does not use a cell phone given that he had speed violation
P(NC|S) = P(NC∩S)/P(S) = 0.06/0.0933 = 0.6431
(iv) Prob. that the driver does not use a cell phone given that he did not have speed violation
P(NC|NS) = P(NC∩NS)/P(NS) = 0.5333/0.9067 = 0.5882
(v) Prob. that the driver had speed violation given that he uses cell phone
P(S|C) = P(C∩S)/P(C) = 0.0333/0.4067 = 0.0819
(vi) Prob. that the driver had speed violation given that he does not use cell phone
P(S|NC) = P(NC∩S)/P(NC) = 0.06/0.5933 = 0.1011
(vii) Prob. that the driver did not have speed violation given that he uses cell phone
P(NS|C) = P(C∩NS)/P(C) = 0.3733/0.4067 = 0.9179
(viii) Prob. that the driver did not have speed violation given that he does not use cell phone
P(NS|NC) = P(NC∩NS)/P(NC) = 0.5333/0.5933 = 0.8989
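The marginal, joint and conditional probabilities can all be derived from the four observed counts. A minimal Python sketch follows; the function names are assumptions for this illustration. Working from raw counts avoids rounding drift, so the last digit may differ slightly from hand calculations that divide already-rounded probabilities.

```python
# Observed counts: (cell phone use, speed violation) -> no. of drivers.
counts = {
    ("C", "S"): 25, ("C", "NS"): 280,
    ("NC", "S"): 45, ("NC", "NS"): 400,
}
total = sum(counts.values())  # 750


def joint(row, col):
    """Joint probability, e.g. P(C∩S)."""
    return counts[(row, col)] / total


def marginal_row(row):
    """Marginal probability of a row event, e.g. P(C)."""
    return sum(v for (r, _), v in counts.items() if r == row) / total


def marginal_col(col):
    """Marginal probability of a column event, e.g. P(S)."""
    return sum(v for (_, c), v in counts.items() if c == col) / total


def row_given_col(row, col):
    """Conditional probability of a row event given a column event, e.g. P(C|S)."""
    return joint(row, col) / marginal_col(col)


def col_given_row(col, row):
    """Conditional probability of a column event given a row event, e.g. P(S|C)."""
    return joint(row, col) / marginal_row(row)


print(round(row_given_col("C", "S"), 4))  # P(C|S)  = 25/70
print(round(col_given_row("S", "C"), 4))  # P(S|C)  = 25/305
```

Note that P(C|S) and P(S|C) are different quantities: the first conditions on the column total, the second on the row total.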
Ex.2: People use the cab services for personal and official purposes during weekdays and
weekends. The following data gives the usage pattern of 30,000 people:
               Personal purpose (C)   Official purpose (D)   Total
Weekdays (A)   8000                   14000                  22000
Weekends (B)   6000                   2000                   8000
Total          14000                  16000                  30000

A. Use the information to prepare a joint and marginal probability table.

Joint and marginal probability table:
               Personal purpose (C)   Official purpose (D)   Total
Weekdays (A)   0.2667                 0.4667                 0.7333
Weekends (B)   0.2                    0.0667                 0.2667
Total          0.4667                 0.5333                 1

B. Calculate
i. Prob. that cabs are used during weekdays given that they are used for personal purpose.
P(A|C) = P(A∩C)/P(C) = 0.2667/0.4667 = 0.571
ii. Prob. that cabs are used during weekdays given that they are used for official purpose.
P(A|D) = P(A∩D)/P(D) = 0.4667/0.5333 = 0.875
iii. Prob. that cabs are used during weekends given that they are used for personal purpose.
P(B|C) = P(B∩C)/P(C) = 0.2/0.4667 = 0.429
iv. Prob. that cabs are used during weekends given that they are used for official purpose.
P(B|D) = P(B∩D)/P(D) = 0.0667/0.5333 = 0.125
v. Prob. that cabs are used for personal purpose given that they are used during weekdays.
P(C|A) = P(C∩A)/P(A) = 0.2667/0.7333 = 0.364
vi. Prob. that cabs are used for official purpose given that they are used during weekdays.
P(D|A) = P(D∩A)/P(A) = 0.4667/0.7333 = 0.636
vii. Prob. that cabs are used for personal purpose given that they are used during weekends.
P(C|B) = P(C∩B)/P(B) = 0.2/0.2667 = 0.750
viii. Prob. that cabs are used for official purpose given that they are used during weekends.
P(D|B) = P(D∩B)/P(B) = 0.0667/0.2667 = 0.250
Practice Exercises
1. If there are two events A & B such that P(A) = 0.5, P(B) = 0.6 and P(A∩B) = 0.4 then
(i) Find P(A|B) (ii) P(B|A) (iii) Are the events A & B independent? Why or why not?
2. A survey showed that during the past 12 months, 45.8% of people used a rented car for business
reasons, 54% for personal reasons and 30% for both.
(i) What is the probability that a person rented a car either for business or for personal
reasons?
(ii) What is the probability that a person did not rent a car for either business or personal
reasons?
3. The automobile industry sold 657,000 vehicles in the US. The table shows observed
frequencies of cars and light truck sold for the US and non-US companies:
Car Light truck
US company 87,400 193,100
Non-US company 228,500 148,000
Prepare a joint and marginal probability table. Also calculate all the possible conditional
probabilities.
Random Variables: Investment Returns

Consider the return per $1000 for two types of investments.

                                Investment
Prob.   Economic Condition      Passive Fund X   Aggressive Fund Y
0.2     Recession               - $25            - $200
0.5     Stable Economy          + $50            + $60
0.3     Expanding Economy       + $100           + $350

Investment Returns: The Mean

E(X) = μX = (-25)(0.2) + (50)(0.5) + (100)(0.3) = $50

E(Y) = μY = (-200)(0.2) + (60)(0.5) + (350)(0.3) = $95

Interpretation: Fund X is averaging a $50.00 return and fund Y is averaging a $95.00 return per $1000
invested.
Investment Returns: Standard Deviation

$\sigma_X = \sqrt{(-25-50)^2(0.2) + (50-50)^2(0.5) + (100-50)^2(0.3)} = 43.30$

$\sigma_Y = \sqrt{(-200-95)^2(0.2) + (60-95)^2(0.5) + (350-95)^2(0.3)} = 193.71$

Interpretation: Even though fund Y has a higher average return, it is subject to much more variability,
and its possible loss in a recession is much larger (both funds lose money with probability 0.2).
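The mean and standard deviation of both funds can be reproduced with a few lines of Python. This is a minimal sketch using only the probabilities and returns from the table above; the function names `mean` and `std_dev` are illustrative.

```python
from math import sqrt

# Probabilities and returns per $1000 from the investment table.
probs = [0.2, 0.5, 0.3]
fund_x = [-25, 50, 100]
fund_y = [-200, 60, 350]


def mean(values, probs):
    """Expected value E(X) = sum of x * P(x)."""
    return sum(v * p for v, p in zip(values, probs))


def std_dev(values, probs):
    """Standard deviation = sqrt of sum of (x - E(X))^2 * P(x)."""
    mu = mean(values, probs)
    return sqrt(sum((v - mu) ** 2 * p for v, p in zip(values, probs)))


print(round(mean(fund_x, probs), 2), round(std_dev(fund_x, probs), 2))
print(round(mean(fund_y, probs), 2), round(std_dev(fund_y, probs), 2))
```

The same two functions apply to any discrete probability distribution given as parallel lists of values and probabilities.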
Exercise: Discrete Random Variables
Problem: In a survey, the no. of mobile phones per household was collected as below:
No. of mobiles 0 1 2 3 4 5 Total
No. of households 5 15 30 30 15 5 100
(i) Develop a probability distribution for the no. of mobile phones per household.
(ii) Calculate the expected no. of mobile phones per household and its standard deviation.
(iii) Find the values of P(X≤2), P(X>2) and P(2≤X≤4), where X is the no. of mobiles.
X       f(X)   P(X)   X-E(X)   [X-E(X)]²   [X-E(X)]².P(X)
0       5      0.05   -2.5     6.25        0.3125
1       15     0.15   -1.5     2.25        0.3375
2       30     0.30   -0.5     0.25        0.075
3       30     0.30   0.5      0.25        0.075
4       15     0.15   1.5      2.25        0.3375
5       5      0.05   2.5      6.25        0.3125
Total   100    1.00                        1.45

E(X) = ΣX.P(X) = (0x0.05 + 1x0.15 + 2x0.3 + 3x0.3 + 4x0.15 + 5x0.05) = 2.5
Variance = Σ[X-E(X)]².P(X) = 1.45
Std. dev. = sqrt(var.) = sqrt(1.45) = 1.204
P(X≤2) = sum of prob. for X = 0,1,2 = 0.05+0.15+0.30 = 0.50
P(X>2) = sum of prob. for X = 3,4,5 = 0.30+0.15+0.05 = 0.50 (or 1-P(X≤2))
P(2≤X≤4) = sum of prob. for X = 2,3,4 = 0.30+0.30+0.15 = 0.75
Practice Exercises
1. An online pharmacy claims to deliver medicines within 3 to 6 days of order. The manager collected
following data pertaining to the no. of days taken to deliver the orders:
No. of days 0 1 2 3 4 5 6 7 8
Probability 0 0 0.01 0.04 0.28 0.42 0.21 0.02 0.02

(i) What is the probability that delivery will be made within 3 to 6 days?
(ii) What is the probability that the delivery will be late?
(iii) What is the probability that the delivery will be early?
2. The following data shows the no. of hours a car is parked in a parking slot along with the
probabilities. The parking supervisor wants to know the expected no. of hours and the standard deviation of
the no. of hours cars are parked in the slot.

No. of hours 1 2 3 4 5 6 7 8
Probability 0.24 0.18 0.13 0.10 0.07 0.04 0.04 0.20
Binomial Distribution
If there are only two possible outcomes of an experiment, say success and failure, then the
probability of ‘r’ successes in ‘n’ trials is given by:
P(X=r) = ${}^{n}C_{r}\, p^{r} (1-p)^{n-r}$, X ~ B(n, p)
where ‘p’ is the probability of success (which remains constant), r = 0, 1, 2, 3, …, n and
n = 1, 2, 3, 4, …
✓ Sometimes it is also written as P(X=r) = ${}^{n}C_{r}\, p^{r} q^{n-r}$, where ‘q’ is the probability of failure and p+q = 1.
✓ Mean and variance of a Binomial distribution are ‘np’ and ‘npq’ respectively.
Ex.1: Probability that a delivery boy delivers a correct order is 0.9. What is the probability that he
correctly delivers 8 out of 10 orders? What is the probability that he correctly delivers 8 or more
times?
Soln: Prob. of correct delivery (success) = 0.9; Prob. of wrong delivery (failure) = 1-0.9 = 0.1
n = 10, r = 8, p = 0.9, q = (1-p) = 1-0.9 = 0.1
P(X=8) = ${}^{10}C_{8}\,(0.9)^{8}(1-0.9)^{10-8}$ = 45.(0.4305).(0.01) = 0.1937
Hint: P(X≥8) = P(X=8) + P(X=9) + P(X=10); P(X=9) = 0.3874, P(X=10) = 0.3487; Answer: 0.9298
Ex.2: Nitrogen gas is filled in automobile tyres to improve the ride quality. A filling station experiences
that 30% of the customers get nitrogen gas filled in their vehicle’s tyres. If 10 customers arrive at a
gas station then what is the probability that,
(i) All of them would fill nitrogen gas?
(ii) None of them would fill nitrogen gas?
(iii) Half of them would fill nitrogen gas?
(iv) Less than 8 customers would fill nitrogen gas?
(v) Find the mean no. of customers who would fill nitrogen gas.
Soln: n = 10, p = 0.3, 1-p = 1-0.3 = 0.7
P(X=r) = ${}^{n}C_{r}\, p^{r} (1-p)^{n-r}$
(i) P(X=10) = ${}^{10}C_{10}\,(0.3)^{10}(1-0.3)^{10-10}$ = 1.(0.000006).(1) = Almost zero
(ii) P(X=0) = ${}^{10}C_{0}\,(0.3)^{0}(1-0.3)^{10-0}$ = 1.(1).(0.0282) = 0.0282
(iii) P(X=5) = ${}^{10}C_{5}\,(0.3)^{5}(1-0.3)^{10-5}$ = $\frac{10!}{5!\,(10-5)!}\,(0.3)^{5}(1-0.3)^{10-5}$ = 252.(0.0024).(0.1681) = 0.1029
(iv) P(X<8) = 1-P(X≥8) = 1-[P(X=8) + P(X=9) + P(X=10)] = 1-(0.0015+0.0001+0.000006) = 0.9984
(v) Mean no. of customers who fill nitrogen gas = np = 10.(0.3) = 3
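The answers to Ex.2 can be verified with a small Python sketch of the binomial formula. It uses `math.comb` from the standard library for nCr; the function name `binom_pmf` is illustrative.

```python
from math import comb


def binom_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): nCr * p^r * (1-p)^(n-r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


n, p = 10, 0.3  # 10 customers, 30% choose nitrogen filling

p_all = binom_pmf(10, n, p)                                # (i)
p_none = binom_pmf(0, n, p)                                # (ii)
p_half = binom_pmf(5, n, p)                                # (iii)
p_less_than_8 = sum(binom_pmf(r, n, p) for r in range(8))  # (iv) r = 0..7
mean_customers = n * p                                     # (v)

print(round(p_none, 4), round(p_half, 4), round(p_less_than_8, 4))
```

Part (iv) is summed directly over r = 0 to 7 here, which gives the same value as 1-P(X≥8) used in the solution.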
Ex.3: Fifty percent of the people believe that the country is in recession. For a sample of 20
people, make the following calculations:
(i) Probability that 12 people believe the country is in recession. (0.1201)
(ii) Probability that at least 18 people believe the country is in recession. (0.0002)
(iii) Probability that more than three people believe the country is in recession.
P(X>3)=1-P(X≤3) = 1-[P(X=0)+P(X=1)+P(X=2)+P(X=3)] = (0.9987)
(iv) Expected no. of people who would say that the country is in recession. (10)
(v) Compute variance and std. deviation of the no. of people who believe the country is in
recession. (5, 2.236)

Ex.4: Twenty three percent of the vehicles are not covered by any insurance. On a special
checking day, 30 vehicles are checked randomly. What is the probability that more than 27
vehicles are insured? (here, p=0.77, n=30)
What is the expected no. of vehicles not covered by any insurance? What is the variance and
std. deviation? (here, n=30, p=0.23, q=0.77)
Poisson Distribution
It is used to estimate the probability of no. of occurrences over a specified period of time. If the mean no. of
occurrences over a specified period of time is ‘λ’ then the probability of ‘r’ occurrences is given by,
P(X=r) = $\frac{e^{-\lambda}\lambda^{r}}{r!}$, X ~ P(λ)
where r = 0, 1, 2, 3, … and e ≈ 2.718.
✓ Both mean and variance of a Poisson distribution are ‘λ’.
✓ The value of ‘λ’ should be adjusted according to time-interval.
✓ Poisson distribution approximates the binomial distribution for large n and small p.
Ex.1: At the drive-thru window of a burger shop, an average of 10 customers arrive in a 30-minute interval. What is the
probability that exactly 5 customers arrive in 30 minutes?
What is the probability that 5 customers arrive in a 15-minute interval?
Soln: P(X=r) = $\frac{e^{-\lambda}\lambda^{r}}{r!}$
For 30-minutes duration, λ = 10 and r = 5.
P(X=5) = $\frac{e^{-10}\,10^{5}}{5!}$ = (0.000045).(100000/120) = (0.000045).(833.33) = 0.0378
For 15-minutes duration, λ = 5 and r = 5.
P(X=5) = $\frac{e^{-5}\,5^{5}}{5!}$ = (0.0067).(3125/120) = 0.1755
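The adjustment of λ to the time interval can be sketched in Python as follows. The function name `poisson_pmf` and the variable names are illustrative; the rate of 10 customers per 30 minutes is taken from Ex.1.

```python
from math import exp, factorial


def poisson_pmf(r, lam):
    """P(X = r) for X ~ P(lam): e^(-lam) * lam^r / r!."""
    return exp(-lam) * lam ** r / factorial(r)


rate_per_30_min = 10

# Scale the mean to the length of the interval of interest.
lam_30 = rate_per_30_min * 30 / 30   # 10 customers expected in 30 minutes
lam_15 = rate_per_30_min * 15 / 30   # 5 customers expected in 15 minutes

print(round(poisson_pmf(5, lam_30), 4))
print(round(poisson_pmf(5, lam_15), 4))
```

Because the code uses the unrounded value of e^(-λ), the last decimal place can differ slightly from a hand calculation that rounds intermediate values.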
Ex.2: An average of 15 aircraft accidents occur each year. Compute
(i) The mean no. of accidents per month.
(ii) Prob. of no accidents during a month.
(iii) Prob. of exactly one accident per month.
(iv) Prob. of more than one accident per month.
Soln: P(X=r) = $\frac{e^{-\lambda}\lambda^{r}}{r!}$
For 12-months duration λ = 15, for one-month duration λ = 1.25.
(i) Mean no. of accidents per month = 15/12 = 1.25
(ii) For r=0, P(X=0) = $\frac{e^{-1.25}(1.25)^{0}}{0!}$ = (0.2865).(1/1) = 0.2865
(iii) For r=1, P(X=1) = $\frac{e^{-1.25}(1.25)^{1}}{1!}$ = (0.2865).(1.25/1) = 0.3581
(iv) For r>1, P(X>1) = 1- P(X≤1) = 1-[P(X=0)+P(X=1)] = 1-(0.2865+0.3581) = 0.3554

Ex.3: Phone calls arrive at the rate of 48 per hour in a call centre. Compute
(i) The mean no. of calls in 5 minutes duration. (4)
(ii) The prob. of receiving three calls in 5 minutes. (0.1954)
(iii) Prob. of receiving exactly 10 calls in 15 minutes. (0.1048)
(iv) Prob. of receiving at least one call in 10 minutes. (0.9997)
Ex.4: The safety department of Blue Pharma Company found that an average of three accidents occur per
month in the factory. Calculate the probability of (a) no accidents, (b) exactly one accident, (c) exactly two
accidents, (d) less than four accidents, (e) more than five accidents.
Soln: λ = 3
P(X=r) = $\frac{e^{-\lambda}\lambda^{r}}{r!}$
P(X=0) = $\frac{e^{-3}(3)^{0}}{0!}$ = 0.0498, P(X=1) = 0.1494, P(X=2) = 0.2240
P(X<4) = P(X=0)+P(X=1)+P(X=2)+P[X=3] = 0.0498+0.1494+0.2240+0.2240 = 0.6472
P(X>5) = 1-P[X≤5] = 1-(0.0498+0.1494+0.2240+0.2240+0.1680+0.1008) = 0.0839

Ex.5: Patients arrive at a hospital at the rate of 6 per hour. Find the probability that in a 90-minute duration
(i) exactly 7 patients arrive in the hospital.
(ii) between 7 and 10 patients arrive in the hospital. P(7≤X≤10) = P(X=7)+P(X=8)+P(X=9)+P(X=10)
(iii) If a patient arrives at 11:30am then what is the probability that at least one other patient arrives before 11:45am?
For a 15-minute duration, λ = 1.5, P(X≥1) = 1-P(X<1) = 1-P(X=0) = 1-$e^{-1.5}$ = 1-0.2231 = 0.7769
Normal Distribution: Practice Exercises
1. In a city, it is estimated that the maximum temperature is normally distributed with a
mean of 23°C and a standard deviation of 5°C. Calculate the number of days in this
month in which it is expected to reach a maximum of between 21°C and 27°C.

2. The mean weight of 500 college students is 70 kg and the standard deviation is 3 kg.
Assuming that the weight is normally distributed, determine how many students weigh:
a. between 60 kg and 75 kg
b. more than 90 kg
c. less than 64 kg
d. exactly 64 kg
e. 64 kg or more
Practice Exercises
3. For borrowers with good credit scores, the mean debt amount is $15,000. Assuming
the debt amounts to be normally distributed with standard deviation $3000, calculate
the probability that
a. debt for a borrower is more than $18,000
b. debt for a borrower is less than $10,000
c. Debt for a borrower is between $12,000 and $18,000

4. At a gas station, the daily demand for regular gasoline is normally distributed with a
mean of 1000 gallons and a standard deviation of 100 gallons. The station manager has
just opened the station for business and finds that there is exactly 1100 gallons of
regular gasoline in storage. The manager wants to know the chances that the available
gasoline is enough to satisfy the day’s demand.
Sampling Distribution: Practice Exercises
1. Mean expenditure of all the visitors in a restaurant is Rs.2000 with a std. deviation of
Rs.250. A random sample of 40 customers was taken, find the probability that
(a) mean expenditure of customers is more than Rs.1928, (b) mean expenditure of
customers is between Rs.1950 and Rs.2030.
(a) $Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} = \frac{1928-2000}{250/\sqrt{40}} = -1.82$
P(X̄ > 1928)
= P(Z > -1.82)
= P(-1.82<Z<0) + P(0<Z<∞)
= 0.4656 + 0.5
= 0.9656
(b) P(1950 < X̄ < 2030)
= P(-1.26<Z<0.76) = 0.6726
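The restaurant example can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch that treats the sample mean as normal with mean μ and standard error σ/√n, as the solution above does. Since no Z-table rounding is involved, the results differ from the table-based answers in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2000, 250, 40

# Sampling distribution of the mean: N(mu, sigma / sqrt(n)).
sample_mean_dist = NormalDist(mu, sigma / sqrt(n))

p_a = 1 - sample_mean_dist.cdf(1928)                         # P(X-bar > 1928)
p_b = sample_mean_dist.cdf(2030) - sample_mean_dist.cdf(1950)  # P(1950 < X-bar < 2030)

print(round(p_a, 4), round(p_b, 4))
```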
2. The numerical population of grade point averages at a college has mean 2.61 and
standard deviation 0.5. If a random sample of size 100 is taken from the population,
what is the probability that the sample mean will be between 2.51 and 2.71?
P(2.51 < X̄ < 2.71) = P(-2<Z<2) = 0.4772+0.4772 = 0.9544
3. A prototype automotive tire has a mean design life of 38,500 miles with a standard
deviation of 2,500 miles. Five such tires are manufactured and tested. Find the
probability that the sample mean will be less than 36,000 miles. Assume that the
distribution of lifetimes of such tires is normal. P(X̄<36000) = P(Z<-2.24)
4. An automobile battery manufacturer claims that its midgrade battery has a mean life
of 50 months with a standard deviation of 6 months. Suppose the distribution of battery
lives of this particular brand is approximately normal.
(a) On the assumption that the manufacturer’s claims are true, find the probability that
a randomly selected battery of this type will last less than 48 months. (Normal
distribution problem) P(X<48) = P(Z<-0.33)
(b) On the same assumption, find the probability that the mean life of a random sample
of 36 such batteries will be less than 48 months. (Sampling distribution problem)
P(X̄<48) = P(Z<-2)
Example:1
• A sample of 11 circuits from a large normal population has a
mean resistance of 2.20 ohms. We know from past testing that
the population standard deviation is 0.35 ohms. Determine a
95% confidence interval for the true mean resistance of the
population.
• Solution: Confidence Interval: $\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
= 2.20 ± 1.96 (0.35/√11)
= 2.20 ± 0.2068
1.9932 ≤ μ ≤ 2.4068
Interpretation

• We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms
• Although the true mean may or may not be in this interval, 95% of intervals formed in this
manner will contain the true mean
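The interval in Example 1 can be reproduced with a short Python sketch. The critical value is taken from `statistics.NormalDist().inv_cdf` instead of a Z table, so it is about 1.95996 rather than exactly 1.96; the function name `z_confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Interval X-bar +/- Z(alpha/2) * sigma / sqrt(n), with sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


low, high = z_confidence_interval(2.20, 0.35, 11, 0.95)
print(round(low, 4), round(high, 4))
```

Changing `confidence` to 0.99 widens the interval, since the critical value grows to about 2.576.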
Example:2
• The gross salary of all the employees of ABC Technologies has a std. deviation of Rs.5200. To study the
food habits of employees, a random sample of 40 employees was taken and the mean salary was found to
be Rs.41,500. Calculate the interval estimate of the mean gross salary of all the employees of ABC Technologies
using (a) 95% confidence level, and (b) 99% confidence level.

• Solution: Confidence interval estimate is given by $\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
X̄ = 41500, σ = 5200, n = 40
(a) Zα/2 = 1.96 (for conf. level of 95%)
Hence, interval estimate of the mean gross salary is:
X̄ - Zα/2(σ/√n) ≤ μ ≤ X̄ + Zα/2(σ/√n)
41500 – 1.96(5200/√40) ≤ μ ≤ 41500 + 1.96(5200/√40)
41500 – (1611.50) ≤ μ ≤ 41500 + (1611.50)
39888.50 ≤ μ ≤ 43111.50
At the 95% confidence level, the mean gross salary of all the employees of ABC Technologies is between Rs. 39,888.50 and Rs. 43,111.50.
(b) Zα/2 = 2.58 (for conf. level of 99%)
41500 – 2.58(5200/√40) ≤ μ ≤ 41500 + 2.58(5200/√40)
41500 – (2121.26) ≤ μ ≤ 41500 + (2121.26)
39378.74 ≤ μ ≤ 43621.26. The confidence interval estimate is (39378.74, 43621.26) at 99% level.
Required Sample Size Example

Ex.1: If σ = 45, what sample size is needed to estimate the mean within ± 5 margin of error
with 90% confidence?

$n = \frac{Z^{2}\sigma^{2}}{e^{2}} = \frac{(1.645)^{2}(45)^{2}}{5^{2}} = 219.19$

So the required sample size is n = 220 (Always round up)
If σ is unknown
• If unknown, σ can be estimated when using the required sample
size formula
• Use a value for σ that is expected to be at least as large as the true σ
• Select a pilot sample and estimate σ with the sample standard
deviation, S

Ex.2: The expenditure of customers in a restaurant is studied by a researcher and he comes to know that
the std. dev. of expenditure is Rs.30. If the researcher is willing to have a margin of error of 3.2,
what should be the sample size at 99% confidence level?
$n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{e^{2}}$
Ans: 586 (approx.)
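Both sample-size examples follow the same formula, sketched here in Python. The Z values 1.645 (90%) and 2.58 (99%) are the table values used in these notes; the function name `required_sample_size` is illustrative.

```python
from math import ceil


def required_sample_size(z, sigma, margin):
    """n = Z^2 * sigma^2 / e^2, always rounded up to the next whole number."""
    return ceil((z * sigma / margin) ** 2)


print(required_sample_size(1.645, 45, 5))    # Ex.1: sigma = 45, e = 5, 90% confidence
print(required_sample_size(2.58, 30, 3.2))   # Ex.2: sigma = 30, e = 3.2, 99% confidence
```

Using `ceil` rather than ordinary rounding guarantees that the margin of error is not exceeded.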
Hypothesis Formulation
1. A generic brand of antibiotic markets a capsule with a 500 milligram dose. The
manufacturer is worried that the machine that fills the capsules has come out of
calibration and is no longer creating capsules with the appropriate dosage. Does it
mean that the population mean dosage of this brand is different than 500 mg?
•Null Hypothesis: On the average, the dosage sold under this brand is 500 mg.(H0: μ = 500)
•Alternative Hypothesis: On the average, the dosage sold under this brand is not 500 mg
(population mean dosage ≠ 500 mg). (Ha: μ ≠ 500)

2. A production line operation is designed to fill cartons with laundry detergent to a mean
weight of 32 ounces. A sample of cartons is periodically selected and weighed to
determine whether underfilling or overfilling is occurring. If the sample data lead to a
conclusion of underfilling or overfilling, the production line will be shut down and adjusted
to obtain proper filling. Formulate the null and alternative hypotheses that will help in
deciding whether to shut down and adjust the production line.
Solution: H0: μ = 32 & Ha: μ ≠ 32
One sample Z-test: Example-1

Test the claim that the mean number of TV sets in US households is equal to 3. Assume σ = 0.8

Step-1: State the appropriate null and alternative hypotheses
◼ H0: μ = 3 Ha: μ ≠ 3 (This is a two-tail test)
Step-2: Specify the desired level of significance and the sample size
◼ Suppose that α = 0.05 and n = 100 are chosen for this test
Step-3: Determine the appropriate technique
◼ σ is assumed known so this is a Z test.
Step-4: Determine the critical values
◼ For α = 0.05 the critical Z values are ±1.96
Step-5: Collect the data and compute the test statistic
◼ Suppose the sample results are n = 100, X̄ = 2.84 (σ = 0.8 is assumed known)
So the test statistic is:
$Z_{STAT} = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} = \frac{2.84-3}{0.8/\sqrt{100}} = \frac{-0.16}{0.08} = -2.0$
Step-6: Reach a decision and interpret the result
[Figure: standard normal curve with rejection regions of area α/2 = 0.05/2 in each tail, below -Zα/2 = -1.96 and above +Zα/2 = +1.96; ZSTAT = -2.0 falls in the lower rejection region.]
Since ZSTAT = -2.0 < -1.96, reject the null hypothesis and
conclude there is sufficient evidence that the mean number of
TVs in US homes is not equal to 3.
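The six steps above can be condensed into a short Python sketch of the two-tail, one-sample Z test. The function name `one_sample_z_test` is illustrative, and the critical value comes from `statistics.NormalDist` rather than a Z table.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha):
    """Two-tail Z test of H0: mu = mu0 with known sigma.

    Returns the test statistic, the positive critical value,
    and True if H0 is rejected.
    """
    z_stat = (sample_mean - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_stat, z_crit, abs(z_stat) > z_crit


z_stat, z_crit, reject = one_sample_z_test(2.84, 3, 0.8, 100, 0.05)
print(round(z_stat, 2), round(z_crit, 2), reject)
```

Re-running the function with `alpha=0.01` shows how the decision can change when the significance level is tightened.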
One sample Z-test: Example-2
Average returns in equity mutual funds of company ‘X’ is supposed to be 12% p.a. with a std.
deviation of 2%. A sample of 40 investors is randomly selected and their mean return is found to
be 11.3%. Test at 5% level of significance whether there is any significant difference between the
sample mean and population mean. What if the level of significance is 1%?

$Z_{STAT} = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$

H0: μ = 12 and Ha: μ ≠ 12 (This is a two-tail test)
Here α = 0.05, X̄ = 11.3, μ = 12, σ = 2, n = 40
ZSTAT = (11.3 – 12)/(2/√40) = -2.21; ZCRT = ±1.96 (at 5% l.s.)
If |ZSTAT| < |ZCRT| then do not reject H0, and if |ZSTAT| > |ZCRT| then reject H0
|ZSTAT| = |-2.21| = 2.21, which is more than 1.96 (ZCRT), hence we reject H0
At 1% l.s., ZCRT = ±2.58; |ZSTAT| = 2.21 is less than 2.58, hence we do not reject H0 at that level.
Check whether the test statistic lies in the rejection region:
[Figure: standard normal curve with rejection regions of area α/2 = 0.025 in each tail, below -Zα/2 = -1.96 and above +Zα/2 = +1.96, and the non-rejection region around Z = 0.]
Reject H0 if ZSTAT < -1.96 or ZSTAT > 1.96; otherwise do not reject H0.
Here, ZSTAT = -2.21 < -1.96, so the test statistic is in the rejection
region, hence we reject the null hypothesis and conclude that
there is a significant difference between sample mean and
population mean at the 5% level.
Example-1: Two sample Z-test
A survey was conducted in two large cities to study the distance travelled by people per day:
                        City 1        City 2
Sample size             50            40
Mean                    22.5 miles    18.6 miles
Popln. std. deviation   8.4 miles     7.4 miles
Test at 5% level of significance whether the mean distances travelled in the two cities differ significantly.
Solution: The hypothesis is
H0: μ1−μ2=0 and Ha: μ1−μ2≠0
$Z_{STAT} = \frac{\bar{X}_1-\bar{X}_2}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}} = \frac{22.5-18.6}{\sqrt{8.4^2/50+7.4^2/40}} = 2.34$; ZCRT = 1.96 (known at 5% l.s.)
Rule for acceptance/rejection:
If |ZSTAT| < |ZCRT| then do not reject H0, and if |ZSTAT| > |ZCRT| then reject H0
Here, |ZSTAT| > |ZCRT|, hence we reject the null hypothesis and conclude that the mean distances travelled in
the two cities differ significantly.
Example-2: Two sample Z-test
Greystone Departmental Store has stores in two suburban areas. Random samples are
taken from both the areas to study the mean age of customers. From the first area, a
random sample of 36 customers was taken whose mean age was 40 yrs and a random
sample of 49 customers was taken from the second area whose mean age was 35 yrs.
The population std. deviation of age in both the areas are 9 yrs and 10 yrs respectively.
The manager wants to know whether the mean age of customers in both areas differ
significantly?
Solution:
Assume the level of significance as 5%, hence ZCRT = 1.96
H0: μ1−μ2=0 and Ha: μ1−μ2≠0
ZSTAT = (40 − 35)/√(9²/36 + 10²/49) = 2.41
|ZSTAT| > |ZCRT| hence reject H0
The mean ages of customers in the two suburban areas differ significantly.
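The test statistics in the two examples above (cities and Greystone stores) can be reproduced with a short Python sketch of the two-sample Z statistic for known population standard deviations. The function name `two_sample_z` is illustrative; the critical value 1.96 is the 5% two-tail value used in the notes.

```python
from math import sqrt


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """Z statistic for H0: mu1 - mu2 = 0 with known sigma1 and sigma2."""
    standard_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


Z_CRIT = 1.96  # two-tail critical value at the 5% level of significance

z_cities = two_sample_z(22.5, 18.6, 8.4, 7.4, 50, 40)   # Example-1
z_stores = two_sample_z(40, 35, 9, 10, 36, 49)          # Example-2

for name, z in [("cities", z_cities), ("stores", z_stores)]:
    decision = "reject H0" if abs(z) > Z_CRIT else "do not reject H0"
    print(name, round(z, 2), decision)
```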
Z-Test: Practice Problems
1. Twenty five high school students complete a preparation program for taking the SAT test. The mean
of these scores is 536. From past records it is known that the population average for SAT scores is
500 with a standard deviation of 100. Are these students’ SAT scores (Mean = 536) significantly
different than a population mean of 500? Test the hypothesis at 0.10 and 0.05 levels.

2. Dr Zeppo grades his introductory statistics class on a curve. Let’s suppose that the average grade in
his class is 67.5, and the standard deviation is 9.5. Of his many hundreds of students, it turns out
that 30 of them also take psychology classes. Do the psychology students tend to get the same
grades as everyone else?

3. In order to determine whether two varieties of pomegranates significantly differ in their mean
weights, 70 random samples were drawn from each variety and their mean weights measured. The
mean weight of variety-A was measured to be 220.6 gram with standard deviation of 24.6 grams,
while the mean weight of variety-B was 232.8 gram with a standard deviation of 27.8 gram. Test
whether the mean weights are different to a significance level of 0.05.
