Introduction to Statistical Concepts
1. Collection of data: the process of measuring, gathering, and assembling the raw data upon which
the statistical investigation is to be based.
2. Organization of data: summarization of data in some meaningful way, e.g., in table form.
3. Presentation of the data: the process of reorganization, classification, compilation, and
summarization of data to present it in a meaningful form.
4. Analysis of data: the process of extracting relevant information from the summarized data,
mainly through the use of elementary mathematical operations.
5. Inference of data: the interpretation and further observation of the various statistical
measures through the analysis of the data, by implementing those methods by which
conclusions are formed and inferences made.
Uses of statistics:
The main function of statistics is to enlarge our knowledge of complex phenomena.
The following are some uses of statistics:
1. It presents facts in a definite and precise form.
2. Data reduction.
3. Measuring the magnitude of variations in data.
4. Furnishing a technique of comparison.
5. Estimating unknown population characteristics.
6. Formulating and testing hypotheses.
7. Studying the relationship between two or more variables.
8. Forecasting future events.
Limitations of statistics
As a science statistics has its own limitations.
The following are some of the limitations:
It deals with only quantitative information.
It deals with only aggregates of facts and not with individual data items.
Statistical data are only approximately and not mathematically correct.
Statistics can be easily misused and therefore should be used by experts.
Measurement scales
Measurement is the assignment of numbers to objects or events in a systematic fashion.
A measurement scale refers to the properties of the values assigned to the data, based on the properties of
order, distance, and fixed zero. Four levels of measurement scales are commonly distinguished:
nominal, ordinal, interval, and ratio, and each possesses different properties of measurement
systems.
1. Nominal Scales
Nominal scales are measurement systems that possess none of the three properties stated above.
Level of measurement which classifies data into mutually exclusive, all inclusive
categories in which no order or ranking can be imposed on the data.
No arithmetic and relational operation can be applied.
Examples:
o Sex (Male or Female)
o Marital status (married, single, widowed, divorced)
o Country code
o Regional differentiation of Ethiopia
o Motor type
2. Ordinal Scales
Ordinal Scales are measurement systems that possess the property of order, but not the property
of distance. The property of fixed zero is not important if the property of distance is not satisfied.
Level of measurement which classifies data into categories that can be ranked.
Differences between the ranks are not meaningful.
Arithmetic operations are not applicable but relational operations are applicable.
Examples:
o Letter grades (A, B, C, D, F).
o Rating scales (Excellent, Very good, Good, Fair, poor).
o Military status.
3. Interval Scales
Interval scales are measurement systems that possess the properties of Order and distance, but
not the property of fixed zero.
Level of measurement which classifies data that can be ranked and differences are meaningful.
However, there is no meaningful zero, so ratios are meaningless.
Addition and subtraction are applicable, but division is not, since ratios are meaningless.
Relational operations are also possible.
Examples: IQ, temperature in °F
4. Ratio Scales
Ratio scales are measurement systems that possess all three properties: order, distance, and fixed
zero. The added power of a fixed zero allows ratios of numbers to be meaningfully interpreted;
e.g., the ratio of Bekele's height to Martha's height is 1.32, whereas this is not possible with
interval scales.
Level of measurement which classifies data that can be ranked, differences are meaningful,
and there is a true zero. True ratios exist between the different units of measure.
All arithmetic and relational operations are applicable.
Examples: Weight, Height, distance, Age
b) Telephone interviews
c) Mailed questionnaires ( Self-administered questionnaires returned by mail )
III. The use of documentary sources
It is extracting of information from existing sources (e.g. Hospital, municipality, …)
1.2.2 Source and types of data
Any scientific investigation requires data related to the study. The required data can be obtained
from either a primary source or a secondary source.
Primary source: a source of data that supplies firsthand information for the
immediate purpose.
Primary data: data originally collected for the immediate purpose. Primary data are more
expensive to obtain than secondary data.
They are data measured or collected by the investigator or the user directly from the source.
Secondary source: individuals or agencies that supply data originally collected for other
purposes by them or others. Usually they are published or unpublished materials, records,
reports, documents, etc.
1.2.4 Frequency distributions
Frequency distribution: is the organization of raw data in table form using classes and
frequencies.
Frequency: is the number of values in a specific class of the distribution.
There are three basic types of frequency distributions
Categorical frequency distribution
Ungrouped frequency distribution
Grouped frequency distribution
1. Categorical frequency Distribution:
Used for data that can be placed in specific categories, such as nominal or ordinal data, e.g. marital status.
Example: a social worker collected the following data on marital status for 25 persons.(M=married,
S=single, W=widowed, D=divorced)
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital status: M,
S, D, and W. These types will be used as the classes for the distribution. We follow the procedure below to construct
the frequency distribution.
Step 1: Make a table as shown.
Class (1) Tally (2) Frequency (3) Percent (4)
M
S
D
W
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentage of values in each class by using $\% = \frac{f}{n} \times 100$, where $f$ = frequency of
the class and $n$ = total number of values.
Step 5: Find the total for column (3) and (4).
Combining all the steps, one can construct the following frequency distribution.
Class (1)   Tally (2)   Frequency (3)   Percent (4)
M           //// /      6               24
S           //// //     7               28
D           //// //     7               28
W           ////        5               20
Total                   25              100
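As a rough illustration, the tallying and percentage steps can be sketched in Python. The variable names are illustrative only; the codes are the 25 responses listed in the example, read row by row.

```python
from collections import Counter

# The 25 marital-status codes from the example, read row by row.
data = "M S D W D S S M M M W D S M M W D D S S S W W D D".split()

counts = Counter(data)                  # Steps 2-3: frequency of each class
n = sum(counts.values())                # total number of values
percents = {c: 100 * f / n for c, f in counts.items()}  # Step 4: f / n * 100

for c in ("M", "S", "D", "W"):
    print(f"{c}: frequency = {counts[c]}, percent = {percents[c]:.0f}")
print(f"Total: {n}")
```

Counting by machine is a quick way to cross-check a hand tally.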
75 // 2
76 / 1
80 /// 3
85 /// 3
90 / 1
Each individual value is presented separately, that is why it is named ungrouped frequency
distribution.
3. Grouped frequency Distribution:
- When the range of the data is large, the data must be grouped into classes that are more than one
unit in width.
Definitions:
Grouped Frequency Distribution: a frequency distribution when several numbers are
grouped in one class.
Class limits: Separates one class in a grouped frequency distribution from another. The
limits could actually appear in the data and have gaps between the upper limits of one class
and lower limit of the next.
Units of measurement (U): the distance between two possible consecutive measures. It is
usually taken as 1, 0.1, 0.01, 0.001, -----.
Class boundaries: Separates one class in a grouped frequency distribution from another. The
boundaries have one more decimal place than the raw data and therefore do not appear in
the data. There is no gap between the upper boundary of one class and lower boundary of the
next class. The lower class boundary is found by subtracting U/2 from the corresponding
lower class limit and the upper class boundary is found by adding U/2 to the corresponding
upper class limit.
Class width: the difference between the upper and lower class boundaries of any class. It is
also the difference between the lower limits of any two consecutive classes or the difference
between any two consecutive class marks.
Class mark (Mid points): it is the average of the lower and upper class limits or the average
of upper and lower class boundary.
Cumulative frequency: is the number of observations less than/more than or equal to a
specific value.
Cumulative frequency above: it is the total frequency of all values greater than or equal to
the lower class boundary of a given class.
Cumulative frequency below: it is the total frequency of all values less than or equal to the
upper class boundary of a given class.
Cumulative Frequency Distribution (CFD): it is the tabular arrangement of class interval
together with their corresponding cumulative frequencies. It can be more than or less than
type, depending on the type of cumulative frequency used.
Relative frequency (rf): it is the frequency divided by the total frequency.
Relative cumulative frequency (rcf): it is the cumulative frequency divided by the total
frequency.
Steps for constructing Grouped frequency Distribution
10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may
not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies
Example:- Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest value H=39, L=6
Step 2: Find the range; R=H-L=39-6=33
Step 3: Select the number of classes desired using Sturges' formula:
$k = 1 + 3.32 \log n = 1 + 3.32 \log(20) = 5.32 \approx 6$ (rounding up)
Step 4: Find the class width: $w = R/k = 33/6 = 5.5 \approx 6$ (rounding up)
Step 5: Select the starting point, let it be the minimum observation.
6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limit; e.g. the first upper class=12-U=12-1=11
11, 17, 23, 29, 35, 41 are the upper class limits.
So combining step 5 and step 6, one can construct the following classes.
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
Step 7: Find the class boundaries; E.g. for class 1 Lower class boundary=6-U/2=5.5
Upper class boundary =11+U/2=11.5
Then continue adding w on both boundaries to obtain the rest boundaries. By doing so
one can obtain the following classes.
Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
Step 8: tally the data.
Step 9: Write the numeric values for the tallies in the frequency column.
Step 10: Find cumulative frequency.
Step 11: Find relative frequency or/and relative cumulative frequency.
The complete frequency distribution follows:
Class limit   Class boundary   Class mark   Tally     Freq.   Cf (less than type)   Cf (more than type)   rf     rcf (less than type)
6 – 11        5.5 – 11.5       8.5          //        2       2                     20                    0.10   0.10
12 – 17       11.5 – 17.5      14.5         //        2       4                     18                    0.10   0.20
18 – 23       17.5 – 23.5      20.5         //// //   7       11                    16                    0.35   0.55
24 – 29       23.5 – 29.5      26.5         ////      4       15                    9                     0.20   0.75
30 – 35       29.5 – 35.5      32.5         ///       3       18                    5                     0.15   0.90
36 – 41       35.5 – 41.5      38.5         //        2       20                    2                     0.10   1.00
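The construction steps above can be sketched in Python as follows. This is a minimal sketch that assumes the unit of measurement is U = 1 and that the starting point is the minimum observation, as in the example; the dictionary keys are illustrative names.

```python
import math

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]

n = len(data)
low, high = min(data), max(data)
k = math.ceil(1 + 3.32 * math.log10(n))  # Sturges' formula, rounded up
w = math.ceil((high - low) / k)          # class width, rounded up
U = 1                                    # unit of measurement (assumed)

table = []
cf_less = 0
for j in range(k):
    lower = low + j * w                  # lower class limit
    upper = lower + w - U                # upper class limit
    freq = sum(lower <= x <= upper for x in data)
    cf_more = n - cf_less                # values at or above this class
    cf_less += freq                      # values at or below this class
    table.append({
        "limits": (lower, upper),
        "boundaries": (lower - U / 2, upper + U / 2),
        "mark": (lower + upper) / 2,
        "freq": freq,
        "cf_less": cf_less,
        "cf_more": cf_more,
        "rf": freq / n,
        "rcf": cf_less / n,
    })

for row in table:
    print(row)
```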
Men Women Girls Boys
2500 2000 4000 1500
Solutions: Step 1: Find the percentage.
Step 2: Find the number of degrees for each class.
Step 3: Using a protractor and compass, graph each section and label it with its name and corresponding percentage.
Class Frequency Percent Degree
Men 2500 25 90
Women 2000 20 72
Girls 4000 40 144
Boys 1500 15 54
[Figure: pie chart titled "CLASS" with sectors Men, Women, Girls, and Boys]
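The percentage and degree calculations in Steps 1 and 2 can be sketched as below. The dictionary names are illustrative; the frequencies are those of the example.

```python
freq = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
total = sum(freq.values())

# Step 1: percentage of each class; Step 2: share of the 360-degree circle.
percent = {name: 100 * f / total for name, f in freq.items()}
degrees = {name: 360 * f / total for name, f in freq.items()}

for name in freq:
    print(f"{name}: {percent[name]:.0f}% -> {degrees[name]:.0f} degrees")
```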
-Are used to display data on one variable.
-They are thick lines (narrow rectangles) having the same breadth. The magnitude of a quantity is
represented by the height /length of the bars.
Example: The following data represent the sales by product, 1957-1959, of a given company for three
products A, B, and C.
Product   Sales ($) in 1957   Sales ($) in 1958   Sales ($) in 1959
A         12                  14                  18
B         24                  21                  18
C         24                  35                  54
Solutions:
[Figure: bar chart "Sales by product in 1957"; x-axis: product (A, B, C); y-axis: sales in $]
Component Bar-graphs
-When there is a desire to show how a total (or aggregate) is divided in to its component parts, we use
component bar-graph.
-The bars represent total value of a variable with each total broken in to its component parts and different
colours or designs are used for identifications
Example: Draw a component bar graph to represent the sales by product from 1957 to 1959.
Solutions:
[Figure: component bar graph; x-axis: year of production (1957, 1958, 1959); y-axis: sales in $; components: Product A, Product B, Product C]
Multiple Bar-graphs
- It is used to display data on more than one variable for comparing different variables at the same time.
Example: Draw a multiple bar chart to represent the sales by product from 1957 to 1959.
Solutions:
[Figure: multiple bar graph; x-axis: year of production (1957, 1958, 1959); y-axis: sales in $; bars: Product A, Product B, Product C]
1.2.6 Graphical presentation of data: Histogram, Frequency polygon, and Ogive curve
- The histogram, frequency polygon and cumulative frequency graph or ogive are most commonly
applied graphical representation for continuous data.
Procedures for constructing statistical graphs:
Draw and label the X and Y axes.
Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axis.
Represent the class boundaries for the histogram or ogive, or the mid points for the frequency
polygon, on the X axis.
Plot the points.
Draw the bars or lines to connect the points.
Histogram
A graph which displays the data by using vertical bars of various heights to represent frequencies. Class
boundaries are placed along the horizontal axis. Class marks and class limits are sometimes used as
quantities on the X axis.
Example: Construct a histogram to represent the previous data (example *).
Frequency Polygon: It is a line graph. The frequency of the data is placed along the vertical axis and
class mid points are placed along the horizontal axis. It is customary to add the next higher and lower class
intervals with a corresponding frequency of zero; this makes it a complete polygon.
Example: Draw a frequency polygon for the above data (example *).
Solutions:
[Figure: frequency polygon; x-axis: value (class marks 2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5); y-axis: frequency]
Chapter two
2 Summarizing of Data
2.1 Measures of central Tendency: objectives of measuring central tendency
Introduction: When we want to make comparison between groups of numbers it is good to have a single
value that is considered to be a good representative of each group. This single value is called
the average of the group. Averages are also called measures of central tendency. An average
which is representative is called typical average and an average which is not representative and
has only a theoretical value is called a descriptive average. A typical average should possess
the following:
It should be rigidly defined.
It should be based on all observation under investigation.
It should be affected as little as possible by extreme observations.
It should be capable of further algebraic treatment.
It should be affected as little as possible by fluctuations of sampling.
It should be easy to calculate and simple to understand.
Objectives:
To comprehend (understand) the data easily.
To facilitate comparison.
To make further statistical analysis.
The Summation Notation:
Let $X_1, X_2, X_3, \dots, X_N$ be a number of measurements, where $N$ is the total number of
observations and $X_i$ is the $i$-th observation.
Very often in statistics an algebraic expression of the form $X_1 + X_2 + X_3 + \dots + X_N$ is used in a
formula to compute a statistic. It is tedious to write an expression like this very often, so
mathematicians have developed a shorthand notation to represent a sum of scores, called
the summation notation.
The symbol $\sum_{i=1}^{N} X_i$ is a mathematical shorthand for $X_1 + X_2 + X_3 + \dots + X_N$.
Properties of Summation
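1. $\sum_{i=1}^{n} k = nk$, where $k$ is any constant.
2. $\sum_{i=1}^{n} k X_i = k \sum_{i=1}^{n} X_i$, where $k$ is any constant.
3. $\sum_{i=1}^{n} (a + b X_i) = na + b \sum_{i=1}^{n} X_i$, where $a$ and $b$ are any constants.
4. $\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$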
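The four summation properties can be checked numerically. The sketch below uses small hypothetical integer values (X, Y, a, b, and k are made up for illustration); it is a sanity check, not a proof.

```python
# Hypothetical values, chosen only to illustrate the four properties.
X = [2, 5, 7, 10]
Y = [1, 3, 4, 6]
a, b, k = 3, 4, 5
n = len(X)

prop1 = sum(k for _ in range(n)) == n * k                    # sum of a constant
prop2 = sum(k * x for x in X) == k * sum(X)                  # constant factor
prop3 = sum(a + b * x for x in X) == n * a + b * sum(X)      # linear expression
prop4 = sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)  # sum of two series

print(prop1, prop2, prop3, prop4)
```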
Then the mean will be $\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$, where $k$ is the number of classes and $\sum_{i=1}^{k} f_i = n$.
$\bar{X} = \frac{\sum_{i=1}^{4} f_i X_i}{\sum_{i=1}^{4} f_i} = \frac{36}{7} \approx 5.14$
Arithmetic Mean for Grouped Data: If data are given in the shape of a continuous frequency
distribution, then the mean is obtained as: $\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$, where $X_i$ = the class mark of the $i$-th class.
$\bar{X} = \frac{\sum_{i=1}^{6} f_i X_i}{\sum_{i=1}^{6} f_i} = \frac{1575}{100} = 15.75$
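As an illustration of the grouped-data formula, the sketch below applies it to the class marks and frequencies of the grouped frequency distribution constructed earlier for the 20 observations (classes 6 – 11 through 36 – 41). That choice of data is an assumption made here for illustration; it is not the data behind the 15.75 result.

```python
# Class marks and frequencies from the earlier grouped frequency distribution.
class_marks = [8.5, 14.5, 20.5, 26.5, 32.5, 38.5]
freqs = [2, 2, 7, 4, 3, 2]

n = sum(freqs)                                          # total frequency
weighted_sum = sum(f * x for f, x in zip(freqs, class_marks))
mean = weighted_sum / n                                 # sum(f_i * X_i) / sum(f_i)

print(n, weighted_sum, mean)
```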
Exercise: The life times (in months) of 75 bulbs are summarized in the following frequency
distribution:
Life time (months)   No. of bulbs
40-44                7
45-49                10
50-54                22
55-59                f4
60-64                f5
65-69                6
70-74                3
If 20% of the bulbs have life times between 55 and 59 months,
i. Find the missing frequencies f4 and f5.
ii. Find the mean.
Special properties of Arithmetic mean
1. The sum of the deviations of a set of items from their mean is always zero, i.e.
$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$.
2. The sum of the squared deviations of a set of items from their mean is the minimum, i.e.
$\sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - A)^2$, $A \neq \bar{X}$.
by: $\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2 + \dots + \bar{X}_k n_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{i=1}^{k} \bar{X}_i n_i}{\sum_{i=1}^{k} n_i}$
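Solutions:
The two groups are males and females, with $\bar{X}_1 = 60$, $n_1 = 30$ and $\bar{X}_2 = 72$, $n_2 = 70$.
$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2}{n_1 + n_2} = \frac{\sum_{i=1}^{2} \bar{X}_i n_i}{\sum_{i=1}^{2} n_i} = \frac{60 \cdot 30 + 72 \cdot 70}{30 + 70} = 68.4$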
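The combined-mean formula can be sketched as a small helper. The function name is illustrative; the inputs are the two group means and group sizes given in the solution.

```python
def combined_mean(means, sizes):
    """Combined mean of several groups: sum(mean_i * n_i) / sum(n_i)."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

# Group means 60 and 72 with group sizes 30 and 70, as in the solution.
xc = combined_mean([60, 72], [30, 70])
print(xc)
```

Note that the result lies closer to 72 than to 60 because the second group is larger.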
Weighted Mean
$\bar{X}_w = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i}$
Example: A student obtained the following percentages in an examination: English 60, Biology
75, Mathematics 63, Physics 59, and Chemistry 55. Find the student's weighted arithmetic mean if
weights 1, 2, 1, 3, 3 respectively are allotted to the subjects.
Solutions:
$\bar{X}_w = \frac{\sum_{i=1}^{5} X_i W_i}{\sum_{i=1}^{5} W_i} = \frac{60 \cdot 1 + 75 \cdot 2 + 63 \cdot 1 + 59 \cdot 3 + 55 \cdot 3}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5$
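The weighted-mean calculation for the examination example can be reproduced with a short sketch; the function name is illustrative.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(X_i * W_i) / sum(W_i)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# English, Biology, Mathematics, Physics, Chemistry
scores = [60, 75, 63, 59, 55]
weights = [1, 2, 1, 3, 3]

xw = weighted_mean(scores, weights)
print(xw)
```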
The geometric mean of a set of n observations is the nth root of their product.
The geometric mean of $X_1, X_2, X_3, \dots, X_n$ is denoted by G.M and given by:
$G.M = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}$
Taking the logarithms of both sides:
$\log(G.M) = \log\left(\sqrt[n]{X_1 \cdot X_2 \cdots X_n}\right) = \log\left((X_1 \cdot X_2 \cdots X_n)^{1/n}\right)$
$\log(G.M) = \frac{1}{n}\log(X_1 \cdot X_2 \cdots X_n) = \frac{1}{n}(\log X_1 + \log X_2 + \dots + \log X_n)$
$\log(G.M) = \frac{1}{n}\sum_{i=1}^{n} \log X_i$
The logarithm of the G.M of a set of observations is the arithmetic mean of their logarithms.
$G.M = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log X_i\right)$
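The two equivalent forms of the geometric mean (nth root of the product, and antilog of the mean logarithm) can be compared numerically. The values 2, 4, 8 are hypothetical, and natural logarithms are used here, which is an arbitrary choice since any base gives the same result.

```python
import math

values = [2, 4, 8]   # hypothetical observations
n = len(values)

gm_root = math.prod(values) ** (1 / n)                     # nth root of the product
gm_log = math.exp(sum(math.log(x) for x in values) / n)    # antilog of the mean log

print(gm_root, gm_log)
```

The logarithmic form is usually preferred for long series, because the raw product can overflow or underflow.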
If observations $X_1, X_2, \dots, X_n$ have weights $W_1, W_2, \dots, W_n$ respectively, then their harmonic
mean is given by
$H.M = \frac{\sum_{i=1}^{n} W_i}{\sum_{i=1}^{n} W_i / X_i}$. This is called the Weighted Harmonic Mean.
Remark: The Harmonic Mean is useful and appropriate in finding average speeds and average
rates.
Example: A cyclist pedals from his house to his college at speed of 10 km/hr and back from the
college to his house at 15 km/hr. Find the average speed.
Solution: Here the distance is constant
The simple H.M is appropriate for this problem.
$X_1 = 10$ km/hr, $X_2 = 15$ km/hr
$H.M = \frac{2}{\frac{1}{10} + \frac{1}{15}} = 12$ km/hr
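The cyclist example can be checked with a short sketch of the weighted harmonic mean; with equal weights it reduces to the simple harmonic mean. The function name is illustrative.

```python
def harmonic_mean(values, weights=None):
    """Weighted harmonic mean: sum(W_i) / sum(W_i / X_i); equal weights by default."""
    if weights is None:
        weights = [1] * len(values)
    return sum(weights) / sum(w / x for x, w in zip(values, weights))

speeds = [10, 15]                    # km/hr out and back over the same distance
avg_speed = harmonic_mean(speeds)
print(round(avg_speed, 2))
```

The arithmetic mean of the two speeds would be 12.5 km/hr, which overstates the average because more time is spent at the slower speed.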
2.2.2 The mode
Mode is a value which occurs most frequently in a set of values
The mode may not exist and even if it does exist, it may not be unique.
In the case of a discrete distribution, the value having the maximum frequency is the modal value.
Examples:
1. Find the mode of 5, 3, 5, 8, 9. Mode =5
2. Find the mode of 8, 9, 9, 7, 8, 2, and 5. It is a bimodal Data: 8 and 9
3. Find the mode of 4, 12, 3, 6, and 7. No mode for this data.
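The three examples can be reproduced with a small helper. It follows the convention used in this text, under which a data set whose values all occur once has no mode; the function name is illustrative.

```python
from collections import Counter

def modes(values):
    """Return all modal values, or an empty list when every value occurs once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                      # no value repeats: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 3, 5, 8, 9]))          # single mode
print(modes([8, 9, 9, 7, 8, 2, 5]))    # bimodal data
print(modes([4, 12, 3, 6, 7]))         # no mode
```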
- The mode of a set of numbers $X_1, X_2, \dots, X_n$ is usually denoted by $\hat{X}$.
$\hat{X} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w$
Where:
$\hat{X}$ = the mode of the distribution
$w$ = the size of the modal class
$\Delta_1 = f_{mo} - f_1$
$\Delta_2 = f_{mo} - f_2$
$f_{mo}$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class following the modal class
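Solutions:
45 – 55 is the modal class, since it is the class with the highest frequency.
$L_{mo} = 45$, $w = 10$
$\Delta_1 = f_{mo} - f_1 = 2$
$\Delta_2 = f_{mo} - f_2 = 26$
$f_{mo} = 31$, $f_1 = 29$, $f_2 = 5$
$\hat{X} = 45 + 10\left(\frac{2}{2 + 26}\right) = 45.71$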
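The grouped-data mode formula can be sketched as a function and applied to the values in this solution. The function and parameter names are illustrative.

```python
def grouped_mode(l_mo, w, f_mo, f_prev, f_next):
    """Mode of grouped data: L_mo + w * d1 / (d1 + d2)."""
    d1 = f_mo - f_prev     # modal frequency minus preceding class frequency
    d2 = f_mo - f_next     # modal frequency minus following class frequency
    return l_mo + w * d1 / (d1 + d2)

mode = grouped_mode(l_mo=45, w=10, f_mo=31, f_prev=29, f_next=5)
print(round(mode, 2))
```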
$\frac{1}{2}\left(X_{[3]} + X_{[4]}\right) = \frac{1}{2}(5 + 6) = 5.5$
b) Order the data: 1, 2, 3, 5, 8. Here n = 5.
$\tilde{X} = X_{\left[\frac{n+1}{2}\right]} = X_{[3]} = 3$
Remark: The median class is the class with the smallest cumulative frequency (less than type) greater
than or equal to $\frac{n}{2}$.
Example: Find the median of the following distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Solutions:
First find the less than cumulative frequency.
Identify the median class.
Find median using formula.
Class    Frequency   Cumulative frequency (less than type)
40-44    7           7
45-49    10          17
50-54    22          39
55-59    15          54
60-64    12          66
65-69    6           72
70-74    3           75
$\frac{n}{2} = \frac{75}{2} = 37.5$
39 is the first cumulative frequency to be greater than or equal to 37.5, so
50 – 54 is the median class.
$L_{med} = 49.5$, $w = 5$, $n = 75$, $c = 17$, $f_{med} = 22$
$\tilde{X} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right) = 49.5 + \frac{5}{22}(37.5 - 17) = 54.16$
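The three solution steps (cumulate, locate the median class, apply the formula) can be sketched as one function. It assumes a unit of measurement of 1, so the lower class boundary is the lower limit minus 0.5; the function name is illustrative.

```python
def grouped_median(lower_limits, freqs, width, unit=1):
    """Median of a grouped frequency distribution (less-than cumulative method)."""
    n = sum(freqs)
    cum = 0                                  # cumulative frequency before the class
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= n / 2:                 # first class whose cf reaches n/2
            boundary = lower - unit / 2      # lower class boundary
            return boundary + (width / f) * (n / 2 - cum)
        cum += f
    raise ValueError("empty distribution")

lower_limits = [40, 45, 50, 55, 60, 65, 70]
freqs = [7, 10, 22, 15, 12, 6, 3]

median = grouped_median(lower_limits, freqs, width=5)
print(round(median, 2))
```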
Remark: The quartile class (class containing Qi) is the class with the smallest cumulative frequency
Remark: The decile class (class containing Di )is the class with the smallest cumulative frequency (less
Remark: The percentile class (class containing $P_i$) is the class with the smallest cumulative
frequency (less than type) greater than or equal to $\frac{iN}{100}$.
a) Quartiles:
i. $Q_1$
- Determine the class containing the first quartile.
$\frac{N}{4} = 123.25$
170 – 180 is the class containing the first quartile.
$L_{Q_1} = 170$, $w = 10$, $N = 493$, $c = 88$, $f_{Q_1} = 72$
$Q_1 = L_{Q_1} + \frac{w}{f_{Q_1}}\left(\frac{N}{4} - c\right) = 170 + \frac{10}{72}(123.25 - 88) = 174.90$
ii. $Q_2$
- Determine the class containing the second quartile.
$\frac{2N}{4} = 246.5$
190 – 200 is the class containing the second quartile.
$L_{Q_2} = 190$, $w = 10$, $N = 493$, $c = 244$, $f_{Q_2} = 107$
$Q_2 = L_{Q_2} + \frac{w}{f_{Q_2}}\left(\frac{2N}{4} - c\right) = 190 + \frac{10}{107}(246.5 - 244) = 190.23$
iii. $Q_3$
- Determine the class containing the third quartile.
$\frac{3N}{4} = 369.75$
200 – 210 is the class containing the third quartile.
$L_{Q_3} = 200$, $w = 10$, $N = 493$, $c = 351$, $f_{Q_3} = 49$
$Q_3 = L_{Q_3} + \frac{w}{f_{Q_3}}\left(\frac{3N}{4} - c\right) = 200 + \frac{10}{49}(369.75 - 351) = 203.83$
b) $D_7$
- Determine the class containing the 7th decile.
$\frac{7N}{10} = 345.1$
190 – 200 is the class containing the seventh decile.
$L_{D_7} = 190$, $w = 10$, $N = 493$, $c = 244$, $f_{D_7} = 107$
$D_7 = L_{D_7} + \frac{w}{f_{D_7}}\left(\frac{7N}{10} - c\right) = 190 + \frac{10}{107}(345.1 - 244) = 199.45$
c) $P_{90}$
- Determine the class containing the 90th percentile.
$\frac{90N}{100} = 443.7$
220 – 230 is the class containing the 90th percentile.
$L_{P_{90}} = 220$, $w = 10$, $N = 493$, $c = 434$, $f_{P_{90}} = 31$
$P_{90} = L_{P_{90}} + \frac{w}{f_{P_{90}}}\left(\frac{90N}{100} - c\right) = 220 + \frac{10}{31}(443.7 - 434) = 223.13$
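Quartiles, deciles, and percentiles all use the same interpolation formula, differing only in the fraction of N they target. The sketch below makes that explicit; the function and parameter names are illustrative, and the inputs are the class values quoted in the solutions (lower boundary L, width w, class frequency f, preceding cumulative frequency c).

```python
def grouped_quantile(L, w, f, c, N, fraction):
    """Quantile of grouped data: L + (w / f) * (fraction * N - c)."""
    return L + (w / f) * (fraction * N - c)

N = 493
q1 = grouped_quantile(L=170, w=10, f=72, c=88, N=N, fraction=1 / 4)
q2 = grouped_quantile(L=190, w=10, f=107, c=244, N=N, fraction=2 / 4)
q3 = grouped_quantile(L=200, w=10, f=49, c=351, N=N, fraction=3 / 4)
d7 = grouped_quantile(L=190, w=10, f=107, c=244, N=N, fraction=7 / 10)
p90 = grouped_quantile(L=220, w=10, f=31, c=434, N=N, fraction=90 / 100)

for name, value in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("D7", d7), ("P90", p90)]:
    print(f"{name} = {value:.2f}")
```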
2) Quartile deviation and coefficient of Quartile deviation
3) Mean deviation and coefficient of Mean deviation
4) Standard deviation and coefficient of variation.
2.4.1 Range, Variance, Standard deviation and coefficient of variation
The Range (R): is the largest score minus the smallest score. It is a quick and dirty measure of
variability, although when a test is given back to students they very often wish to know the range
of scores. Because the range is greatly affected by extreme scores, it may give a distorted picture
of the scores. The following two distributions have the same range, 13, yet appear to differ
greatly in the amount of variability.
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
For this reason, among others, the range is not the most important measure of variability.
It is not liable to further algebraic treatment.
It cannot be computed in the case of open end distribution.
It is very sensitive to the size of the sample.
The Variance
Population Variance: If we divide the variation by the number of values in the population, we
get something called the population variance. This variance is the "average squared deviation
from the mean".
Population Variance $= \sigma^2 = \frac{1}{N}\sum (X_i - \mu)^2$, $i = 1, 2, \dots, N$
Population Variance $= \sigma^2 = \frac{1}{N}\sum f_i (X_i - \mu)^2$, $i = 1, 2, \dots, k$
Sample Variance: One would expect the sample variance to simply be the population variance
with the population mean replaced by the sample mean. However, one of the major uses of
statistics is to estimate the corresponding parameter, and that formula gives a biased estimate:
on average it underestimates the population variance. To counteract this, the sum of the squares of the
deviations is divided by one less than the sample size.
Sample Variance $= S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$, $i = 1, 2, \dots, n$
Sample Variance $= S^2 = \frac{1}{n-1}\sum f_i (X_i - \bar{X})^2$, $i = 1, 2, \dots, k$
$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}$, for raw data.
$S^2 = \frac{\sum_{i=1}^{k} f_i X_i^2 - n\bar{X}^2}{n-1}$, for frequency distribution.
Standard Deviation
There is a problem with variances. Recall that the deviations were squared. That means that the
units were also squared. To get the units back the same as the original data values, the square
root must be taken.
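Solutions:
1. $\bar{X} = 11$
$X_i$                  5    10   12   17   Total
$(X_i - \bar{X})^2$    36   1    1    36   74
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{74}{3} = 24.67$
$S = \sqrt{S^2} = \sqrt{24.67} = 4.97$
2. $\bar{X} = 55$
$X_i$ (C.M)                42     47    52    57   62    67    72    Total
$f_i (X_i - \bar{X})^2$    1183   640   198   60   588   864   867   4400
$S^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n-1} = \frac{4400}{74} = 59.46$
$S = \sqrt{S^2} = \sqrt{59.46} = 7.71$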
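Both solutions can be reproduced with one helper that treats raw data as a frequency distribution in which every frequency is 1. The function name is illustrative; the frequencies for the second data set are those implied by the column totals (7, 10, 22, 15, 12, 6, 3).

```python
import math

def sample_variance(values, freqs=None):
    """Sample variance: sum(f * (x - mean)^2) / (n - 1), with n = sum(f)."""
    if freqs is None:
        freqs = [1] * len(values)          # raw data: each value counted once
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return ss / (n - 1)

s2_raw = sample_variance([5, 10, 12, 17])
s2_grouped = sample_variance([42, 47, 52, 57, 62, 67, 72],
                             [7, 10, 22, 15, 12, 6, 3])

print(round(s2_raw, 2), round(math.sqrt(s2_raw), 2))
print(round(s2_grouped, 2), round(math.sqrt(s2_grouped), 2))
```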
Special properties of Standard deviations
1. $\sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}} < \sqrt{\frac{\sum (X_i - A)^2}{n-1}}$, $A \neq \bar{X}$
2. For a normal (symmetric) distribution the following holds.
Approximately 68.27% of the data values fall within one standard deviation of the mean, i.e.
within $(\bar{X} - S, \bar{X} + S)$.
Approximately 95.45% of the data values fall within two standard deviations of the mean, i.e.
within $(\bar{X} - 2S, \bar{X} + 2S)$.
Approximately 99.73% of the data values fall within three standard deviations of the mean,
i.e. within $(\bar{X} - 3S, \bar{X} + 3S)$.
3. Chebyshev's Theorem
For any data set ,no matter what the pattern of variation, the proportion of the values that fall
Applying the above theorem, at least $\left(1 - \frac{1}{k^2}\right) \times 100\% = 75\%$ of the numbers lie between 38 and 62.
b) Similarly done.
c) It is just the complement of a), i.e. at most $\frac{1}{k^2} \times 100\% = 25\%$ of the numbers are less
than 38 or more than 62.
d) Similarly done.
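Chebyshev's bound can be sketched as a small function. The mean of 50 and standard deviation of 6 used below are assumptions inferred from the interval 38 to 62 with k = 2; they are not stated in the surviving text of the example.

```python
def chebyshev_interval(mean, sd, k):
    """Return (lower, upper, minimum proportion inside) for k > 1 standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return mean - k * sd, mean + k * sd, 1 - 1 / k ** 2

# Assumed values: mean 50 and standard deviation 6 (inferred, not given).
lower, upper, inside = chebyshev_interval(mean=50, sd=6, k=2)
outside = 1 - inside                  # at most this proportion lies outside

print(lower, upper, inside, outside)
```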
Example 2: The scores on a special test of knowledge of wood refinishing have a mean of
53 and a standard deviation of 6. Find the range of values in which at least 75% of the scores will lie.
(Exercise)
4. If the standard deviation of X 1 , X 2 , .....X n is S , then the standard deviation of
Exercise: Verify each of the above relationship, considering k and a as constants.
Examples:
1. The mean and the standard deviation of a set of numbers are respectively 500 and 10.
a. If 10 is added to each of the numbers in the set, then what will be the
variance and standard deviation of the new set?
b. If each of the numbers in the set are multiplied by -5, then what will be
the variance and standard deviation of the new set?
Solutions:
1. a. They will remain the same.
$C.V_B = \frac{S_B}{\bar{X}_B} \times 100 = \frac{11}{47.5} \times 100 = 23.16\%$
Since C.VA < C.VB, in firm B there is greater variability in individual wages.
2. A meteorologist interested in the consistency of temperatures in three cities during a given week
collected the following data. The temperatures for the five days of the week in the three cities were
City 1   25   24   23   26   17
City 2   22   21   24   22   20
City 3   32   27   35   24   28
Exercise: Which city has the most consistent temperature, based on these data?
2.4.2 Standard Scores (Z-scores)
Value Group one Group two
Relatively speaking:
a) Which group is more consistent in its performance?
b) Suppose person A from group one takes 9.2 minutes while person B from group two
takes 9.3 minutes. Who was faster in performing the task? Why?
Solutions:
a) Use coefficient of variation.
$C.V_1 = \frac{S_1}{\bar{X}_1} \times 100 = \frac{1.2}{10.4} \times 100 = 11.54\%$
$C.V_2 = \frac{S_2}{\bar{X}_2} \times 100 = \frac{1.3}{11.9} \times 100 = 10.92\%$
Since C.V2 < C.V1, group 2 is more consistent.
b) Calculate the standard scores of A and B.
$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{9.2 - 10.4}{1.2} = -1$
$Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{9.3 - 11.9}{1.3} = -2$
Person B is faster because the time taken by person B is two standard deviations shorter than the
average time taken by group 2, while the time taken by person A is only one standard deviation
shorter than the average time taken by group 1.
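The coefficient of variation and standard score calculations in this solution can be sketched with two helpers; the function names are illustrative.

```python
def coefficient_of_variation(sd, mean):
    """C.V as a percentage: (S / mean) * 100."""
    return 100 * sd / mean

def z_score(x, mean, sd):
    """Standard score: number of standard deviations x lies from the mean."""
    return (x - mean) / sd

cv1 = coefficient_of_variation(sd=1.2, mean=10.4)   # group one
cv2 = coefficient_of_variation(sd=1.3, mean=11.9)   # group two
z_a = z_score(9.2, mean=10.4, sd=1.2)               # person A, group one
z_b = z_score(9.3, mean=11.9, sd=1.3)               # person B, group two

print(round(cv1, 2), round(cv2, 2))
print(round(z_a, 2), round(z_b, 2))
```

A more negative standard score here means a time further below the group average, that is, a relatively faster performance.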
Chapter Three
3 Elementary Probability
3.1 Deterministic and non-deterministic models
A model is an approximate representation of a physical situation. It attempts to explain observed
behavior using a set of simple and understandable rules. These rules can be used to predict the
outcome of experiments involving the given physical situation. A useful model explains all
relevant aspects of a given situation. Such models can be used instead of experiments to answer
questions regarding the given situation. Models therefore allow the engineer to avoid the costs of
experimentation, namely, labor, equipment, and time.
The two basic types of models are deterministic models and non-deterministic models
(probability models)
Deterministic models are used when the observational phenomenon has measurable properties. In
this type of model, the conditions under which an experiment is carried out determine the exact
outcome of the experiment. The solution of a set of mathematical equations specifies the exact
outcome of the experiment. As a simple example, suppose we are measuring the flow of current
in a thin copper wire. Our model for this phenomenon might be Ohm's law:
Current =Voltage/Resistance
or
$I = V/R$
We call this deterministic model because it assumes that the interaction between these idealized
components is completely described by this formula or laws.
If an experiment involving the measurement of a set of voltages is repeated a number of times
under the same conditions, this theory predicts that the observations will always be exactly the
same.
However, if we performed this measurement process more than once, perhaps at different times,
or even on different days, the observed current could differ slightly because of small changes or
variations in factors that are not completely controlled, such as changes in ambient temperature,
fluctuations in performance of the gauge, small impurities present at different locations in the
wire and drifts in the voltage source. Consequently, a more realistic model of the observed
current might be
Current = Voltage/Resistance + $\epsilon$
where $\epsilon$ is a term added to the model to account for the fact that the observed values of current
flow do not perfectly conform to the deterministic model. We can think of $\epsilon$ as a term that
includes the effects of all of the unmodeled sources of variability that affect this system.
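The contrast between the two kinds of model can be illustrated with a short simulation. This is only a sketch: the voltage, resistance, and noise level are hypothetical, and modelling the disturbance term as zero-mean Gaussian noise is an assumption made here, not something the text specifies.

```python
import random

def deterministic_current(voltage, resistance):
    """Ohm's law: the same inputs always give the same current."""
    return voltage / resistance

def observed_current(voltage, resistance, noise_sd, rng):
    """Ohm's law plus a random disturbance (assumed Gaussian, mean zero)."""
    return voltage / resistance + rng.gauss(0.0, noise_sd)

rng = random.Random(0)          # fixed seed so the run is repeatable
V, R = 12.0, 4.0                # hypothetical voltage (volts) and resistance (ohms)

ideal = [deterministic_current(V, R) for _ in range(5)]
noisy = [observed_current(V, R, noise_sd=0.05, rng=rng) for _ in range(5)]

print(ideal)   # identical values on every repetition
print(noisy)   # values scattered around V / R
```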
3.2 Review of set theory: sets, union, intersection, complementation, De Morgan’s rules
The term set refers to a well-defined collection of objects that share a certain property or certain
properties. The term “well-defined” here means that the set is described in such a way that one
can decide whether or not a given object belongs in the set. If $A$ is a set, then the objects of the
collection $A$ are called the elements or members of the set $A$. If $x$ is an element of the set $A$, we
write $x \in A$. If $x$ is not an element of the set $A$, we write $x \notin A$.
As a convention, we use capital letters to denote the names of sets and lowercase letters for
elements of a set.
Note that for each object $x$ and each set $A$, exactly one of $x \in A$ or $x \notin A$, but not both, must be
true.
Example a.The set of counting numbers less than ten.
b.The set of letters in the word “Addis Ababa.”
c. The set of all countries in Africa.
Union
The union of two events is the event that consists of all outcomes that are contained in either of
the two events. We denote the union as $E_1 \cup E_2$.
Example: Consider an experiment in which you select a molded plastic part, such as a
connector, and measure its thickness. The possible values for thickness depend on the resolution
of the measuring instrument, and they also depend on upper and lower bounds for thickness. If
the objective of the analysis is to consider only whether or not the parts conform to the
manufacturing specifications, either part may or may not conform. We abbreviate yes and no as y
and n. If the ordered pair yn indicates that the first connector conforms and the second does not,
the set of all outcomes can be represented by the four outcomes: $\{yy, yn, ny, nn\}$.
Suppose that the set of all outcomes for which at least one part conforms is denoted as $E_1$. Then
$E_1 = \{yy, yn, ny\}$. The event in which both parts do not conform, denoted as $E_2$, contains only the
single outcome, $E_2 = \{nn\}$. Other examples of events are $\emptyset$, the null set, and $S$, the sample space. If
* +
Intersection
The intersection of two events is the event that consists of all outcomes that are contained in both
of the two events. We denote the intersection as $E_1 \cap E_2$.
* +.
Complementation
The complement of an event in a sample space is the set of outcomes in the sample space that are
not in the event. We denote the complement of the event $E$ as $E'$.
* +
De Morgan’s rules
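De Morgan’s laws imply that
$(A \cup B)' = A' \cap B'$ and $(A \cap B)' = A' \cup B'$.
Example: with $E_1 = \{yy, yn, ny\}$ and $E_5 = \{yn, ny, nn\}$, $(E_1 \cap E_5)' = E_1' \cup E_5' = \{nn\} \cup \{yy\} = \{yy, nn\}$.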
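The set operations on events can be checked with Python's built-in set type. This is a minimal sketch for the two-connector sample space; the event E5 = {yn, ny, nn} is an illustrative choice, not something fixed by the text.

```python
# Events from the two-connector example, written as Python sets.
S = {"yy", "yn", "ny", "nn"}   # sample space
E1 = {"yy", "yn", "ny"}        # at least one part conforms
E5 = {"yn", "ny", "nn"}        # illustrative event: at least one part fails

union = E1 | E5                # outcomes in either event
intersection = E1 & E5         # outcomes in both events
complement_E1 = S - E1         # outcomes of S that are not in E1

# De Morgan's laws, checked on these two events.
de_morgan_1 = (S - (E1 | E5)) == ((S - E1) & (S - E5))
de_morgan_2 = (S - (E1 & E5)) == ((S - E1) | (S - E5))

print(union == S, sorted(intersection), complement_E1)
print(de_morgan_1, de_morgan_2)
```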
However, no matter how carefully our experiment is designed and conducted, the variation is
almost always present, and its magnitude can be large enough that the important conclusions
from our experiment are not obvious.
The set of all possible outcomes of a random experiment is called the sample space of the
experiment. The sample space is denoted as S.
Example: Consider an experiment in which you select a molded plastic part, such as a
connector, and measure its thickness. The possible values for thickness depend on the resolution
of the measuring instrument, and they also depend on upper and lower bounds for thickness.
However, it might be convenient to define the sample space as simply the positive real line
because a negative value for thickness cannot occur. If it is known that all connectors will be
between 10 and 11 millimeters thick, the sample space could be $S = \{x \mid 10 < x < 11\}$.
If the objective of the analysis is to consider only whether a particular part is low, medium, or
high for thickness, the sample space might be taken to be the set of three outcomes:
$S = \{low, medium, high\}$.
If the objective of the analysis is to consider only whether or not a particular part conforms to the
manufacturing specifications, the sample space might be simplified to the set of two outcomes
$S = \{yes, no\}$.
If two connectors are selected and the objective of the analysis is to consider only whether or not the parts conform to the
manufacturing specifications, either part may or may not conform. We abbreviate yes and no as y
and n. If the ordered pair yn indicates that the first connector conforms and the second does not,
the sample space can be represented by the four outcomes: $S = \{yy, yn, ny, nn\}$.
In random experiments in which items are selected from a batch, we will indicate whether or not
a selected item is replaced before the next one is selected. For example, if the batch consists of
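three items {a, b, c} and our experiment is to select two items without replacement, the sample
space can be represented as $S_{without} = \{ab, ac, ba, bc, ca, cb\}$.
If items are replaced before the next one is selected, the sampling is referred to as with
replacement. Then the possible ordered outcomes are $S_{with} = \{aa, ab, ac, ba, bb, bc, ca, cb, cc\}$.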
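The two sampling schemes for the batch {a, b, c} can be enumerated directly. The sketch below uses the standard `itertools` module; the variable names are illustrative.

```python
from itertools import permutations, product

batch = ["a", "b", "c"]

# Ordered selections of two items WITHOUT replacement: no item repeats.
without_replacement = ["".join(p) for p in permutations(batch, 2)]

# Ordered selections of two items WITH replacement: repeats are allowed.
with_replacement = ["".join(p) for p in product(batch, repeat=2)]

print(len(without_replacement), without_replacement)
print(len(with_replacement), with_replacement)
```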
Example: In the experiment of tossing a die one time the events 1, 2, 3, 4, 5, 6 have the
same chance of occurring, so they are equally likely events.
3.5 Counting techniques
As sample spaces become larger, complete enumeration is difficult. Instead, counts of the
number of outcomes in the sample space and in various events are often used to analyze the random
experiment. These methods are referred to as counting techniques.
In order to determine the number of outcomes, one can use several rules of counting:
the multiplication rule, the permutation rule, and the combination rule.
The Multiplication Rule: If a choice consists of k steps of which the first can be made in $n_1$
ways, the second can be made in $n_2$ ways, ..., the kth can be made in $n_k$ ways, then the whole
choice can be made in $n_1 \times n_2 \times \cdots \times n_k$ ways.
b)
1st digit    2nd digit    3rd digit    4th digit
5            4            3            2
There are four steps:
1. Selecting the 1st digit, this can be made in 5 ways.
2. Selecting the 2nd digit, this can be made in 4 ways.
3. Selecting the 3rd digit, this can be made in 3 ways.
4. Selecting the 4th digit, this can be made in 2 ways.
$5 \times 4 \times 3 \times 2 = 120$ different cards are possible.
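The multiplication rule can be cross-checked by listing the outcomes. The sketch below assumes the cards carry four digits drawn from five distinct digits without repetition, which is inferred from the step counts 5, 4, 3, 2 and is not stated explicitly in the text.

```python
from itertools import permutations
from math import prod

# Number of ways each step can be made (from the table above).
ways_per_step = [5, 4, 3, 2]
total_by_rule = prod(ways_per_step)

# Cross-check by enumeration: ordered choices of 4 out of 5 distinct digits
# (an assumed reading of the example).
total_by_listing = sum(1 for _ in permutations(range(5), 4))

print(total_by_rule, total_by_listing)
```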
Permutation: An arrangement of n objects in a specified order is called a permutation of the
objects.
Permutation Rules:
1. The number of permutations of n distinct objects taken all together is $n!$,
where $n! = n \times (n-1) \times (n-2) \times \cdots \times 3 \times 2 \times 1$.
2. The arrangement of n objects in a specified order using r objects at a time is called the
permutation of n objects taken r at a time. It is written as $_nP_r$ and is given by
$_nP_r = \frac{n!}{(n-r)!}$
3. The number of permutations of n objects in which $k_1$ are alike, $k_2$ are alike, ..., $k_m$ are alike is
$\frac{n!}{k_1! \times k_2! \times \cdots \times k_m!}$
Example:
1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all four?
b) How many permutations are there taking two letters at a time?
2. How many different permutations can be made from the letters in the word
“CORRECTION”?
Solutions:
1.
a) Here n = 4; there are four distinct objects.
There are $4! = 24$ permutations.
b) Here n = 4, r = 2.
There are $_4P_2 = \frac{4!}{(4-2)!} = \frac{24}{2} = 12$ permutations.
2.
Here n = 10, of which 2 are C, 2 are O, 2 are R, 1 is E, 1 is T, 1 is I, 1 is N, so
$k_1 = 2, k_2 = 2, k_3 = 2, k_4 = k_5 = k_6 = k_7 = 1$.
Using the 3rd rule of permutation, there are
$\frac{10!}{2! \times 2! \times 2! \times 1! \times 1! \times 1! \times 1!} = 453600$ permutations.
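The three permutation rules map onto a few lines of Python. This is a sketch using `math.factorial` and `math.perm` from the standard library; the helper `multiset_permutations` is a name introduced here for illustration.

```python
from collections import Counter
from math import factorial, perm

def multiset_permutations(word):
    """Rule 3: n! / (k1! * k2! * ...), where k_i are the letter counts."""
    count = factorial(len(word))
    for k in Counter(word).values():
        count //= factorial(k)   # each partial quotient is an integer
    return count

all_four = factorial(4)          # rule 1: A, B, C, D taken all together
two_at_a_time = perm(4, 2)       # rule 2: 4! / (4 - 2)!
correction = multiset_permutations("CORRECTION")  # rule 3

print(all_four, two_at_a_time, correction)
```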
Combination
A selection of objects without regard to order is called a combination.
Example: Given the letters A, B, C, and D, list the permutations and combinations for selecting two
letters.
Solutions:
Permutation              Combination
AB  BA  CA  DA           AB  BC
AC  BC  CB  DB           AC  BD
AD  BD  CD  DC           AD  DC
Note that in permutation AB is different from BA. But in combination AB is the same as BA.
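Combination Rule
The number of combinations of r objects selected from n objects is
$\binom{n}{r} = \frac{n!}{(n-r)! \times r!}$
Examples:
1. In how many ways can a committee of 5 people be chosen out of 9 people?
Solutions:
n = 9, r = 5
$\binom{9}{5} = \frac{9!}{4! \times 5!} = 126$ ways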
2. Among 15 clocks there are two defectives. In how many ways can an inspector choose three
of the clocks for inspection so that:
a) There is no restriction.
b) None of the defective clocks is included.
c) Only one of the defective clocks is included.
d) Both of the defective clocks are included.
Solutions:
n = 15, of which 2 are defective and 13 are non-defective; r = 3.
a) If there is no restriction, select three clocks from 15 clocks. This can be done in
$\binom{15}{3} = \frac{15!}{12! \times 3!} = 455$ ways.
b) None of the defective clocks is included.
This is equivalent to zero defective and three non-defectives, which can be done in
$\binom{2}{0} \times \binom{13}{3} = 286$ ways.
c) Only one of the defective clocks is included.
This is equivalent to one defective and two non-defectives, which can be done in
$\binom{2}{1} \times \binom{13}{2} = 156$ ways.
d) Both of the defective clocks are included.
This is equivalent to two defective and one non-defective, which can be done in
$\binom{2}{2} \times \binom{13}{1} = 13$ ways.
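The committee and clock counts can be reproduced with `math.comb`. As a sanity check in this sketch, cases b), c) and d) are mutually exclusive and cover every possible selection, so they should add up to the unrestricted count in a).

```python
from math import comb

committee = comb(9, 5)                       # 5 people chosen out of 9

no_restriction = comb(15, 3)                 # a) any 3 of the 15 clocks
none_defective = comb(2, 0) * comb(13, 3)    # b) 0 defective, 3 good
one_defective = comb(2, 1) * comb(13, 2)     # c) 1 defective, 2 good
two_defective = comb(2, 2) * comb(13, 1)     # d) 2 defective, 1 good

print(committee, no_restriction, none_defective, one_defective, two_defective)
print(none_defective + one_defective + two_defective == no_restriction)
```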
3.6 Definitions of probability
Probability is used to quantify the likelihood, or chance, that an outcome of a random experiment
will occur. A probability of 0 indicates an outcome will not occur. A probability of 1 indicates an
outcome will occur with certainty.
Subjective probability definition is defining the probability based on degree of belief that the
outcome will occur. Different individuals will no doubt assign different probabilities to the same
outcomes.
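Classical probability definition: if the N outcomes of a random experiment are equally likely, the probability of an event A is
$P(A) = \frac{N_A}{N} = \frac{\text{No. of outcomes favourable to } A}{\text{Total number of outcomes}} = \frac{n(A)}{n(S)}$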
Examples:
1. A fair die is tossed once. What is the probability of getting
a) Number 4? b) An odd number? c) An even number? d) Number 8?
Solutions: First identify the sample space, say S.
$S = \{1, 2, 3, 4, 5, 6\}$, so $N = n(S) = 6$.
a) Let A be the event of number 4.
$A = \{4\}$, $N_A = n(A) = 1$
$P(A) = \frac{n(A)}{n(S)} = \frac{1}{6}$
b) Let A be the event of odd numbers.
$A = \{1, 3, 5\}$, $N_A = n(A) = 3$
$P(A) = \frac{n(A)}{n(S)} = \frac{3}{6} = 0.5$
c) Let A be the event of even numbers.
$A = \{2, 4, 6\}$, $N_A = n(A) = 3$
$P(A) = \frac{n(A)}{n(S)} = \frac{3}{6} = 0.5$
d) Let A be the event of number 8.
$A = \emptyset$, $N_A = n(A) = 0$
$P(A) = \frac{n(A)}{n(S)} = \frac{0}{6} = 0$
2. A box of 80 candles consists of 30 defective and 50 non-defective candles. If 10 of these
candles are selected at random, what is the probability that
a) All will be defective? b) 6 will be non-defective? c) All will be non-defective?
Solutions:
Total number of selections: $N = n(S) = \binom{80}{10}$
a) Let A be the event that all will be defective.
Total ways in which A occurs: $N_A = n(A) = \binom{30}{10} \times \binom{50}{0}$
$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{10} \times \binom{50}{0}}{\binom{80}{10}} = 0.00001825$
b) Let A be the event that 6 will be non-defective.
Total ways in which A occurs: $N_A = n(A) = \binom{30}{4} \times \binom{50}{6}$
$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{4} \times \binom{50}{6}}{\binom{80}{10}} \approx 0.2645$
c) Let A be the event that all will be non-defective.
Total ways in which A occurs: $N_A = n(A) = \binom{30}{0} \times \binom{50}{10}$
$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{0} \times \binom{50}{10}}{\binom{80}{10}} = 0.00624$
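The candle probabilities follow the classical definition n(A)/n(S) with combination counts. A short sketch (the function name and its default arguments are illustrative) evaluates all three parts:

```python
from math import comb

def prob_defective_count(k, n_def=30, n_good=50, sample=10):
    """P(exactly k defective candles in the sample) = n(A) / n(S)."""
    favourable = comb(n_def, k) * comb(n_good, sample - k)
    return favourable / comb(n_def + n_good, sample)

p_all_defective = prob_defective_count(10)   # a) 10 defective, 0 good
p_six_good = prob_defective_count(4)         # b) 4 defective, 6 good
p_all_good = prob_defective_count(0)         # c) 0 defective, 10 good

print(p_all_defective, p_six_good, p_all_good)
```

Summing `prob_defective_count(k)` over k = 0, ..., 10 gives 1, which is a quick check that the counts cover the whole sample space.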
Exercises:
1. What is the probability that a waitress will refuse to serve alcoholic beverages to only
three minors if she randomly checks the I.D’s of five students from among ten students of
which four are not of legal age?
Shortcomings of the classical approach:
This approach is not applicable when:
- The total number of outcomes is infinite.
- Outcomes are not equally likely.
The Frequentist Approach
This is based on the relative frequencies of outcomes belonging to an event.
Definition: The probability of an event A is the proportion of outcomes favourable to A in the
long run when the experiment is repeated under the same conditions:
$P(A) = \lim_{N \to \infty} \frac{N_A}{N}$
Example: If records show that 60 out of 100,000 bulbs produced are defective, what is
the probability that a newly produced bulb is defective?
Solution: Let A be the event that the newly produced bulb is defective.
$P(A) = \lim_{N \to \infty} \frac{N_A}{N} \approx \frac{60}{100,000} = 0.0006$
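The long-run idea behind the frequentist definition can be illustrated by simulation. The sketch below is an illustration, not part of the bulb example: it tosses a simulated fair die and tracks the relative frequency of an odd face, which should settle near 0.5 as the number of trials grows. The seed is fixed only to make runs repeatable.

```python
import random

def relative_frequency(n_trials, seed=0):
    """Relative frequency N_A / N of 'odd face' in n_trials die tosses."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) % 2 == 1)
    return hits / n_trials

for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))

# The bulb example uses the observed relative frequency as the estimate.
bulb_estimate = 60 / 100_000
print(bulb_estimate)
```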
Axiomatic Approach:
Let E be a random experiment and S be a sample space associated with E. With each event A we associate a real
number, called the probability of A, that satisfies the following properties, called axioms of probability or
postulates of probability.
1. $P(A) \ge 0$
2. $P(S) = 1$, S is the sure event.
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals
the sum of the two probabilities, i.e.
$P(A \cup B) = P(A) + P(B)$
4. $P(A') = 1 - P(A)$
5. $0 \le P(A) \le 1$
6. $P(\emptyset) = 0$, $\emptyset$ is the impossible event.
In general, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Chapter Four
4 Conditional Probability and Independence
4.1 Conditional Probability
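The conditional probability of an event B given an event A, denoted as $P(B|A)$, is
$P(B|A) = \frac{P(A \cap B)}{P(A)}$ for $P(A) > 0$.
This definition can be understood in a special case in which all outcomes of a random experiment
are equally likely. If there are n total outcomes, $P(A) = (\text{number of outcomes in } A)/n$ and
$P(A \cap B) = (\text{number of outcomes in } A \cap B)/n$, so that
$\frac{P(A \cap B)}{P(A)} = \frac{\text{number of outcomes in } A \cap B}{\text{number of outcomes in } A}$.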
Example1: A day’s production of 850 manufactured parts contains 50 parts that do not meet
customer requirements. Two parts are selected randomly without replacement from the batch.
What is the probability that the second part is defective given that the first part is defective?
Solution: Let A denote the event that the first part selected is defective, and let B denote the
event that the second part selected is defective. The probability needed can be expressed
as $P(B|A)$. If the first part is defective, prior to selecting the second part, the batch contains 849
parts, of which 49 are defective; therefore
$P(B|A) = 49/849$
Example2: Continuing the previous example, if three parts are selected at random, what is the
probability that the first two are defective and the third is not defective? This event can be
described in shorthand notation as simply P(ddn). We have
$P(ddn) = \frac{50}{850} \times \frac{49}{849} \times \frac{800}{848} \approx 0.0032$
The third term is obtained as follows. After the first two parts are selected, there are 848
remaining. Of the remaining parts, 800 are not defective. In this example, it is easy to obtain the
solution with a conditional probability for each selection.
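The selection-by-selection reasoning for the 850-part batch can be written out with exact fractions. This is a small sketch; the variable names are illustrative.

```python
from fractions import Fraction

total, defective = 850, 50

# Each factor conditions on what has already been removed from the batch.
p_first_def = Fraction(defective, total)                  # P(d)
p_second_def_given_first = Fraction(defective - 1, total - 1)  # P(d | d)
p_third_good_given_dd = Fraction(total - defective, total - 2)  # P(n | dd)

p_ddn = p_first_def * p_second_def_given_first * p_third_good_given_dd

print(p_second_def_given_first)   # 49/849
print(float(p_ddn))
```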
4.2 Multiplication theorem, Total probability theorem and Bayes’ Theorem
Multiplication Rule
$P(A \cap B) = P(B|A) P(A) = P(A|B) P(B)$
The multiplication rule is useful for determining the probability of an event that depends on other
events.
Example: The probability that an automobile battery subject to high engine compartment
temperature suffers low charging current is 0.7. The probability that a battery is subject to high
engine compartment temperature is 0.05. Let C denote the event that a battery suffers low
charging current, and let T denote the event that a battery is subject to high engine compartment
temperature. The probability that a battery is subject to low charging current and high engine
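compartment temperature is
$P(C \cap T) = P(C|T) P(T) = 0.7 \times 0.05 = 0.035$
Total probability rule (two events): for any events A and B,
$P(B) = P(B \cap A) + P(B \cap A') = P(B|A) P(A) + P(B|A') P(A')$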
Example: suppose that in semiconductor manufacturing the probability is 0.10 that a chip that is
subjected to high levels of contamination during manufacturing causes a product failure. The
probability is 0.005 that a chip that is not subjected to high contamination levels during
manufacturing causes a product failure. In a particular production run, 20% of the chips are
subject to high levels of contamination. What is the probability that a product using one of these
chips fails?
Let F denote the event that the product fails, and let H denote the event that the chip is exposed
to high levels of contamination. The requested probability is P(F), and the information provided
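can be represented as
$P(F|H) = 0.10$, $P(F|H') = 0.005$, $P(H) = 0.20$, $P(H') = 0.80$.
From the total probability rule,
$P(F) = P(F|H) P(H) + P(F|H') P(H') = 0.10(0.20) + 0.005(0.80) = 0.024$,
which can be interpreted as just the weighted average of the two probabilities of failure.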
The reasoning used to develop the above equation can be applied more generally. In the
development of the above equation we only used the two mutually exclusive events A and $A'$. However,
the fact that $A \cup A' = S$, the entire sample space, was important.
Let H denote the event that a chip is exposed to high levels of contamination,
M denote the event that a chip is exposed to medium levels of contamination, and
L denote the event that a chip is exposed to low levels of contamination. Then,
$P(F) = P(F|H) P(H) + P(F|M) P(M) + P(F|L) P(L)$
$= 0.10(0.20) + 0.01(0.30) + 0.001(0.50) = 0.0235$
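The total probability rule is a weighted average of conditional probabilities, so it reduces to a single sum. The sketch below (the function name is illustrative) evaluates both semiconductor cases, the two-level one (high / not high) and the three-level one (high / medium / low).

```python
def total_probability(likelihoods, priors):
    """P(F) = sum_i P(F | E_i) P(E_i) for mutually exclusive, exhaustive E_i."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 (exhaustive events)")
    return sum(lik * prior for lik, prior in zip(likelihoods, priors))

# Two levels: P(F|H) = 0.10, P(F|H') = 0.005, P(H) = 0.20, P(H') = 0.80
p_fail_two = total_probability([0.10, 0.005], [0.20, 0.80])

# Three levels: high, medium, low contamination
p_fail_three = total_probability([0.10, 0.01, 0.001], [0.20, 0.30, 0.50])

print(p_fail_two, p_fail_three)
```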
Bayes’ Theorem
In some examples, we do not have a complete table of information. We might know one
conditional probability but would like to calculate a different one. Consider the following that
Continuing with the semiconductor manufacturing example, assume the following probabilities
for product failure subject to levels of contamination in manufacturing:
In this problem, we might ask the following: If the semiconductor chip in the product fails, what
is the probability that the chip was exposed to high levels of contamination? From the definition
of conditional probability,
$P(A \cap B) = P(A|B) P(B) = P(B \cap A) = P(B|A) P(A)$
Now considering the second and last terms in the expression above, we can write
$P(A|B) = \frac{P(B|A) P(A)}{P(B)}$ for $P(B) > 0$.
This is a useful result that enables us to solve for $P(A|B)$ in terms of $P(B|A)$.
In general, if $P(B)$ in the denominator of the equation above is written using the total probability
rule, we obtain the following general result, which is known as Bayes’ Theorem.
If $E_1, E_2, \ldots, E_k$ are k mutually exclusive and exhaustive events and B is any event,
$P(E_1|B) = \frac{P(B|E_1) P(E_1)}{P(B|E_1) P(E_1) + P(B|E_2) P(E_2) + \cdots + P(B|E_k) P(E_k)}$ for $P(B) > 0$.
Example: We can answer the question posed at the start of this section as follows: The
probability requested can be expressed as $P(H|F)$. Then,
$P(H|F) = \frac{P(F|H) P(H)}{P(F)}$,
where $P(F)$ is obtained from the total probability rule.
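Bayes' theorem can be sketched as a ratio of one term of the total-probability sum to the whole sum. The numbers below are an assumption: they reuse the three-level contamination figures from the total probability example (P(F|H) = 0.10, P(F|M) = 0.01, P(F|L) = 0.001 with P(H) = 0.20, P(M) = 0.30, P(L) = 0.50), since the table for this example is not reproduced in the text.

```python
def posterior(likelihoods, priors, index):
    """Bayes' theorem: P(E_index | B) from P(B | E_i) and P(E_i)."""
    evidence = sum(lik * prior for lik, prior in zip(likelihoods, priors))
    return likelihoods[index] * priors[index] / evidence

# Assumed figures (high, medium, low contamination), see lead-in.
likelihoods = [0.10, 0.01, 0.001]
priors = [0.20, 0.30, 0.50]

p_high_given_fail = posterior(likelihoods, priors, 0)
print(round(p_high_given_fail, 3))
```

Under these assumed figures, a failed chip was exposed to high contamination with probability of about 0.85, even though only 20% of chips see high contamination.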
Two events are independent if any one of the following equivalent statements is true:
1. $P(A|B) = P(A)$
2. $P(B|A) = P(B)$
3. $P(A \cap B) = P(A) P(B)$
Example: A day’s production of 850 manufactured parts contains 50 parts that do not meet
customer requirements. Two parts are selected at random, without replacement, from the batch.
a) Find the probability that the second part selected is defective.
b) Find the probability that the second part selected is defective given that the first part is
defective.
Let A denote the event that the first part is defective, and let B denote the event that the second
part is defective. Indeed $P(B|A) = 49/849$, while
$P(B) = P(B|A) P(A) + P(B|A') P(A') = \frac{49}{849} \cdot \frac{50}{850} + \frac{50}{849} \cdot \frac{800}{850} = \frac{50}{850}$.
Because $P(B|A) \ne P(B)$, the two events are not independent.
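The independence check for the two selected parts can be carried out with exact fractions: compute P(B) with the total probability rule and compare it with P(B|A). A minimal sketch:

```python
from fractions import Fraction

p_a = Fraction(50, 850)                # first part defective
p_b_given_a = Fraction(49, 849)        # second defective, first defective
p_b_given_not_a = Fraction(50, 849)    # second defective, first not defective

# Total probability rule for the second selection.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

independent = (p_b_given_a == p_b)
print(p_b, p_b_given_a, independent)
```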
When considering three or more events, we can extend the definition of independence with the
following general result.
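The events $E_1, E_2, \ldots, E_n$ are independent if and only if for any subset of these events
$E_{i_1}, E_{i_2}, \ldots, E_{i_k}$,
$P(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_k}) = P(E_{i_1}) \times P(E_{i_2}) \times \cdots \times P(E_{i_k})$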
Example: The following circuit operates only if there is a path of functional devices from left to
right. The probability that each device functions is shown on the graph. Assume that devices fail
independently. What is the probability that the circuit operates?
Let T and B denote the events that the top and bottom devices operate, respectively. There is a
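path if at least one device operates. The probability that the circuit operates is
$P(T \cup B) = 1 - P[(T \cup B)'] = 1 - P(T' \cap B')$;
a simple formula for the solution can be derived from the complements $T'$ and $B'$. From the
independence assumption,
$P(T' \cap B') = P(T') P(B') = [1 - P(T)][1 - P(B)]$,
so that
$P(T \cup B) = 1 - [1 - P(T)][1 - P(B)]$.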
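The complement argument for a parallel circuit generalizes to any number of independent devices. In the sketch below the device probabilities 0.95 and 0.95 are hypothetical placeholders, because the figure carrying the actual values is not reproduced in the text.

```python
from math import prod

def parallel_reliability(device_probs):
    """P(at least one device works) = 1 - product of failure probabilities,
    assuming the devices fail independently."""
    return 1 - prod(1 - p for p in device_probs)

# Hypothetical values for the top and bottom devices (not from the text).
p_circuit = parallel_reliability([0.95, 0.95])
print(p_circuit)
```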
Chapter Five
5 One-dimensional Random Variables
5.1 Random variable: definition and distribution function
Definition
A random variable is a function that assigns a real number to each outcome in the sample space
of a random experiment. A random variable is denoted by an uppercase letter such as X. After an
experiment is conducted, the measured value of the random variable is denoted by a lowercase
letter such as x = 70 milliamperes.
measurement is fractional, but it is still limited to discrete points on the real line. Whenever the
measurement is limited to discrete points on the real line, the random variable is said to be a
discrete random variable.
Examples of discrete random variables: number of scratches on a surface, proportion of defective
parts among 1000 tested, number of transmitted bits received in error.
Sometimes a measurement (such as current in a copper wire or length of a machined part) can
assume any value in an interval of real numbers (at least theoretically). Then arbitrary precision
in the measurement is possible. Of course, in practice, we might round off to the nearest tenth or
hundredth of a unit. The random variable that represents this measurement is said to be a
continuous random variable.
Examples of continuous random variables: electrical current, length, pressure, temperature, time,
voltage, weight
Chapter Six
6 Functions of Random Variables
6.1 Equivalent events
Let be the set of all values of x such that if x is in , then ( ) is in A, and let be the set
of all values of y such that if y is in , then ( ) is in B we called and the equivalent
events of A and B.
( ) ( ) ( ) ( )
( ) ( ) ( ) ( )
X = x       10      11      12      13      14      15
pmf f(x)    0.0714  0.1429  0.2857  0.2143  0.2143  0.0714
CDF F(x)    0.0714  0.2143  0.5000  0.7143  0.9286  1.0000
Theorem1: Let X be a discrete random variable whose probability function is p(x). Suppose that
a discrete random variable U is defined in terms of X by $U = \phi(X)$, where to each value of X
there corresponds one and only one value of U and conversely, so that $X = \psi(U)$. Then the
probability function for U is given by
$g(u) = p[\psi(u)]$
Example2: let ( ) ; then find probability mass function and
3. ( ) ∫ ( )
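Example3: Suppose X has pdf f(x) = 3 on [0, 1/3] (this means f(x) = 0 outside of [0, 1/3]).
Compute a) P(0.1 ≤ X ≤ 0.2), b) P(0.1 ≤ X ≤ 1) and c) the cumulative distribution function of X.
a) $P(0.1 \le X \le 0.2) = \int_{0.1}^{0.2} f(x)\,dx = \int_{0.1}^{0.2} 3\,dx = 0.3$
b) $P(0.1 \le X \le 1) = \int_{0.1}^{1} f(x)\,dx = \int_{0.1}^{1/3} 3\,dx = 1 - 0.3 = 0.7$
c) For x in [0, 1/3] we have $F(x) = \int_{0}^{x} f(t)\,dt = \int_{0}^{x} 3\,dt = 3x$.
Since f(x) is 0 outside of [0, 1/3] we know F(x) = P(X ≤ x) = 0 for x < 0 and F(x)
= 1 for x > 1/3. Putting this all together we have
$F(x) = \begin{cases} 0, & x < 0 \\ 3x, & 0 \le x \le 1/3 \\ 1, & x > 1/3 \end{cases}$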
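For the uniform density f(x) = 3 on [0, 1/3], interval probabilities can be read off the cumulative distribution function as F(b) - F(a). A minimal sketch, with illustrative function names:

```python
def cdf(x):
    """F(x) for the pdf f(x) = 3 on [0, 1/3] and 0 elsewhere."""
    if x < 0:
        return 0.0
    if x <= 1 / 3:
        return 3 * x
    return 1.0

def prob_between(a, b):
    """P(a <= X <= b) = F(b) - F(a) for a continuous random variable."""
    return cdf(b) - cdf(a)

print(prob_between(0.1, 0.2))   # part a)
print(prob_between(0.1, 1.0))   # part b)
```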
Example4: Let X be a random variable with range [0, 1] and pdf ( ) . Find
a). ( ) ) ( ) ) ( ).
Solution:
a) To find the probability of ( ); we have to find the value of C?
If f(x) is pdf the probability at the given interval should be equal to one;
( ) ∫ ∫ ( ) , ( )
( ) ∫ , ∫ ( ⁄ )
b) ( ) ∫ ( ) ∫ ( ) ∫
c) . / ∫ ( ) ∫ ( ⁄ ) ( ⁄ )
Example5. Let X be the duration of a telephone call in minutes and suppose X has pdf :
( ) {
b) Find the probability that the call lasts less than 5 minutes. Answer: $P(X < 5) = 1 - e^{-1/2} \approx 0.393$.
a) ( ) ∫ ∫ ( )
( ) ;
( )
b) ( ) ∫ ∫ ( )
Exercise1: The resistance of an electrical component follows a p. d. f. given by
( ) {
Exercise2: The weekly demand for petrol at a local garage (in thousands of liters) is given by the
p. d. f.
( )( )
( ) {
The petrol tanks are filled to capacity of 940 liters every Monday. What is the probability the
garage runs out of petrol in a particular week?
Theorem 2: Let X be a continuous random variable with probability density f(x). Let us
define $U = \phi(X)$, where $X = \psi(U)$ as in Theorem 1. Then the probability density of U is given
by g(u), where
$g(u)\,|du| = f(x)\,|dx|$
or $g(u) = f(x) \left|\frac{dx}{du}\right| = f[\psi(u)]\,|\psi'(u)|$
⁄
( ) {
Find the probability density function and distribution function for the random variable
( ).
Solution: ( ) ( ) ( ) , ( )-| ( )| ( )
( ) ( )
The pdf of u is; {
( ) ∫ ( ) ∫ ( ) , -∫
= [, - , ( ) ( ) -]∫ ( ) , -
( ) {
( ) { ( )
Chapter Seven
7 Two dimensional Random Variables
7.1 Two dimensional random variables
In science and real life, we are often interested in two or more random variables at the same time.
Examples include the height and weight of building materials, the type of a bulb and its life, and the amount of
water and the amount used for power generation. Such variables are known as two
dimensional random variables, and they may also have a joint distribution function.
Suppose that X can assume any one of m values $x_1, x_2, \ldots, x_m$ and Y can assume any one of
n values $y_1, y_2, \ldots, y_n$. Then the probability of the event that $X = x_j$ and $Y = y_k$ is given by
$P(X = x_j, Y = y_k) = f(x_j, y_k)$
A joint probability function for X and Y can be represented by a joint probability table as in
the Table below.
X \ Y     y_1            y_2            ...   y_n            Totals
x_1       f(x_1, y_1)    f(x_1, y_2)    ...   f(x_1, y_n)    f_1(x_1)
x_2       f(x_2, y_1)    f(x_2, y_2)    ...   f(x_2, y_n)    f_1(x_2)
...
x_m       f(x_m, y_1)    f(x_m, y_2)    ...   f(x_m, y_n)    f_1(x_m)
Totals    f_2(y_1)       f_2(y_2)       ...   f_2(y_n)       1 (Grand total)
This can be written:
$\sum_{j=1}^{m} \sum_{k=1}^{n} f(x_j, y_k) = 1$
This is simply the statement that the total probability of all entries is 1. The grand total of 1 is
indicated in the lower right-hand corner of the table.
The joint distribution function or cdf of X and Y is defined by
$F(x, y) = P(X \le x, Y \le y) = \sum_{u \le x} \sum_{v \le y} f(u, v)$
In the above Table, F(x, y) is the sum of all entries for which $x_j \le x$ and $y_k \le y$.
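Example1: The joint probability function of two discrete random variables X and Y is given by
$f(x, y) = c(2x + y)$, where x and y can assume all integers such that $0 \le x \le 2$, $0 \le y \le 3$,
and $f(x, y) = 0$ otherwise.
(a) Find the value of the constant c. (b) Find $P(X = 2, Y = 1)$. (c) Find $P(X \ge 1, Y \le 2)$.
a) The probabilities must sum to 1: $\sum_x \sum_y c(2x + y) = 42c = 1$, so $c = 1/42$.
b) $P(X = 2, Y = 1) = 5c = 5/42$
c) $P(X \ge 1, Y \le 2) = (2c + 3c + 4c) + (4c + 5c + 6c) = 24c = 24/42 = 4/7$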
Continuous case
The case where both variables are continuous is obtained easily by analogy with the discrete case
on replacing sums by integrals. Thus the joint probability function for the random variables X
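and Y (or, as it is more commonly called, the joint density function of X and Y) is defined by
1. $f(x, y) \ge 0$
2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
The probability that X lies between a and b while Y lies between c and d is given mathematically
by
$P(a < X < b, c < Y < d) = \int_{x=a}^{b} \int_{y=c}^{d} f(x, y)\,dx\,dy$
The joint distribution function of X and Y in this case is defined by
$F(x, y) = P(X \le x, Y \le y) = \int_{u=-\infty}^{x} \int_{v=-\infty}^{y} f(u, v)\,du\,dv$
It follows that
$\frac{\partial^2 F}{\partial x\,\partial y} = f(x, y)$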
i.e., the density function is obtained by differentiating the distribution function with respect to x
and y.
Example2: The joint density function of two continuous random variables X and Y is
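d. Find the joint distribution function of X and Y (exercise).
The probability that $X = x_j$ is obtained by adding all entries in the row of the joint probability
table corresponding to $x_j$ and is given by
$P(X = x_j) = f_1(x_j) = \sum_{k=1}^{n} f(x_j, y_k)$
For j = 1, 2, . . . , m, these are indicated by the entry totals in the extreme right-hand column or
margin of the above Table. Similarly the probability that $Y = y_k$ is obtained by adding all entries in
the column corresponding to $y_k$ and is given by
$P(Y = y_k) = f_2(y_k) = \sum_{j=1}^{m} f(x_j, y_k)$
For k = 1, 2, . . . , n, these are indicated by the entry totals in the bottom row or margin of
the above Table. Because the probabilities $P(X = x_j)$ and $P(Y = y_k)$ are obtained from the margins
of the table, we often refer to $f_1(x_j)$ and $f_2(y_k)$ as the marginal probability functions of X and Y,
respectively. It should also be noted that:
$\sum_{j=1}^{m} f_1(x_j) = 1$, $\sum_{k=1}^{n} f_2(y_k) = 1$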
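A joint probability table and its margins can be built in a few lines. The sketch below assumes an illustrative joint pmf of the form f(x, y) = c(2x + y) on the integers 0 ≤ x ≤ 2, 0 ≤ y ≤ 3; this functional form is a hypothetical choice made for the example. The constant c is fixed by requiring the grand total to equal 1.

```python
from fractions import Fraction

xs, ys = range(0, 3), range(0, 4)

# Unnormalized weights 2x + y (assumed form), then normalize so the total is 1.
weights = {(x, y): 2 * x + y for x in xs for y in ys}
c = Fraction(1, sum(weights.values()))
f = {xy: c * w for xy, w in weights.items()}

# Marginals: row totals give f1(x), column totals give f2(y).
f1 = {x: sum(f[(x, y)] for y in ys) for x in xs}
f2 = {y: sum(f[(x, y)] for x in xs) for y in ys}

p_point = f[(2, 1)]                                          # P(X = 2, Y = 1)
p_region = sum(p for (x, y), p in f.items() if x >= 1 and y <= 2)

print(c, p_point, p_region)
print(f1, f2)
```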
Example3: ( ) ( ), where x and y can assume all integers such that
( ) ∑ ( ) ∑ ( )
( ) ∑ ( ) ∑ ( )
( ) ∑ ( ) ∑
( ) ∑ ( ) ∑
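For continuous random variables X and Y with the joint distribution function
$F(x, y) = P(X \le x, Y \le y) = \int_{u=-\infty}^{x} \int_{v=-\infty}^{y} f(u, v)\,du\,dv$,
we have
$P(X \le x) = F_1(x) = \int_{u=-\infty}^{x} \int_{v=-\infty}^{\infty} f(u, v)\,du\,dv$
$P(Y \le y) = F_2(y) = \int_{u=-\infty}^{\infty} \int_{v=-\infty}^{y} f(u, v)\,du\,dv$
We call $F_1(x)$ and $F_2(y)$ the marginal distribution functions, or simply the distribution functions,
of X and Y, respectively. The derivatives of $F_1(x)$ and $F_2(y)$ with respect to x and y are then
called the marginal density functions, or simply the density functions, of X and Y and are given
by
$f_1(x) = \int_{v=-\infty}^{\infty} f(x, v)\,dv$, $f_2(y) = \int_{u=-\infty}^{\infty} f(u, y)\,du$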
Example4: Find the marginal distribution functions (a) of X and (b) of Y for the following p.d.f.
( ) {
(a) The marginal distribution function for X if is
Conditional distribution
Let X and Y be discrete random variables on the same probability space.
The conditional pmf of X given Y = y is defined as:
$f_{X|Y}(x|y) = P(X = x \mid Y = y) = \frac{f(x, y)}{f_2(y)}$, for $f_2(y) > 0$
The conditional pmf of Y given X = x is defined as:
$f_{Y|X}(y|x) = P(Y = y \mid X = x) = \frac{f(x, y)}{f_1(x)}$, for $f_1(x) > 0$
Example5: Let ( ) ( ), where x and y can assume all integers such that
( ) ∑ ( ) ∑ ( )
( ) ∑ ( ) ∑ ( )
( ) ( ) ( )
⁄ ( ⁄ )
( )
( ) ∑ ∑ ( )
⁄ ( ⁄ )
( ) ∑
( ) ( ) ( )
⁄ ( ⁄ )
( )
( ) ∑ ∑ ( )
⁄ ( ⁄ )
( ) ∑
X and Y are jointly continuous random variables if their joint cdf is continuous in both x and y.
We then define the conditional pdf of X given Y = y in the usual way as
$f_{X|Y}(x|y) = \frac{f(x, y)}{f_2(y)}$, for $f_2(y) > 0$
The conditional pdf of Y given X = x is defined as:
$f_{Y|X}(y|x) = \frac{f(x, y)}{f_1(x)}$, for $f_1(x) > 0$
Example6: Find the conditional distribution functions (a) of X and (b) of Y for the above
example. i.e | ( | ) | ( | )
( ) ( )
| ( | ) | ( | )
( ) ( )
( )
( ) ∫∫ ( ) | ( | )
( )
( )
| ( | ) ( )
( ) {
Find (a) | ( | )
For , ( ) ∫ ( )
( )
And | ( | ) {
( )
Example8: Let ( ) ( ), where x and y can assume all integers such that
( ) ∑∑ ( ) ( ) ( ) ∑ ∑ ∑∑ ( ) ( )
∑∑ ( ) ∑∑ ( ) ( )
( ) {
( ) ∫ ( ) ∫ ( ) ( ) ( )
Exercise1: Determine whether the random variables x and y are independent or not in the
following given.
( ) ( ) ( ) ( )
7.5 Distributions of functions of two dimensional random variables
Theorem 3: Let X and Y be discrete random variables having joint probability function f(x, y).
Suppose that two discrete random variables U and V are defined in terms of X and Y by
$U = \phi_1(X, Y)$, $V = \phi_2(X, Y)$, where to each pair of values of X and Y there corresponds one and
only one pair of values of U and V and conversely, so that $X = \psi_1(U, V)$, $Y = \psi_2(U, V)$. Then
the joint probability function of U and V is given by
$g(u, v) = f[\psi_1(u, v), \psi_2(u, v)]$
Example11: Let ( ) ( ), where x and y can assume all integers such that
, .
Solution: ( ) ( ), ( ) ( )
( ) [ ( ) ( )] ( )
( )
| |
( )
The Jacobian is given by
Chapter Eight
8 Expectation
8.1 Expectation of a random variable
For a discrete random variable X with p.m.f. f(x), the expected value of X is
$E(X) = \sum_x x f(x)$
Example1: The following are the numbers of messages sent per hour over a computer network during 28
hours.
10, 14, 10, 11, 13, 11, 14, 13, 12, 14, 12, 15, 13, 12,
12, 11, 11, 12, 12, 13, 13, 13, 14, 14, 12, 14, 12, 15
Let X be the random variable for the number of messages sent per hour over the computer network
during these 28 hours. Find the expected value of the random variable X.
Solution: The possible values of the random variable X are {10, 11, 12, 13, 14, 15}.
X = x                  10      11      12      13      14      15
Frequency              2       4       8       6       6       2
f(x) = frequency/28    0.0714  0.1429  0.2857  0.2143  0.2143  0.0714
$E(X) = \sum_x x f(x)$
$= 10(0.0714) + 11(0.1429) + 12(0.2857) + 13(0.2143) + 14(0.2143) + 15(0.0714)$
$\approx 12.57$
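The frequency table, the pmf, the cumulative distribution and the expected value for the 28 hourly message counts can all be derived from the raw data. A minimal sketch using exact fractions (variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction

messages = [10, 14, 10, 11, 13, 11, 14, 13, 12, 14, 12, 15, 13, 12,
            12, 11, 11, 12, 12, 13, 13, 13, 14, 14, 12, 14, 12, 15]

counts = Counter(messages)           # frequency of each value
n = len(messages)
pmf = {x: Fraction(k, n) for x, k in sorted(counts.items())}

# Expected value: sum of x * f(x).
expected = sum(x * p for x, p in pmf.items())

# Cumulative distribution: running total of the pmf.
cdf, running = {}, Fraction(0)
for x, p in pmf.items():
    running += p
    cdf[x] = running

print(dict(counts))
print(float(expected))
print({x: float(v) for x, v in cdf.items()})
```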
Example2: A lot containing 7 components is sampled by a quality inspector; the lot contains 4
good components and 3 defective components. A sample of 3 is taken by the inspector. Find the
expected value of the number of good components in this sample.
Solution: Let X represent the
number of good components in the sample. The probability distribution of X is
$f(x) = \frac{\binom{4}{x}\binom{3}{3-x}}{\binom{7}{3}}$, x = 0, 1, 2, 3.
Simple calculations yield $f(0) = 1/35$, $f(1) = 12/35$, $f(2) = 18/35$, and
$f(3) = 4/35$. Therefore,
$E(X) = (0)(1/35) + (1)(12/35) + (2)(18/35) + (3)(4/35) = 12/7 \approx 1.7$
Thus, if a sample of size 3 is selected at random over and over again from a lot of 4 good
components and 3 defective components, it will contain, on average, 1.7 good components.
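The expected number of good components can be recomputed from the combination counts. The sketch below (the function name and defaults are illustrative) builds the distribution for a lot of 4 good and 3 defective components with a sample of 3.

```python
from fractions import Fraction
from math import comb

def f(x, good=4, bad=3, sample=3):
    """P(X = x): x good components among `sample` drawn without replacement."""
    return Fraction(comb(good, x) * comb(bad, sample - x),
                    comb(good + bad, sample))

distribution = {x: f(x) for x in range(4)}
expected_good = sum(x * p for x, p in distribution.items())

print(distribution)
print(expected_good, float(expected_good))
```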
For a continuous random variable X with p.d.f. f(x), the expected value is $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$.
( ) ∫ ( ) ∫ ( )( ) , ( ) ( )( )∫ ( )( )∫
( ) . / . /, ( )
Exercise1: Let X be the random variable that denotes the life in hours of a certain electronic
device. The probability density function is
( ) {
Find the expected life of this type of device and interpret it.
Example4. The density function of X is
Solution
( ) ∫ ( ) , ( ) ∫ ( )
( ) ∫, ( )
8.2 Expectation of a function of a random variable
Now let us consider a new random variable g(X), which depends on X; that is, each value of g(X)
is determined by the value of X.
Let X be a discrete random variable with probability mass function f(x). The expected
value of the random variable g(X) is
$E[g(X)] = \sum_x g(x) f(x)$
Example5: Suppose that the number of cars X that pass through a car wash between 4:00 P.M.
and 5:00 P.M. on any sunny Friday has the following probability distribution:
x           4     5     6    7    8    9
P(X = x)    1/12  1/12  1/4  1/4  1/6  1/6
Let g(X) = 2X − 1 represent the amount of money, in dollars, paid to the attendant by the manager.
Find the attendant’s expected earnings for this particular time period.
Solution: 2X − 1 takes the values 7, 9, 11, 13, 15, 17, so
$E[g(X)] = E(2X - 1) = \sum_{x=4}^{9} (2x - 1) f(x)$
$= (7)(1/12) + (9)(1/12) + (11)(1/4) + (13)(1/4) + (15)(1/6) + (17)(1/6) = 38/3 \approx 12.67$ dollars.
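The expectation of a function of a random variable only needs the pmf of X and the function g. A short sketch for the car wash example, with an illustrative helper `expect`:

```python
from fractions import Fraction

# pmf of X, the number of cars between 4:00 P.M. and 5:00 P.M.
pmf = {4: Fraction(1, 12), 5: Fraction(1, 12), 6: Fraction(1, 4),
       7: Fraction(1, 4), 8: Fraction(1, 6), 9: Fraction(1, 6)}

def expect(g, pmf):
    """E[g(X)] = sum over x of g(x) * f(x)."""
    return sum(g(x) * p for x, p in pmf.items())

earnings = expect(lambda x: 2 * x - 1, pmf)   # g(X) = 2X - 1
mean_cars = expect(lambda x: x, pmf)          # E(X), for comparison

print(earnings, float(earnings))
print(2 * mean_cars - 1 == earnings)          # linearity: E(2X - 1) = 2E(X) - 1
```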
For a continuous random variable X with p.d.f. f(x) and for any real-valued function g(x), $E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$.
( ) {
( )
( ) ∫ ∫( )
We shall now extend our concept of mathematical expectation to the case of two random
variables X and Y with joint probability density function ( )
Let X be a discrete random variable with probability mass function f(x) and mean $\bar{X} = E(X)$. The
variance of X is denoted by $\sigma^2$ and it is defined by:
$\sigma^2 = E[(X - \bar{X})^2] = \sum_x (x - \bar{X})^2 f(x)$
If X is a continuous random variable with probability density function f(x) and mean $\bar{X}$, the
variance of X is defined by:
$\sigma^2 = E[(X - \bar{X})^2] = \int_{-\infty}^{\infty} (x - \bar{X})^2 f(x)\,dx$
The quantity $x - \bar{X}$ is called the deviation of an observation from its mean. Since the deviations are
squared and then averaged, $\sigma^2$ will be much smaller for a set of x values that are close to $\bar{X}$ than
it will be for a set of values that vary considerably from $\bar{X}$.
Example7: Let the random variable X represent the number of automobiles that are used for
official business purposes on any given workday. The probability distribution for company A is
x       1    2    3
f(x)    0.3  0.4  0.3
And that for company B is
x       0    1    2    3    4
f(x)    0.2  0.1  0.3  0.3  0.1
Show that the variance of the probability distribution for company B is greater than that for
company A.
Solution: For company A, we find that $\bar{X}_A = E(X) = (1)(0.3) + (2)(0.4) + (3)(0.3) = 2.0$,
and then $\sigma_A^2 = \sum_{x=1}^{3} (x - 2)^2 f(x) = (1 - 2)^2(0.3) + (2 - 2)^2(0.4) + (3 - 2)^2(0.3) = 0.6$.
For company B,
we find that $\bar{X}_B = E(X) = (0)(0.2) + (1)(0.1) + (2)(0.3) + (3)(0.3) + (4)(0.1) = 2.0$, and then
$\sigma_B^2 = \sum_{x=0}^{4} (x - 2)^2 f(x) = (0 - 2)^2(0.2) + (1 - 2)^2(0.1) + (2 - 2)^2(0.3) + (3 - 2)^2(0.3) + (4 - 2)^2(0.1) = 1.6$.
Clearly, the variance of the number of automobiles that are used for official business purposes is
greater for company B than for company A.
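The comparison of the two companies can be reproduced directly from the definition of the variance. A minimal sketch, with illustrative function names:

```python
def mean(pmf):
    """E(X) = sum of x * f(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Variance from the definition: sum of (x - mean)^2 * f(x)."""
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

company_a = {1: 0.3, 2: 0.4, 3: 0.3}
company_b = {0: 0.2, 1: 0.1, 2: 0.3, 3: 0.3, 4: 0.1}

print(mean(company_a), variance(company_a))
print(mean(company_b), variance(company_b))
```

Both companies have the same mean, so the variance is what distinguishes the two distributions here.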
An alternative and preferred formula for finding $\sigma^2$, which often simplifies the calculations, is
stated as follows:
The variance of a random variable X is: $\sigma^2 = E(X^2) - \bar{X}^2$
Example8: Let the random variable X represent the number of defective parts for a machine
when 3 parts are sampled from a production line and tested. The following is the probability
distribution of X.
x       0     1     2     3
f(x)    0.51  0.38  0.10  0.01
Calculate the variance of X.
We first find that $\bar{X} = E(X) = (0)(0.51) + (1)(0.38) + (2)(0.10) + (3)(0.01) = 0.61$,
and $E(X^2) = (0)(0.51) + (1)(0.38) + (4)(0.10) + (9)(0.01) = 0.87$.
Therefore, $\sigma^2 = E(X^2) - \bar{X}^2 = 0.87 - (0.61)^2 = 0.4979$.
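The shortcut formula E(X²) minus the squared mean can be checked against the definition of the variance on the defective-parts distribution. A small sketch:

```python
pmf = {0: 0.51, 1: 0.38, 2: 0.10, 3: 0.01}

mean = sum(x * p for x, p in pmf.items())                  # E(X)
second_moment = sum(x * x * p for x, p in pmf.items())     # E(X^2)

var_shortcut = second_moment - mean ** 2                   # E(X^2) - mean^2
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(mean, second_moment)
print(var_shortcut, var_definition)
```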
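Example9: The weekly demand for a drinking-water product, in thousands of liters, from a local
chain of efficiency stores is a continuous random variable X having the probability density
$f(x) = \begin{cases} 2(x - 1), & 1 < x < 2 \\ 0, & \text{elsewhere} \end{cases}$
Find the mean and variance of X.
$\bar{X} = E(X) = 2\int_{1}^{2} x(x - 1)\,dx = 5/3$ and $E(X^2) = 2\int_{1}^{2} x^2(x - 1)\,dx = 17/6$,
so $\sigma^2 = E(X^2) - \bar{X}^2 = 17/6 - (5/3)^2 = 1/18$.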
At this point, the variance or standard deviation has meaning only when we compare two or
more distributions that have the same units of measurement. Therefore, we could compare the
variances of the distributions of contents, measured in liters, of bottles of orange juice from two
companies, and the larger value would indicate the company whose product was more variable
or less uniform. It would not be meaningful to compare the variance of a distribution of heights
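to the variance of a distribution of speed.
For discrete random variables X and Y with joint probability mass function f(x, y), the
covariance of X and Y is given by:
$\sigma_{XY} = E[(X - \bar{X})(Y - \bar{Y})] = \sum_x \sum_y (x - \bar{X})(y - \bar{Y}) f(x, y)$
For continuous random variables X and Y with joint probability density function f(x, y), the
covariance of X and Y is given by:
$\sigma_{XY} = E[(X - \bar{X})(Y - \bar{Y})] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \bar{X})(y - \bar{Y}) f(x, y)\,dx\,dy$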
The covariance between two random variables is a measure of the nature of the association
between the two. If large values of X often result in large values of Y or small values of X result
in small values of Y, positive $X - \bar{X}$ will often result in positive $Y - \bar{Y}$ and negative $X - \bar{X}$ will
often result in negative $Y - \bar{Y}$. Thus, the product $(X - \bar{X})(Y - \bar{Y})$ will tend to be positive. On the
other hand, if large X values often result in small Y values, the product $(X - \bar{X})(Y - \bar{Y})$ will tend
to be negative. The sign of the covariance indicates whether the relationship between two
dependent random variables is positive or negative. When X and Y are statistically independent,
it can be shown that the covariance is zero. The converse, however, is not generally true. Two
variables may have zero covariance and still not be statistically independent. Note that the
covariance only describes the linear relationship between two random variables. Therefore, if the
covariance between X and Y is zero, X and Y may still have a nonlinear relationship, which means
that they are not necessarily independent.
In short, the covariance of two random variables X and Y with means $\bar{X}$ and $\bar{Y}$, respectively, is
given by:
$\sigma_{XY} = E(XY) - \bar{X}\bar{Y}$
Example10: Let X be the number of blue refills and Y be the number of red refills. Two refills
for a ballpoint pen are selected at random from a certain box, and the following is the joint
probability distribution:
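f(x, y)    x = 0    x = 1    x = 2    h(y)
y = 0      3/28     9/28     3/28     15/28
y = 1      3/14     3/14     0        3/7
y = 2      1/28     0        0        1/28
g(x)       5/14     15/28    3/28     1

Solution: $\sigma_{XY} = E(XY) - E(X)E(Y)$, where $E(XY) = \sum_x \sum_y xy\, f(x, y)$.
$$E(XY) = (0)(0)\tfrac{3}{28} + (0)(1)\tfrac{3}{14} + (0)(2)\tfrac{1}{28} + (1)(0)\tfrac{9}{28} + (1)(1)\tfrac{3}{14} + (1)(2)(0) + (2)(0)\tfrac{3}{28} + (2)(1)(0) + (2)(2)(0) = \tfrac{3}{14}$$
$$E(X) = \sum_x x\, g(x) = (0)\tfrac{5}{14} + (1)\tfrac{15}{28} + (2)\tfrac{3}{28} = \tfrac{3}{4}$$
$$E(Y) = \sum_y y\, h(y) = (0)\tfrac{15}{28} + (1)\tfrac{3}{7} + (2)\tfrac{1}{28} = \tfrac{1}{2}$$
$$\sigma_{XY} = E(XY) - E(X)E(Y) = \tfrac{3}{14} - \left(\tfrac{3}{4}\right)\left(\tfrac{1}{2}\right) = -\tfrac{9}{56}$$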
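As a cross-check, the covariance can be computed directly from a joint probability table. The sketch below is illustrative only. It assumes the box holds 3 blue, 2 red and 3 green refills (the example itself does not state the contents), builds f(x, y) by counting, and applies $\sigma_{XY} = E(XY) - E(X)E(Y)$. The names `joint` and `expect` are hypothetical helpers introduced here.

```python
from fractions import Fraction
from math import comb

# Assumed box contents (hypothetical; not stated in the example).
BLUE, RED, GREEN = 3, 2, 3
DRAWS = 2
TOTAL = comb(BLUE + RED + GREEN, DRAWS)  # number of equally likely selections

# Joint pmf f(x, y): x blue and y red refills among the two selected.
joint = {}
for x in range(DRAWS + 1):
    for y in range(DRAWS + 1 - x):
        ways = comb(BLUE, x) * comb(RED, y) * comb(GREEN, DRAWS - x - y)
        joint[(x, y)] = Fraction(ways, TOTAL)

def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_xy = expect(lambda x, y: x * y)
cov_xy = e_xy - e_x * e_y

print("E(X) =", e_x, " E(Y) =", e_y, " E(XY) =", e_xy)
print("covariance =", cov_xy)
```

Exact fractions are used so that the result can be compared with a hand calculation without rounding. The negative sign is plausible here: the more blue refills are drawn, the fewer places remain for red ones.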
Although the covariance between two random variables does provide information regarding the
nature of the relationship, the magnitude of $\sigma_{XY}$ does not indicate anything regarding the strength
of the relationship, since $\sigma_{XY}$ is not scale-free. Its magnitude will depend on the units used to
measure both X and Y. There is a scale-free version of the covariance called the correlation
coefficient that is used widely in statistics.
Definition: Let X and Y be random variables with covariance $\sigma_{XY}$ and standard deviations $\sigma_X$ and
$\sigma_Y$, respectively. The correlation coefficient of X and Y is denoted by $\rho_{XY}$ and defined by:
$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$
It should be clear to the reader that $\rho_{XY}$ is free of the units of X and Y. The correlation coefficient
is always between -1 and +1, that is, $-1 \le \rho_{XY} \le 1$. It assumes a value of zero when $\sigma_{XY} = 0$. Where there is an
exact linear dependency, say Y = a + bX, $\rho_{XY} = 1$ if b > 0 and $\rho_{XY} = -1$ if b < 0.
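Example 11: Find the correlation coefficient between X and Y in Example 10.
Solution: Since
$$E(X^2) = (0^2)\tfrac{5}{14} + (1^2)\tfrac{15}{28} + (2^2)\tfrac{3}{28} = \tfrac{27}{28}$$
and
$$E(Y^2) = (0^2)\tfrac{15}{28} + (1^2)\tfrac{3}{7} + (2^2)\tfrac{1}{28} = \tfrac{4}{7},$$
we obtain $\sigma_X^2 = \tfrac{27}{28} - \left(\tfrac{3}{4}\right)^2 = \tfrac{45}{112}$ and $\sigma_Y^2 = \tfrac{4}{7} - \left(\tfrac{1}{2}\right)^2 = \tfrac{9}{28}$.
Therefore,
$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{-9/56}{\sqrt{(45/112)(9/28)}} = -\frac{1}{\sqrt{5}}$$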
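Example 12: Find the correlation coefficient of X and Y in Example 4.14.
Solution: Because the correlation coefficient requires the covariance and both variances, the
following expectations are needed, where $f(x, y)$ is the joint density and $g(x)$ and $h(y)$ are the
marginal densities of X and Y:
$$E(X) = \int x\, g(x)\,dx, \qquad E(Y) = \int y\, h(y)\,dy$$
$$E(XY) = \iint xy\, f(x, y)\,dx\,dy$$
$$E(X^2) = \int x^2 g(x)\,dx, \qquad E(Y^2) = \int y^2 h(y)\,dy$$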
If the covariance between X and Y is positive, negative, or zero, the correlation between X and Y
is positive, negative, or zero, respectively.
The correlation just scales the covariance by the standard deviation of each variable.
Consequently, the correlation is a dimensionless quantity that can be used to compare the linear
relationships between pairs of variables in different units.
If the points in the joint probability distribution of X and Y that receive positive probability tend
to fall along a line of positive (or negative) slope, $\rho_{XY}$ is near +1 (or -1). If $\rho_{XY}$ equals +1 or -1, it
can be shown that the points in the joint probability distribution that receive positive probability
fall exactly along a straight line. Two random variables with nonzero correlation are said to be
correlated. Similar to covariance, the correlation is a measure of the linear relationship between
random variables.
Exercise 2: Suppose that the random variable X has the following distribution: P(X = 1) = 0.2,
P(X = 2) = 0.6, P(X = 3) = 0.2. Let Y = 2X + 5. That is, P(Y = 7) = 0.2, P(Y = 9) = 0.6, P(Y = 11)
= 0.2. Determine the correlation between X and Y.
Because X and Y are linearly related with positive slope, $\rho_{XY} = 1$. This can be verified by direct calculation: try it.
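One way to carry out the direct calculation is sketched below. It uses exact fractions for the moments and converts to floating point only for the square root. The helper names `expect` and `y_of` are illustrative choices, not part of the exercise.

```python
from fractions import Fraction
from math import sqrt

# Distribution of X from Exercise 2; Y = 2X + 5 is fully determined by X.
pmf_x = {1: Fraction(2, 10), 2: Fraction(6, 10), 3: Fraction(2, 10)}

def y_of(x):
    return 2 * x + 5

def expect(g):
    """E[g(X)] under pmf_x."""
    return sum(g(x) * p for x, p in pmf_x.items())

e_x = expect(lambda x: x)
e_y = expect(y_of)
cov_xy = expect(lambda x: x * y_of(x)) - e_x * e_y
var_x = expect(lambda x: x * x) - e_x ** 2
var_y = expect(lambda x: y_of(x) ** 2) - e_y ** 2

# rho = cov / (sigma_X * sigma_Y)
rho = float(cov_xy) / sqrt(float(var_x * var_y))

print("cov =", cov_xy, " var_x =", var_x, " var_y =", var_y)
print("rho =", rho)
```

Note that Var(Y) is four times Var(X) and the covariance is twice Var(X), which is what the slope b = 2 would lead one to expect.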
Chapter Nine
9 Common Probability distributions
9.1 Common Discrete Distributions and their Properties
9.1.1 Binomial distribution
A binomial experiment is a probability experiment that satisfies the following four
requirements, called the assumptions of a binomial distribution.
1. The experiment consists of n identical trials.
2. Each trial has only one of two possible mutually exclusive outcomes, a success or a
failure.
3. The probability of each outcome does not change from trial to trial.
4. The trials are independent; thus, when sampling from a finite population, we must sample with replacement.
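Solution:
Let X be the number of heads in tossing a fair coin four times.
$X \sim \mathrm{Bin}(n = 4,\ p = 0.50)$
$$\begin{aligned}
P(X = x) &= \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, 3, 4 \\
         &= \binom{4}{x} 0.5^x\, 0.5^{4-x} \\
         &= \binom{4}{x} 0.5^4
\end{aligned}$$
$$P(X = 3) = \binom{4}{3} 0.5^4 = 0.25$$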
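The binomial probability formula used in this solution can be sketched in a few lines of Python. The function name `binomial_pmf` is a hypothetical helper introduced here, and only the standard library is assumed.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p): C(n, x) * p^x * q^(n - x) with q = 1 - p."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Number of heads in four tosses of a fair coin.
p_three_heads = binomial_pmf(3, n=4, p=0.5)
print(p_three_heads)  # 0.25

# The probabilities over x = 0, ..., 4 should add up to 1.
print(sum(binomial_pmf(x, 4, 0.5) for x in range(5)))
```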
2. Suppose that an examination consists of six true-or-false questions, and assume that a
student has no knowledge of the subject matter. The probability that the student will
guess the correct answer to the first question is 30%. Likewise, the probability of
guessing each of the remaining questions correctly is also 30%.
a) What is the probability of getting more than three correct answers?
b) What is the probability of getting at least two correct answers?
c) What is the probability of getting at most three correct answers?
d) What is the probability of getting less than five correct answers?
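Solution
Let X = the number of correct answers that the student gets.
$X \sim \mathrm{Bin}(n = 6,\ p = 0.30)$
a) $P(X > 3) = ?$
$$P(X = x) = \binom{n}{x} p^x q^{n-x} = \binom{6}{x} 0.3^x\, 0.7^{6-x}, \quad x = 0, 1, 2, \ldots, 6$$
$$\begin{aligned}
P(X > 3) &= P(X = 4) + P(X = 5) + P(X = 6) \\
         &= 0.060 + 0.010 + 0.001 \\
         &= 0.071
\end{aligned}$$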
Thus, we may conclude that if the student guesses every question with a 30% chance of being
correct, the probability is 0.071 (or 7.1%) that more than three of the questions are answered
correctly by the student.
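b) $P(X \ge 2) = ?$
$$\begin{aligned}
P(X \ge 2) &= P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) \\
           &= 0.324 + 0.185 + 0.060 + 0.010 + 0.001 \\
           &= 0.58
\end{aligned}$$
c) $P(X \le 3) = ?$
$$\begin{aligned}
P(X \le 3) &= P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) \\
           &= 0.118 + 0.303 + 0.324 + 0.185 \\
           &= 0.93
\end{aligned}$$
d) $P(X < 5) = ?$
$$\begin{aligned}
P(X < 5) &= 1 - P(X \ge 5) \\
         &= 1 - \{P(X = 5) + P(X = 6)\} \\
         &= 1 - (0.010 + 0.001) \\
         &= 0.989
\end{aligned}$$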
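The four parts of this example can be checked numerically. The sketch below tabulates the Bin(6, 0.30) probabilities once and then sums the relevant entries. The variable names are illustrative, and only the standard library is assumed.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 6, 0.30
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]  # pmf[x] = P(X = x)

more_than_three = sum(pmf[4:])     # a) P(X > 3)
at_least_two = sum(pmf[2:])        # b) P(X >= 2)
at_most_three = sum(pmf[:4])       # c) P(X <= 3)
less_than_five = 1 - sum(pmf[5:])  # d) P(X < 5), via the complement

for label, value in [("a", more_than_three), ("b", at_least_two),
                     ("c", at_most_three), ("d", less_than_five)]:
    print(f"{label}) {value:.4f}")
```

The unrounded answer to part a) is about 0.0705. The value 0.071 in the worked solution comes from adding terms that were each rounded to three decimals first.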
Exercises:
1. Suppose that 4% of all TVs made by A&B Company in 2000 are defective. If eight of these
TVs are randomly selected from across the country and tested, what is the probability that
exactly three of them are defective? Assume that each TV is made independently of the
others.
2. An allergist claims that 45% of the patients she tests are allergic to some type of weed. What
is the probability that
a) Exactly 3 of her next 4 patients are allergic to weeds?
b) None of her next 4 patients are allergic to weeds?
3. Explain why the following experiments are not binomial:
a) Rolling a die until a 6 appears.
b) Asking 20 people how old they are.
c) Drawing 5 cards from a deck for a poker hand.
Remark: If X is a binomial random variable with parameters n and p, then
$E(X) = np$ and $\mathrm{Var}(X) = npq$, where $q = 1 - p$.
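The remark can be checked by summing over the distribution directly. The sketch below does this for n = 6 and p = 0.30, values chosen here only for illustration, and compares the results with np and npq.

```python
from math import comb, isclose

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 6, 0.30
q = 1 - p

# Mean and variance from the definitions E(X) and E(X^2) - [E(X)]^2.
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
second_moment = sum(x**2 * binomial_pmf(x, n, p) for x in range(n + 1))
variance = second_moment - mean**2

print(mean, n * p)          # both close to 1.8
print(variance, n * p * q)  # both close to 1.26
print(isclose(mean, n * p), isclose(variance, n * p * q))
```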
9.1.2 Poisson distribution
Experiments yielding numerical values of a random variable X, the number of outcomes
occurring during a given time interval or in a specified region, are called Poisson experiments.
The given time interval may be of any length, such as a minute, a day, a week, a month, or even
a year. For example, a Poisson experiment can generate observations for the random variable X
representing the number of telephone calls received per hour by an office, the number of days
school is closed due to snow during the winter, or the number of games postponed due to rain
during a baseball season. The specified region could be a line segment, an area, a volume, or
perhaps a piece of material. In such instances, X might represent the number of field mice per
acre, the number of bacteria in a given culture, or the number of typing errors per page. A
Poisson experiment is derived from the Poisson process and possesses the following properties.
1. The number of outcomes occurring in one time interval or specified region is independent of
the number that occur in any other disjoint time interval or region.
2. The probability that a single outcome will occur during a very short time interval or in a small
region is proportional to the length of the time interval or the size of the region and does not
depend on the number of outcomes occurring outside this time interval or region.
3. The probability that more than one outcome will occur in such a short time interval or fall in
such a small region is negligible.
The number X of outcomes occurring during a Poisson
experiment is called a Poisson random variable, and its probability distribution is called the
Poisson distribution. The mean number of outcomes is computed from μ = λt, where t is the
specific “time,” “distance,” “area,” or “volume” of interest. Since the probabilities depend on λ,
the rate of occurrence of outcomes, we shall denote them by p(x;λt). The derivation of the
formula for p(x;λt), based on the three properties of a Poisson process listed above, is beyond the
scope of this book. The following formula is used for computing Poisson probabilities.
Poisson distribution
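The probability distribution of the Poisson random variable X, representing the number of
outcomes occurring in a given time interval or specified region denoted by t, is
$$p(x; \lambda t) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}, \quad x = 0, 1, 2, \ldots,$$
where λ is the average number of outcomes per unit time, distance, area, or volume.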
Example 5.18: Ten is the average number of oil tankers arriving each day at a certain port. The
facilities at the port can handle at most 15 tankers per day. What is the probability that on a given
day tankers have to be turned away?
Solution: Let X be the number of tankers arriving each day.
Then, using Table A.2, we have
$$P(X > 15) = 1 - P(X \le 15) = 1 - \sum_{x=0}^{15} p(x; 10) = 1 - 0.9513 = 0.0487$$
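When a table of cumulative Poisson probabilities is not at hand, the same tail probability can be computed from the Poisson formula $p(x; \mu) = e^{-\mu}\mu^x / x!$. The following is a minimal sketch for the tanker example with μ = 10. The function name `poisson_pmf` is a hypothetical helper, and only the standard library is assumed.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu**x / factorial(x)

mu = 10        # average number of tankers arriving per day
capacity = 15  # the port can handle at most 15 tankers per day

# Tankers are turned away when more than `capacity` arrive: P(X > 15).
p_at_most_capacity = sum(poisson_pmf(x, mu) for x in range(capacity + 1))
p_turned_away = 1 - p_at_most_capacity

print(round(p_at_most_capacity, 4), round(p_turned_away, 4))
```

Working through the complement keeps the sum finite: only the 16 terms for x = 0, ..., 15 are needed, rather than the infinite upper tail.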
Like the binomial distribution, the Poisson distribution is used for quality control, quality
assurance, and acceptance sampling. In addition, certain important continuous distributions used
in reliability theory and queuing theory depend on the Poisson process. Some of these
distributions are discussed and developed in Chapter 6. The following theorem concerning the
Poisson random variable is given in Appendix A.25.
Theorem: Both the mean and the variance of the Poisson distribution p(x;λt) are λt.
The density function of the continuous uniform random variable X on the interval [A, B] is
$$f(x; A, B) = \begin{cases} \dfrac{1}{B - A}, & A \le x \le B, \\ 0, & \text{elsewhere.} \end{cases}$$
The density function forms a rectangle with base B−A and constant height 1/(B−A). As a
result, the uniform distribution is often called the rectangular distribution. Note, however,
that the interval need not be closed, as in [A, B]; it can be open, (A, B), as well. The density function
for a uniform random variable on the interval [1, 3] is shown in Figure 6.1. Probabilities are
simple to calculate for the uniform distribution because of the simple nature of the density
function. However, note that the application of this distribution is based on the assumption that
the probability of falling in an interval of fixed length within [A, B] is constant.
Example 6.1: Suppose that a large conference room at a certain company can be reserved for no
more than 4 hours. Both long and short conferences occur quite often. In fact, it can be assumed
that the length X of a conference has a uniform distribution on the interval [0, 4].
(a) What is the probability density function?
(b) What is the probability that any given conference lasts at least 3 hours?
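Solution: (a) The appropriate density function for the uniformly distributed random variable X in
this situation is
$$f(x) = \begin{cases} \tfrac{1}{4}, & 0 \le x \le 4, \\ 0, & \text{elsewhere.} \end{cases}$$
(b) $P(X \ge 3) = \int_3^4 \tfrac{1}{4}\,dx = \tfrac{1}{4}$.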
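For a uniform distribution on [A, B] the density is the constant 1/(B − A), so the probability of any sub-interval is its length multiplied by that height. The sketch below applies this to the conference-room example with A = 0 and B = 4. The function names are hypothetical helpers introduced for illustration.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_prob(lo: float, hi: float, a: float, b: float) -> float:
    """P(lo <= X <= hi) for X uniform on [a, b]: overlap length times height."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

A, B = 0.0, 4.0  # conference length in hours

density_inside = uniform_pdf(2.0, A, B)     # (a) constant height of the density
p_at_least_3 = uniform_prob(3.0, B, A, B)   # (b) P(X >= 3)

print(density_inside)  # 0.25
print(p_at_least_3)    # 0.25
```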
approximated by a normal distribution. In 1733, Abraham DeMoivre developed the
mathematical equation of the normal curve. It provided a basis on which much of the theory of
inductive statistics is founded. The normal distribution is often referred to as the Gaussian
distribution, in honor of Karl Friedrich Gauss (1777-1855), who also derived its equation from a
study of errors in repeated measurements of the same quantity. A continuous random variable X having the bell-shaped
distribution of Figure 6.2 is called a normal random variable. The mathematical equation for the
probability distribution of the normal variable depends on the two parameters μ and σ, its mean
and standard deviation, respectively. Hence, we denote the values of the density of X by
n(x;μ,σ).
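The density n(x; μ, σ) referred to here is the standard normal-curve formula, $n(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2 / (2\sigma^2)}$ for $-\infty < x < \infty$. A minimal sketch of evaluating it follows. The function name `normal_pdf` is a hypothetical helper, and the area check uses a crude Riemann sum purely as an illustration.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Value of the normal density n(x; mu, sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sqrt(2.0 * pi) * sigma)

# Height of the standard normal curve at its mean.
peak = normal_pdf(0.0, mu=0.0, sigma=1.0)

# Crude Riemann-sum check that the area under the curve is about 1.
step = 0.001
area = sum(normal_pdf(-8.0 + i * step, 0.0, 1.0) for i in range(16001)) * step

print(round(peak, 4))  # 0.3989
print(round(area, 6))
```

The curve is symmetric about μ, so `normal_pdf(mu - d, mu, sigma)` and `normal_pdf(mu + d, mu, sigma)` return the same value for any d.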
9.2.3 Exponential distribution