Contents
1 Introduction and Definition of Statistics
Types and Applications of Statistics
Variables and Scale of Measurement
2 Methods of Data Collection and Organization
Types and Sources of Data
Methods of Data Collection
Methods of Data Organization
Methods of Data Presentation
3 Measures of Central Tendency
Types of Measures of Central Tendency
4 Measures of Variation
5 Types of Measures of Variation
6 Theory of Probability
Definition and Some Basic Concepts of Probability
Definition of Statistics
• The term 'statistics' is derived from the Latin word status, meaning state. Historically, statistics referred to the display of facts and figures relating to the demography of states or countries.
• Today the term is used in both a plural and a singular sense.
• Plural sense: statistics are collections of facts (figures).
• Examples: figures on sales, employment or unemployment, accidents, weather, deaths, education, etc.
• But not all numerical data are statistics.
Definition of Statistics...
For numerical data to be regarded as statistics, they should:
• be aggregates of facts
• be affected by multiple causes, not be the outcome of a single cause
• be expressed numerically
• be collected in a systematic manner for a predetermined purpose
• be enumerated or estimated according to a reasonable standard of accuracy
Singular sense
Statistics is the science that deals with the methods of collection, organization, presentation, analysis and interpretation of data. According to this definition, a statistical investigation has five stages:
1 Collection of Data: the process of obtaining data
2 Organization of Data: making the data ready for clear understanding by editing, classifying and tabulating
3 Presentation of Data: displaying the organized data diagrammatically or graphically
4 Analysis of Data: summarizing the data to reach conclusions about the given problem
5 Interpretation of Data: the process of drawing conclusions from the analyzed results
Classification of Statistics
Based on the scope of decision making, statistics can be:
1 Descriptive Statistics: used to organize and summarize masses of data using summary measures. It does not go beyond summarizing the data at hand.
2 Inferential Statistics: used to make generalizations about a population based on sample results, using probability theory.
Application of Statistics
• Statistics is applied in almost all fields of human endeavor
• In Scientific Research: from the design of a study up to the final interpretation of its results
• In Industry: to check whether a product satisfies a given standard
• In Business: to forecast future demand and profits
• In Medicine: to design drug-development studies and to identify different health problems through research
Uses of Statistics
• To reduce and summarize masses of data: using summary statistics, diagrams or graphs
• To facilitate comparison: using averages, percentages, ratios, etc.
• To determine functional relationships between two or more variables: using measures of association
• To formulate and test hypotheses: based on test statistics and testing procedures
• For forecasting: using statistical models
Limitations of Statistics
• Does not deal with individual observations: it deals with aggregates of facts
• Not directly applicable to qualitative phenomena (such as opinions), since it focuses on quantification
• Statistical results are true only on average
• Statistics are liable to be misused or misinterpreted
Important Terms in Statistics
• Variable: any phenomenon or attribute that can assume different values
• Data: observations or measurements obtained on a given variable
• Population: the totality of all objects under study that possess certain common characteristics
• Sample: a subset (portion) of the population selected for the purpose of investigation
• Sampling: the procedure of obtaining a sample using statistical techniques
• Parameter: a population value that describes a characteristic of the population
• Statistic: a sample value that describes a characteristic of the sample
Variables and Their Types
A variable can be:
• Qualitative: a variable whose values are categories (not expressed as numbers)
Example: gender, religion, color of automobile, educational level
• Quantitative: a variable whose values are numerical
Example: height, family size, weight, etc.
• A quantitative variable can be either discrete or continuous
• Discrete variable: assumes distinct, countable values. E.g. family size, number of children in a family, etc.
• Continuous variable: can assume any value within an interval, measured in given units. E.g. height, weight, time, temperature, etc.
Scales of Measurement of Variables
The scale of measurement describes the information a variable contains; it determines the type of the variable, the values it can take and the mathematical operations that can meaningfully be applied to it. The four scales of measurement are:
1 Nominal: categories with no natural order. E.g. gender, ethnicity
2 Ordinal: categories with a natural ordering. E.g. academic rank, letter grades, economic status, health status
3 Interval: quantitative values with no true zero, so differences are meaningful but ratios are not. E.g. IQ, temperature in degrees Celsius
4 Ratio: quantitative values with a true zero, so ratios are also meaningful. E.g. age, weight, height
Data types
Based on their nature, data can be:
• Qualitative: expressed in terms of categories, obtained from qualitative variables
Examples: data on the gender, religion, economic status or ethnicity of the subjects under investigation
• Quantitative: expressed as numeric values, obtained from quantitative variables
Examples: data on age, weight, temperature, number of children in a family, etc.
• Quantitative data can also be discrete or continuous, according to whether they are obtained from discrete or continuous variables
Data types...
Based on their source, data can be:
• Primary data: collected by the investigator for the purpose of a specific inquiry or study
• Secondary data: collected by others and obtained from secondary sources
Based on the time of collection, data can be:
• Cross-sectional data: a set of observations taken at a single point in time
• Time series data: a set of observations collected over a sequence of time points, usually at equal intervals
Methods of Data Collection
The first and foremost task in a statistical investigation is data collection. Before beginning, the investigator should address the following four questions:
1 Why?: the purpose of data collection
2 What?: the nature of the data to be collected
3 Where?: the source of the data
4 How?: the methods used for collection
Data collection may be based on questionnaires (for surveys) or on observation (for experimental studies).
Questionnaire Methods
Questionnaire methods are based either on personal interviews, using different techniques, or on self-administered questionnaires.
• Secondary data: extracted from secondary sources after checking their reliability, suitability and adequacy
Methods of Data Organization
After collection, the data must be organized in a meaningful way. Organization of data involves:
1 Data editing: checking the data for completeness, consistency, accuracy and homogeneity
2 Data classification: grouping the data according to their common characteristics, based on different categorizations
3 Tabulation of data: systematic arrangement of the data in rows and columns
A table has several parts, such as the title, captions (column headings), ...
Frequency Distributions
The most convenient way of organizing data is to construct a frequency distribution: the organization of raw data in table form, using classes and frequencies.
1 Categorical Frequency Distribution: used to organize qualitative data, i.e. nominal or ordinal data
2 Ungrouped Frequency Distribution: used to organize discrete quantitative data; each class is a single numeric value, listed with its frequency
3 Grouped (Continuous) Frequency Distribution: several values of a variable are grouped into one class; it has its own construction procedure
Example: Categorical Frequency Distribution
• The blood types of 22 students are given below. Construct a categorical frequency distribution.
A B B AB O A O O B AB B A B B O A O AB A O O AB
Class (blood type)   Frequency (no. of students)
A                    5
B                    6
AB                   4
O                    7
Total                22
Example: Ungrouped Frequency Distribution
The numbers of children in 21 families are:
2 3 5 4 3 3 2 3 1 0 4 3 2 2 1 1 1 4 2 2 2
Construct an ungrouped frequency distribution.
Class (no. of children)   Frequency (no. of families)
0                         1
1                         4
2                         7
3                         5
4                         3
5                         1
Total                     21
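Both frequency tables above can be reproduced by counting occurrences. The following is a minimal Python sketch using `collections.Counter`; the helper name `frequency_table` is hypothetical, and classes are listed in sorted order rather than in the order used in the tables.

```python
from collections import Counter

def frequency_table(values):
    """Return (class, frequency) pairs, sorted by class."""
    return sorted(Counter(values).items())

blood = "A B B AB O A O O B AB B A B B O A O AB A O O AB".split()
children = [int(c) for c in "2 3 5 4 3 3 2 3 1 0 4 3 2 2 1 1 1 4 2 2 2".split()]

for title, values in [("Blood type", blood), ("No. of children", children)]:
    table = frequency_table(values)
    print(title)
    for cls, freq in table:
        print(f"  {cls}: {freq}")
    print("  Total:", sum(freq for _, freq in table))
```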
Construction of Grouped Frequency Distributions
Constructing a grouped frequency distribution requires the number of classes and the class width; each class then has its own class limits and class boundaries.
• Number of classes (k): the number of classes in the frequency distribution
• Class width (w): the size of a class
• Class limits: the lowest (LCL) and highest (UCL) values of a class
• Class boundaries: class limits adjusted so that there is no gap between the UCL of one class and the LCL of the next (LCB, UCB)
• Class mark: the midpoint of a class
• Class frequency: the number of observations lying in a class
• Relative frequency: the ratio of the class frequency to the total frequency
• Cumulative frequency: the sum of the frequencies of all classes up to and including a given class (less-than cumulative frequency, LCF) or from that class onward (more-than cumulative frequency, MCF)
• Unit of measurement (u): the smallest possible difference between two observations (e.g. u = 1 for whole-number data)
Steps in Constructing a Grouped Frequency Distribution
1 Arrange the data in ascending or descending order
2 Find u
3 Find the range: R = Maximum − Minimum
4 Determine the number of classes (k): $k = 1 + 3.322\log_{10}(n)$
5 Determine the class width (w): $w = \frac{R}{k} = \frac{R}{1 + 3.322\log_{10}(n)}$
6 Generate the class limits: $LCL_1 = \text{minimum}$, $LCL_i = LCL_{i-1} + w$ and $UCL_i = LCL_i + (w - u)$
7 Generate the class boundaries: $LCB_i = LCL_i - \frac{1}{2}u$ and $UCB_i = UCL_i + \frac{1}{2}u$
8 Determine the class frequencies ($f_i$): count the observations lying in each class
9 Determine the class marks ($x_i$): $x_i = \frac{UCL_i + LCL_i}{2} = \frac{UCB_i + LCB_i}{2}$
10 Determine the relative frequencies: $RF_i = \frac{f_i}{\sum_{j=1}^{k} f_j}$
11 Determine the cumulative frequencies: $LCF_i = \sum_{j=1}^{i} f_j$ and $MCF_i = \sum_{j=i}^{k} f_j$
Example
Consider the marks (out of 40) of 50 students: 16 21 26 24 11 17 25 26 13 27 24 26 3 27 23 24 15 22 22 12 22 29 18 22 28 25 7 17 22 28 19 23 23 22 3 19 13 31 23 28 24 9 20 33 30 23 20 8 21 24. Construct a grouped frequency distribution.
• Arranged data: 3 3 7 8 9 11 12 13 13 15 16 17 17 18 19 19 20 20 21 21 22 22 22 22 22 22 23 23 23 23 23 24 24 24 24 24 25 25 26 26 26 27 27 28 28 28 29 30 31 33
• u = 1 (whole-number marks), R = 33 − 3 = 30, $k = 1 + 3.322\log_{10}(50) = 6.64 \approx 7$
• $w = \frac{30}{6.64} = 4.5 \approx 5$, w − u = 5 − 1 = 4
• $LCL_1 = 3$, $LCL_2 = 3 + 5 = 8$, ..., $LCL_7 = 33$ and $UCL_1 = 3 + 4 = 7$, $UCL_2 = 8 + 4 = 12$, ..., $UCL_7 = 37$
• $LCB_i = LCL_i - 0.5$ and $UCB_i = UCL_i + 0.5$
Constructed FD Table
Class limits   Class boundaries   xi   fi   RFi            LCFi   MCFi
3-7            2.5-7.5            5    3    3/50 = 0.06    3      50
8-12           7.5-12.5           10   4    4/50 = 0.08    7      47
13-17          12.5-17.5          15   6    6/50 = 0.12    13     43
18-22          17.5-22.5          20   13   13/50 = 0.26   26     37
23-27          22.5-27.5          25   17   17/50 = 0.34   43     24
28-32          27.5-32.5          30   6    6/50 = 0.12    49     7
33-37          32.5-37.5          35   1    1/50 = 0.02    50     1
Total                                  50   1
• $x_i = \frac{UCL_i + LCL_i}{2} = \frac{UCB_i + LCB_i}{2}$, $RF_i = \frac{f_i}{n}$
• $LCF_i$ and $MCF_i$ are running sums of the class frequencies: from the first class up to class i, and from class i to the last class, respectively
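The construction steps and the table above can be checked with a short Python sketch. It is an illustration under stated assumptions: `grouped_fd` is a hypothetical helper, u is passed in (1 for whole-number marks), the class width is rounded up with `math.ceil`, and classes are generated until the maximum is covered, so for other data sets the number of classes may differ from the rounded k.

```python
import math

def grouped_fd(data, u=1):
    """Grouped frequency distribution following steps 1-11 (Sturges' rule for k)."""
    data = sorted(data)                         # step 1
    n = len(data)
    r = data[-1] - data[0]                      # step 3: range
    k = 1 + 3.322 * math.log10(n)               # step 4 (left unrounded, as in the example)
    w = math.ceil(r / k)                        # step 5: assumption, round the width up
    rows, lcf, lcl = [], 0, data[0]
    while lcl <= data[-1]:                      # step 6: add classes until the maximum is covered
        ucl = lcl + (w - u)
        f = sum(lcl <= x <= ucl for x in data)  # step 8: count observations in the class
        lcf += f
        rows.append({
            "limits": (lcl, ucl),
            "boundaries": (lcl - u / 2, ucl + u / 2),  # step 7
            "mark": (lcl + ucl) / 2,                   # step 9
            "f": f,
            "rf": f / n,                               # step 10
            "lcf": lcf,                                # step 11 (less-than)
        })
        lcl += w
    remaining = n
    for row in rows:                            # step 11 (more-than)
        row["mcf"] = remaining
        remaining -= row["f"]
    return rows

marks = [16, 21, 26, 24, 11, 17, 25, 26, 13, 27, 24, 26, 3, 27, 23, 24, 15,
         22, 22, 12, 22, 29, 18, 22, 28, 25, 7, 17, 22, 28, 19, 23, 23, 22,
         3, 19, 13, 31, 23, 28, 24, 9, 20, 33, 30, 23, 20, 8, 21, 24]

for row in grouped_fd(marks):
    print(row)
```

For these marks the sketch produces the same seven classes, frequencies and cumulative frequencies as the table.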
Methods of Data Presentation
Data presentation methods help to visualize summarized data using diagrams or graphs chosen according to the nature of the data.
• Diagrams: used to present qualitative data. Diagrams include pie charts and simple, multiple and component bar charts
• Pie chart and simple bar chart: used to present a single qualitative variable. A pie chart is a circle divided into sectors whose sizes are proportional to the category frequencies.
• Multiple and component bar charts: used to present two or more qualitative variables. The difference is that a multiple bar chart uses a separate bar for each category, whereas a component bar chart divides a single bar into parts according to the category frequencies
• In a bar chart, the Y-axis usually represents the frequency of each category and the X-axis represents the categories
Examples
• Consider the following data, collected from 200 individuals and organized by marital status and gender. Present marital status by sex, and marital status only, using diagrams.
Marital status   Male   Female   Total
Single           90     10       100
Married          30     40       70
Others           5      25       30
Categorical Data Presentation Examples
[Figure: Presentation of marital status by sex and of marital status only]
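Without assuming any particular plotting library, the geometry behind these diagrams can be computed directly. This sketch derives the pie-chart sector angles for marital status only and the stacked segments of a component bar chart by sex; the variable names are illustrative.

```python
# Marital status by sex, from the table above
table = {
    "Single":  {"Male": 90, "Female": 10},
    "Married": {"Male": 30, "Female": 40},
    "Others":  {"Male": 5,  "Female": 25},
}
totals = {status: sum(by_sex.values()) for status, by_sex in table.items()}
grand_total = sum(totals.values())

# Pie chart of marital status only: each sector angle is proportional to its frequency
angles = {status: 360 * t / grand_total for status, t in totals.items()}

# Component bar chart: one bar per status, split into stacked (sex, start, end) segments
segments = {}
for status, by_sex in table.items():
    start, parts = 0, []
    for sex, count in by_sex.items():
        parts.append((sex, start, start + count))
        start += count
    segments[status] = parts

print(angles)     # Single 180, Married 126, Others 54 degrees
print(segments)
```

A multiple bar chart would instead draw the male and female counts as separate adjacent bars for each status.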
Methods of Quantitative Data Presentation
• Quantitative data organized in ungrouped or grouped frequency distributions can be presented using line graphs, histograms, frequency polygons and other graphical techniques
• Histogram: bars drawn over the class boundaries, with heights equal to the class frequencies
• Frequency polygon: class frequencies plotted against class marks, with the points joined by straight lines
• Cumulative frequency polygon (ogive): cumulative frequencies plotted against class boundaries
Examples
• Presentation of data collected on the height of 45 students
[Figure: Presentations of the heights of students]
Measures of Central Tendency
Organizing and presenting data is not enough to draw further conclusions about it; further statistical measures that summarize the data are needed. Measures of central tendency (MCT) are among these summary measures: they condense a data set into an average value. MCT also help:
• To condense a mass of data into a single numeric value
• To facilitate comparison among different groups
• To locate the center (average) of a given data set
Properties of Good Measures of Central Tendency
A measure of central tendency is good (satisfactory) if it:
• is based on all observations
• is not unduly affected by extreme values
• has a definite value
• always exists
• is easy to understand and calculate
• is capable of further algebraic treatment
Summation Notation and Its Properties
Summation notation is needed to calculate the mean and other statistical measures.
• Let x be a variable with successive values $x_1, x_2, \ldots, x_n$. Then $x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$
• $x_1^2 + x_2^2 + \cdots + x_n^2 = \sum_{i=1}^{n} x_i^2$
• $x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = \sum_{i=1}^{n} x_i y_i$
• $\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n} = \sum_{i=1}^{n} \frac{1}{x_i}$
Rules of Summation
1 For two variables x and y: $\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$
2 For any constant k: $\sum_{i=1}^{n} k x_i = k \sum_{i=1}^{n} x_i$
3 $\sum_{i=1}^{n} k = nk$
4 $\sum_{i=1}^{n} (x_i - k)^2 = \sum_{i=1}^{n} x_i^2 - 2k \sum_{i=1}^{n} x_i + nk^2$
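A quick numeric check can make the rules concrete. The values of x, y and k below are hypothetical, chosen only so that every quantity is an integer and each comparison is exact.

```python
# Hypothetical data for checking the four summation rules
x = [2, 5, 7, 10]
y = [1, 3, 4, 8]
k = 3
n = len(x)

rule1 = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)
rule2 = sum(k * xi for xi in x) == k * sum(x)
rule3 = sum(k for _ in range(n)) == n * k
# Expanding (x_i - k)^2 = x_i^2 - 2k x_i + k^2 and summing term by term
rule4 = sum((xi - k) ** 2 for xi in x) == sum(xi ** 2 for xi in x) - 2 * k * sum(x) + n * k ** 2

print(rule1, rule2, rule3, rule4)   # True True True True
```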
Types of Measures of Central Tendency
The three most commonly used measures of central tendency are the mean, median and mode. Each has its own properties.
• Mean: the average of the observed data
• Median: the middle value of the ordered data. It divides the data into two equal parts: half of the observations lie below the median and half lie above it
• Mode: the most frequently occurring value in a data set
Types of Mean
• Arithmetic Mean: the sum of all observed values divided by the number of observations: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ for raw data
• $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$ for grouped data, where $x_i$ and $f_i$ are the class mark and frequency of the ith class, respectively
• Weighted Arithmetic Mean: used when each observation has its own weight according to its importance: $\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$, where $w_i$ and $x_i$ are the ith weight and observed value, respectively
• In the simple arithmetic mean all observations are considered equally important, whereas in the weighted mean each observation has its own weight based on its importance
Types of Mean...
• Geometric Mean (GM): the nth root of the product of all observed values. It is a good average when the data are ratios, proportions, or rates showing an increase or decrease over time
• For ungrouped data: $GM = \sqrt[n]{x_1 x_2 \cdots x_n} = \sqrt[n]{\prod_{i=1}^{n} x_i}$
• For grouped data: $GM = \sqrt[\sum_{i=1}^{k} f_i]{\prod_{i=1}^{k} x_i^{f_i}}$
• Harmonic Mean (HM): a good average when the observed data are rates expressed per unit of time (e.g. speeds)
• For ungrouped data: $HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
• For grouped data: $HM = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{k} \frac{f_i}{x_i}}$
• When each observation has its own weight, the weighted harmonic mean is: $HM_w = \frac{\sum_{i=1}^{n} w_i}{\sum_{i=1}^{n} \frac{w_i}{x_i}}$
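Median
The median is the value located at the center of the ordered data; it is a measure of location.
• For raw data, first sort the observations. If n is odd, the median is $\tilde{x} = \left(\frac{n+1}{2}\right)^{\text{th}}$ observed value
• If n is even, the median is $\tilde{x} = \frac{\left(\frac{n}{2}\right)^{\text{th}}\text{ value} + \left(\frac{n}{2}+1\right)^{\text{th}}\text{ value}}{2}$
• For grouped data: $\tilde{x} = LCB_{\tilde{x}} + \left(\frac{\frac{n}{2} - LCF_{\tilde{x}-1}}{f_{\tilde{x}}}\right) w$, where $LCB_{\tilde{x}}$ and $f_{\tilde{x}}$ are the lower class boundary and frequency of the median class, $LCF_{\tilde{x}-1}$ is the less-than cumulative frequency of the class before it, and w is the class width
• The median class is the first class whose less-than cumulative frequency is at least n/2, i.e. the class containing the $\left(\frac{n}{2}\right)^{\text{th}}$ observed value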
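Mode
Mode ($\hat{x}$): the most frequently observed value. For ungrouped data, the value with the greatest frequency is the modal value.
• For grouped data: $\hat{x} = LCB_{\hat{x}} + \left(\frac{f_{\hat{x}} - f_{\hat{x}-1}}{(f_{\hat{x}} - f_{\hat{x}-1}) + (f_{\hat{x}} - f_{\hat{x}+1})}\right) w$, where $f_{\hat{x}-1}$ and $f_{\hat{x}+1}$ are the frequencies of the classes before and after the modal class
• The modal class is the class with the largest frequency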
Exercise on MCT
Example 1: The heights (in cm) of 7 students selected from a class are 165, 160, 172, 168, 159, 170 and 173. Calculate the simple AM of the heights.
$\bar{x} = \frac{165+160+172+168+159+170+173}{7} = \frac{1167}{7} = 166.71$ cm is the average height of the students
• Example 2: Calculate the mean yield of maize based on the following grouped data.
Yield (in kg)   No. of plots (fi)   Class mark (xi)   fi xi
171-179         3                   175               525
180-188         7                   184               1288
189-197         12                  193               2316
198-206         9                   202               1818
207-215         4                   211               844
216-224         4                   220               880
225-233         1                   229               229
Total           40                                    7900
• $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{7900}{40} = 197.5$ kg per plot is the average yield
Exercise on MCT...
Example 3: A student registered for Stat 281 and Math 261 with four credit hours each, and Math 224, Phil 201 and Comp 201 with three credit hours each. The student earned a B grade in Stat 281, Math 261 and Phil 201 and a C grade in the remaining two courses. Find the student's average grade point (with B = 3 and C = 2 points).
$\bar{x}_w = \frac{4(3) + 4(3) + 3(2) + 3(3) + 3(2)}{4+4+3+3+3} = \frac{45}{17} = 2.65$ is the student's average grade point for the semester.
• Example 4: Assuming an epidemic spreads at rates of 1.5 and 2.67 on two successive days, find the average spread rate.
$GM = \sqrt{1.5 \times 2.67} = 2.001$ is the average spread rate over the two days
• Example 5: A driver travels for 3 days: at 48 km per hr for 10 hrs, at 40 km per hr for 12 hrs and at 32 km per hr for 15 hrs. Find the average speed of the driver over the 3 days.
Each day the driver covers the same distance (48 × 10 = 40 × 12 = 32 × 15 = 480 km), so the average speed is the harmonic mean of the speeds:
$HM = \frac{3}{\frac{1}{48} + \frac{1}{40} + \frac{1}{32}} = \frac{1440}{37} = 38.92$ km per hr, which equals total distance / total time = 1440 km / 37 hrs
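The five worked examples can be reproduced with small hand-written helpers; the function names are hypothetical (Python's `statistics` module also provides `geometric_mean` and `harmonic_mean`). Two assumptions are labeled in the comments: grade points B = 3 and C = 2 for Example 3, and, for Example 5, the simple harmonic mean of the three speeds, which is appropriate because each day's distance is the same 480 km.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    """Sum of w_i * x_i over sum of w_i (also the grouped mean when ws are frequencies)."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)

def geometric_mean(xs):
    """n-th root of the product, computed through logarithms (all xs must be positive)."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs, ws=None):
    """Sum of weights over sum of w_i / x_i; unit weights give the simple HM."""
    ws = [1] * len(xs) if ws is None else ws
    return sum(ws) / sum(w / x for x, w in zip(xs, ws))

heights = [165, 160, 172, 168, 159, 170, 173]          # Example 1
class_marks = [175, 184, 193, 202, 211, 220, 229]      # Example 2
plots = [3, 7, 12, 9, 4, 4, 1]
credits = [4, 4, 3, 3, 3]                              # Example 3: Stat 281, Math 261,
points = [3, 3, 2, 3, 2]                               # Math 224, Phil 201, Comp 201 (assumed B = 3, C = 2)

print(round(arithmetic_mean(heights), 2))              # 166.71
print(weighted_mean(class_marks, plots))               # 197.5
print(round(weighted_mean(points, credits), 2))        # 2.65
print(round(geometric_mean([1.5, 2.67]), 3))           # 2.001 (Example 4)
print(round(harmonic_mean([48, 40, 32]), 2))           # 38.92 (Example 5, equal daily distances)
```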
Exercise...
• Example 6: What is the median of 180, 201, 220, 191, 219, 209 and 220?
• Sorted values: 180, 191, 201, 209, 219, 220, 220. Since n = 7 is odd, the median is the $\left(\frac{7+1}{2}\right)^{\text{th}}$ value = 4th value = 209
• Example 7: Find the median and mode of 62, 63, 64, 65, 66, 66, 68 and 78.
• Sorted values: 62, 63, 64, 65, 66, 66, 68, 78. Since n = 8 is even, the median is $\frac{\left(\frac{n}{2}\right)^{\text{th}}\text{ value} + \left(\frac{n}{2}+1\right)^{\text{th}}\text{ value}}{2} = \frac{4\text{th value} + 5\text{th value}}{2} = \frac{65+66}{2} = 65.5$
• The modal value is 66
Exercise...
• Example: Consider the data of Example 2 and find the median and modal yield.
Yield (in kg)   No. of plots (fi)   Class mark (xi)   LCF
171-179         3                   175               3
180-188         7                   184               10
189-197         12                  193               22
198-206         9                   202               31
207-215         4                   211               35
216-224         4                   220               39
225-233         1                   229               40
• The median class is the class containing the $\left(\frac{40}{2}\right)^{\text{th}}$ = 20th value, i.e. the third class (LCB = 188.5, w = 9)
• $\tilde{x} = 188.5 + \left(\frac{20-10}{12}\right) \times 9 = 196$
• The modal class is the class with the greatest frequency, which is also the third class
• $\hat{x} = 188.5 + \left(\frac{12-7}{(12-7)+(12-9)}\right) \times 9 = 188.5 + 5.625 = 194.13$
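The grouped median and mode formulas can be sketched as two small functions (hypothetical names), checked here on the yield data, together with `statistics.median` and `statistics.mode` for the raw-data exercises. One assumption: a missing neighboring class at either end of the table is treated as having frequency 0.

```python
import statistics

def grouped_median(lcbs, freqs, w):
    """x~ = LCB + ((n/2 - LCF of previous class) / f) * w for the median class."""
    n = sum(freqs)
    lcf_prev = 0
    for lcb, f in zip(lcbs, freqs):
        if lcf_prev + f >= n / 2:                 # first class whose LCF reaches n/2
            return lcb + (n / 2 - lcf_prev) * w / f
        lcf_prev += f

def grouped_mode(lcbs, freqs, w):
    """x^ = LCB + (d1 / (d1 + d2)) * w for the modal class (first one if tied)."""
    j = max(range(len(freqs)), key=lambda i: freqs[i])
    f_prev = freqs[j - 1] if j > 0 else 0         # assumption: 0 outside the table
    f_next = freqs[j + 1] if j + 1 < len(freqs) else 0
    d1, d2 = freqs[j] - f_prev, freqs[j] - f_next
    return lcbs[j] + d1 * w / (d1 + d2)

# Yield data: lower class boundaries (lower class limit - 0.5), frequencies, width 9
lcbs = [170.5, 179.5, 188.5, 197.5, 206.5, 215.5, 224.5]
freqs = [3, 7, 12, 9, 4, 4, 1]
print(grouped_median(lcbs, freqs, 9), grouped_mode(lcbs, freqs, 9))   # 196.0 194.125

# Raw data from the median/mode exercises
print(statistics.median([180, 201, 220, 191, 219, 209, 220]))           # 209
values = [62, 63, 64, 65, 66, 66, 68, 78]
print(statistics.median(values), statistics.mode(values))               # 65.5 66
```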
Properties of Arithmetic Mean
• If a constant k is added to or subtracted from each observation, the mean changes by the same constant: $\bar{x}_{new} = \bar{x}_{old} \pm k$
• If each observation is multiplied by a constant k: $\bar{x}_{new} = k\bar{x}_{old}$
• The sum of the deviations from the mean is zero: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
• If the mean was computed with a wrong observation, the corrected mean can be obtained from the wrong and corrected observations: $\bar{x}_{corrected} = \frac{\left(\sum_{i=1}^{n} x_i\right)_{wrong} - x_{wrong} + x_{corrected}}{n}$
• The combined mean of k groups, where group i has $n_i$ observations and mean $\bar{x}_i$: $\bar{x}_{combined} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$
Examples
• Example 1: There are 49 students in a certain department. Of these, 7 are seniors with an average weight of 165 lbs, 9 are juniors with an average weight of 160 lbs, 13 are sophomores with an average weight of 152 lbs and 20 are freshmen with an average weight of 150 lbs. Find the average weight of the students in the department.
$\bar{x}_{combined} = \frac{7(165) + 9(160) + 13(152) + 20(150)}{7+9+13+20} = \frac{7571}{49} = 154.51$ lbs
• Example 2: The mean age of a group of 100 students was found to be 32.02 years. Later it was discovered that an age of 57 had been misread as 27. Find the correct mean.
$\bar{x}_{corrected} = \frac{32.02(100) - 27 + 57}{100} = \frac{3232}{100} = 32.32$ years
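The combined-mean and corrected-mean properties translate directly into two one-line functions. This is a minimal sketch with hypothetical function names, applied to the two examples above.

```python
def combined_mean(sizes, means):
    """Sum of n_i * mean_i over sum of n_i."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

def corrected_mean(wrong_mean, n, wrong_value, correct_value):
    """Rebuild the wrong total, swap the misread value, and divide by n again."""
    return (wrong_mean * n - wrong_value + correct_value) / n

print(round(combined_mean([7, 9, 13, 20], [165, 160, 152, 150]), 2))   # 154.51
print(round(corrected_mean(32.02, 100, 27, 57), 2))                    # 32.32
```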
Other Measures of Locations
• There are also quantiles (such as quartiles, deciles and percentiles), which divide an ordered data set into more than two equal parts; these are left for further reading
Measures of Variation
• Two or more data sets may have the same mean and/or median but still be quite different. Measures of central tendency alone therefore do not provide enough information about the nature of the data.
Score of class A   30   30   30
Score of class B   29   30   31
Score of class C   15   30   45
Score of class D   5    30   55
• All four data sets have mean 30 and median 30. This does not imply that the data sets are similar, and the averages alone do not give a clear picture of the nature of the data
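A short computation shows what the averages hide. This sketch uses Python's standard `statistics` module to print the mean, median, range and sample standard deviation of each class; only the spread differs.

```python
import statistics

scores = {
    "A": [30, 30, 30],
    "B": [29, 30, 31],
    "C": [15, 30, 45],
    "D": [5, 30, 55],
}
for name, xs in scores.items():
    # Same center for every class, very different spread
    print(name, statistics.mean(xs), statistics.median(xs),
          max(xs) - min(xs), statistics.stdev(xs))
```

The ranges are 0, 2, 30 and 50, and the standard deviations are 0, 1, 15 and 25.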
Objectives of Measures of Variation
• To have an idea about the reliability of the measures of central
tendency
• To compare two or more sets of data with regard to their variability
• To provide information about the structure of the data
• To pave the way for the use of other statistical measures
Types of Measures of Variation
• Absolute Measures of Variation: measure the actual amount by which items vary from a measure of central tendency, and are expressed in the same units as the data
• Relative Measures of Variation: the quotient obtained by dividing an absolute measure by the quantity about which the deviations were computed. Being unit-free, they are used to compare different distributions.
Types of Measures of Variation...
Absolute Measures    Relative Measures
Range                Coefficient of Range
Mean Deviation       Coefficient of Mean Deviation
Variance             Coefficient of Variation
Standard Deviation   Standard Scores
Range and Mean Deviation
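• Range: based only on the largest (L) and smallest (S) values: R = L − S
• Coefficient of Range: $CR = \frac{L - S}{L + S}$
• Mean deviation: the arithmetic mean of the absolute deviations from a measure of central tendency:
$MD_{\bar{x}} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$, $MD_{\tilde{x}} = \frac{\sum_{i=1}^{n} |x_i - \tilde{x}|}{n}$, $MD_{\hat{x}} = \frac{\sum_{i=1}^{n} |x_i - \hat{x}|}{n}$
or, for grouped data, $MD_{\bar{x}} = \frac{\sum_{i=1}^{k} f_i |x_i - \bar{x}|}{n}$, $MD_{\tilde{x}} = \frac{\sum_{i=1}^{k} f_i |x_i - \tilde{x}|}{n}$, $MD_{\hat{x}} = \frac{\sum_{i=1}^{k} f_i |x_i - \hat{x}|}{n}$
• Coefficient of Mean Deviation: the ratio of the mean deviation to the measure of central tendency about which it was computed, e.g. $CMD_{\bar{x}} = \frac{MD_{\bar{x}}}{\bar{x}}$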
Variance and Standard Deviation
• Variance: the sum of the squared deviations from the arithmetic mean divided by n − 1 (the sample variance): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$, or $s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n-1}$ for grouped data
• Standard deviation: the square root of the variance, $s = \sqrt{s^2}$
• Coefficient of Variation: a relative measure of variation, used to judge how heterogeneous or homogeneous the observations are relative to the mean
• $CV = \frac{s}{\bar{x}}$
• $\%CV = CV \times 100\%$
Examples
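• Consider a sample with data values 27, 25, 20, 15, 30, 34, 28 and 25. Compute the range, coefficient of range, mean deviation about the mean, mean deviation about the median, coefficient of mean deviation about the mean, coefficient of mean deviation about the median, variance, standard deviation and CV.
• R = max − min = 34 − 15 = 19, $CR = \frac{\max - \min}{\max + \min} = \frac{34 - 15}{34 + 15} = 0.388$
• $\bar{x} = \frac{27 + \cdots + 25}{8} = 25.5$, $\tilde{x} = \frac{25 + 27}{2} = 26$
• $MD_{\bar{x}} = \frac{|27 - 25.5| + \cdots + |25 - 25.5|}{8} = \frac{34}{8} = 4.25$, $CMD_{\bar{x}} = \frac{4.25}{25.5} = 0.1667$
• $MD_{\tilde{x}} = \frac{|27 - 26| + \cdots + |25 - 26|}{8} = \frac{34}{8} = 4.25$, $CMD_{\tilde{x}} = \frac{4.25}{26} = 0.163$ (any value between the two middle observations, 25 and 27, gives the same sum of absolute deviations, which is why both mean deviations equal 4.25 here)
• $s^2 = \frac{(27 - 25.5)^2 + \cdots + (25 - 25.5)^2}{7} = \frac{242}{7} = 34.57$, $s = \sqrt{34.57} = 5.88$, $CV = \frac{5.88}{25.5} = 0.231$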
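The measures in this example can be verified with Python's standard `statistics` module, whose `variance` and `stdev` use the n − 1 divisor, matching $s^2$ above. The variable names are illustrative.

```python
import statistics

data = [27, 25, 20, 15, 30, 34, 28, 25]
n = len(data)
mean = statistics.mean(data)                          # 25.5
median = statistics.median(data)                      # 26.0
r = max(data) - min(data)                             # range
cr = r / (max(data) + min(data))                      # coefficient of range
md_mean = sum(abs(x - mean) for x in data) / n        # mean deviation about the mean
md_median = sum(abs(x - median) for x in data) / n    # mean deviation about the median
s2 = statistics.variance(data)                        # sample variance (divides by n - 1)
s = statistics.stdev(data)
cv = s / mean

print(r, round(cr, 3), md_mean, round(md_mean / mean, 4), md_median, round(md_median / median, 3))
print(round(s2, 2), round(s, 2), round(cv, 3))        # 34.57 5.88 0.231
```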
Examples
• Consider the following grouped data on students' scores and find the measures of variation, given $\bar{x} = 25.64$ and $\tilde{x} = 26.1$
Class       xi     fi   |xi − x̄|   fi|xi − x̄|   |xi − x̃|   fi|xi − x̃|   fi(xi − x̄)²
10.5-14.5   12.5   4    13.14      52.56        13.6       54.4         690.6384
14.5-18.5   16.5   7    9.14       63.98        9.6        67.2         584.7772
18.5-22.5   20.5   8    5.14       41.12        5.6        44.8         211.3568
22.5-26.5   24.5   10   1.14       11.40        1.6        16.0         12.9960
26.5-30.5   28.5   12   2.86       34.32        2.4        28.8         98.1552
30.5-34.5   32.5   7    6.86       48.02        6.4        44.8         329.4172
34.5-38.5   36.5   8    10.86      86.88        10.4       83.2         943.5168
Total              56              338.28                  339.2        2870.8576
• $R = UCL_{last} - LCL_{first} = 38 - 11 = 27$, $CR = \frac{38 - 11}{38 + 11} = 0.551$
• $MD_{\bar{x}} = \frac{338.28}{56} = 6.04$, $CMD_{\bar{x}} = \frac{6.04}{25.64} = 0.24$
• $MD_{\tilde{x}} = \frac{339.2}{56} = 6.06$, $CMD_{\tilde{x}} = \frac{6.06}{26.1} = 0.23$
• $s^2 = \frac{2870.8576}{55} = 52.20$, $s = \sqrt{52.20} = 7.22$, $CV = \frac{7.22}{25.64} = 0.282$
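The grouped computations can be sketched from the class marks and frequencies in the table. In this sketch the mean is computed unrounded (1436/56), while the median 26.1 is taken as given in the example. The results agree with the table after rounding.

```python
import math

marks = [12.5, 16.5, 20.5, 24.5, 28.5, 32.5, 36.5]   # class marks x_i
freqs = [4, 7, 8, 10, 12, 7, 8]                       # class frequencies f_i
n = sum(freqs)

mean = sum(f * x for f, x in zip(freqs, marks)) / n   # 1436 / 56
median = 26.1                                         # grouped median as given above
md_mean = sum(f * abs(x - mean) for f, x in zip(freqs, marks)) / n
md_median = sum(f * abs(x - median) for f, x in zip(freqs, marks)) / n
s2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)
s = math.sqrt(s2)
cv = s / mean

print(n, round(mean, 2), round(md_mean, 2), round(md_median, 2))   # 56 25.64 6.04 6.06
print(round(s2, 2), round(s, 2), round(cv, 3))                     # 52.2 7.22 0.282
```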
Properties of variance
• If a constant is added to (or subtracted from) every observation, the standard deviation and the variance remain the same.
• If every observation is multiplied by a nonzero constant k, the standard deviation is multiplied by |k| and the variance by $k^2$.
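Both properties can be checked numerically with Python's `statistics` module. This sketch reuses the sample from the earlier example; the shift of 10 and k = −3 are arbitrary illustrative choices, with k negative to show why the standard deviation scales by |k|.

```python
import math
import statistics

data = [27, 25, 20, 15, 30, 34, 28, 25]   # the sample used in the earlier example
k = -3                                    # any nonzero constant; negative to show why |k| matters

shifted = [x + 10 for x in data]          # add a constant to every observation
scaled = [k * x for x in data]            # multiply every observation by k

print(statistics.variance(shifted) == statistics.variance(data))                       # True
print(math.isclose(statistics.stdev(scaled), abs(k) * statistics.stdev(data)))         # True
print(math.isclose(statistics.variance(scaled), k ** 2 * statistics.variance(data)))   # True
```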
Theory of Probability
• Probability is a numerical measure of the uncertainty (likelihood) of a phenomenon under given conditions
• For example, the selection of individual subjects for an investigation is assumed to be random
• We may sample a population at random and make inferences about the population as a whole from the sample using statistical analysis
• In general, probability is about the occurrence or non-occurrence of an event resulting from an experiment
Review on Set Theory
• Union (Or): the set of all elements in A or B or both is called the union of A and B, i.e. $A \cup B = \{x : x \in A \text{ or } x \in B \text{ or both}\}$
• Intersection (And): the set of all elements in both A and B, i.e. $A \cap B = \{x : x \in A \text{ and } x \in B\}$
• Complement (Not): the set of all elements of the universal set S that are not in A, i.e. $A^c = \{x : x \notin A\}$
• Disjoint sets: sets A and B are disjoint if $A \cap B = \emptyset$
Definition and Some Basic Concepts
• Experiment (ξ): any trial that results in well-defined outcomes
• Sample space (S): the set of all possible outcomes of an experiment
• Example: Tossing a coin twice is an experiment with S = {HH, HT, TH, TT}; rolling a die is an experiment with S = {1, 2, 3, 4, 5, 6}
• Event: a subset of the sample space, consisting of the outcomes with a defined characteristic
• E.g. the event of getting two heads when a fair coin is tossed twice is E = {HH}; the event of getting an even number when a die is rolled is E = {2, 4, 6}
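Python's built-in `set` type mirrors the set operations reviewed above, so sample spaces and events can be explored directly. In this sketch the event `high` ("5 or more") is a hypothetical extra event added for illustration.

```python
S = {1, 2, 3, 4, 5, 6}            # sample space: one roll of a die
even = {2, 4, 6}                  # event: an even number
high = {5, 6}                     # hypothetical extra event: "5 or more"
odd = S - even                    # complement of "even" within S

print(even | high)                # union of the two events
print(even & high)                # intersection of the two events
print(odd)                        # complement of "even"
print(even.isdisjoint(odd))       # True: mutually exclusive
print((even | odd) == S)          # True: also exhaustive, hence complementary

coin_twice = {a + b for a in "HT" for b in "HT"}   # S for tossing a coin twice
print(sorted(coin_twice))         # ['HH', 'HT', 'TH', 'TT']
```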
Types of Events
• Simple event: an event consisting of a single outcome
• Independent events: events for which the occurrence or non-occurrence of one has no effect on the probability of the other
• Mutually exclusive events: events having no outcome in common (their intersection is empty)
• Complementary events: two mutually exclusive events are complementary if, in addition, their union is the sample space
• Null event: an event with no outcomes (∅)
• Exhaustive events: events whose union forms the sample space
Counting Rules
Counting rules help determine the number of possible outcomes of an experiment, or the number of ways of carrying it out. The techniques are:
• Addition Rule: If an experiment can be done by one of k procedures, where the ith procedure has $n_i$ alternatives and no two procedures can be performed at the same time, the total number of ways of doing it is $\sum_{i=1}^{k} n_i$
• Example: Suppose a lady wants to travel from Harar to Dire Dawa. She can go by plane, bus, cycle or horse, and there are 3 flights, 4 buses, 2 cycles and 3 horses available. In how many different ways can she make her journey?
From the given problem, $n_f = 3$, $n_b = 4$, $n_c = 2$ and $n_h = 3$. So she has $n_f + n_b + n_c + n_h = 3 + 4 + 2 + 3 = 12$ different ways to make her trip from Harar to Dire Dawa.
Counting Rules
• Multiplication Rule: If an experiment consists of k procedures that are all performed, where the ith procedure can be done in $n_i$ ways, the total number of ways of doing the experiment is $\prod_{i=1}^{k} n_i$
• Example: Assume that an individual has 3 pairs of shoes, 4 trousers and 4 T-shirts. In how many possible ways can he wear his clothes? $n_s = 3$, $n_t = 4$, $n_{ts} = 4$, so there are 3 × 4 × 4 = 48 possible ways
• Permutation: used to count the possible arrangements (orderings). The number of ways of arranging n distinct objects is n!
• Example: Suppose a photographer must arrange 4 persons in a row for a photograph. In how many different ways can the arrangement be done? n = 4, so there are 4! = 24 possible ways
Counting Rules...
• The number of permutations of n objects taken r at a time is $\frac{n!}{(n-r)!}$
• The number of permutations of n objects of which $n_1$ are alike, $n_2$ are alike, ..., $n_r$ are alike is $\frac{n!}{\prod_{i=1}^{r} n_i!}$
• Example: How many different permutations can be made from the letters of the word STATISTICS?
$n_1 = n(S) = 3$, $n_2 = n(T) = 3$, $n_3 = n(A) = 1$, $n_4 = n(I) = 2$ and $n_5 = n(C) = 1$.
Thus, $\frac{10!}{3!\,3!\,1!\,2!\,1!} = 50400$
• Combination: the number of possible ways of making a selection (order does not matter)
• The number of ways of selecting r objects from n objects is $\binom{n}{r} = \frac{n!}{r!(n-r)!}$
Counting Rules...
• If a selection has k parts, where $r_1$ objects are selected from $n_1$, $r_2$ from $n_2$, ..., and $r_k$ from $n_k$, the number of possible selections is $\prod_{i=1}^{k} \binom{n_i}{r_i}$
• Example 1: In how many ways can a student choose 3 books from a list of 12 different books? $\binom{12}{3} = \frac{12!}{3!(12-3)!} = 220$
• Example 2: A committee of 2 male and 3 female workers is to be formed from 5 male and 7 female workers of a factory. In how many ways can this be done if
(a) all workers are eligible: $\binom{5}{2} \binom{7}{3} = 350$
(b) one particular female must be a member: $\binom{5}{2} \binom{6}{2} = 150$
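The counting examples can be reproduced with the standard `math` module (`math.comb`, `math.perm`, `math.factorial` and `math.prod` require Python 3.8 or later). The helper `arrangements_with_repeats` is a hypothetical name, and the `math.perm(5, 2)` line uses illustrative values of n and r that do not appear in the examples.

```python
import math
from collections import Counter

def arrangements_with_repeats(word):
    """n! / (n1! n2! ... nr!) for the letters of a word (hypothetical helper)."""
    counts = Counter(word)
    return math.factorial(len(word)) // math.prod(math.factorial(c) for c in counts.values())

journeys = 3 + 4 + 2 + 3                          # addition rule
outfits = 3 * 4 * 4                               # multiplication rule
photo_rows = math.factorial(4)                    # arranging 4 persons in a row
statistics_words = arrangements_with_repeats("STATISTICS")
books = math.comb(12, 3)                          # choose 3 of 12 books
committee_a = math.comb(5, 2) * math.comb(7, 3)   # all workers eligible
committee_b = math.comb(5, 2) * math.comb(6, 2)   # one particular female already chosen
ordered = math.perm(5, 2)                         # n!/(n-r)! with illustrative n = 5, r = 2

print(journeys, outfits, photo_rows, statistics_words, books, committee_a, committee_b, ordered)
# 12 48 24 50400 220 350 150 20
```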
Approaches in Probability Definition
• The Classical Approach: Suppose there are N equally likely possible outcomes in the sample space S of an experiment, of which n are favorable to the event E. Then the probability of the event E is $P(E) = \frac{n}{N}$
• Example 1: Consider the experiment of rolling a die. What is the probability that an odd number occurs? S = {1, 2, 3, 4, 5, 6}, E = {1, 3, 5}, $P(E) = \frac{3}{6} = \frac{1}{2}$
• The Empirical (Relative Frequency) Approach: based on the relative frequencies of a frequency distribution. Given a frequency distribution, the probability of an event being in the ith class is $\frac{f_i}{\sum f_i}$
• Subjective Approach: based on an educated guess, experience or evaluation of a problem
Some Probability Rules or Axioms
Let S be the sample space of an experiment, and A and B be events defined on it. Then the following axioms and rules of probability hold:
1 $0 \le P(A) \le 1$
2 $P(S) = 1$
3 $P(A^c) = 1 - P(A)$
4 $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
5 $P(\emptyset) = 0$, where ∅ is the impossible (null) event
6 If $A_1, A_2, \ldots, A_n$ are pairwise mutually exclusive events, then $P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$
Exercise
• Example 1: A box of 20 candles consists of 5 defective and 15 non-defective candles. If 4 of these candles are selected at random, what is the probability that
(a) all will be defective? Let A be the event that all 4 candles are defective. $P(A) = \frac{\binom{5}{4}\binom{15}{0}}{\binom{20}{4}} = 0.001032$
(b) 3 will be non-defective? Let B be the event that 3 candles are non-defective. $P(B) = \frac{\binom{5}{1}\binom{15}{3}}{\binom{20}{4}} = 0.4696$
• Example 2: An urn contains 6 white, 4 red and 9 black balls. If 3 balls are drawn at random, find the probability that at least one is white. Let W be the event that at least one drawn ball is white.
$P(W) = \frac{\binom{6}{1}\binom{13}{2}}{\binom{19}{3}} + \frac{\binom{6}{2}\binom{13}{1}}{\binom{19}{3}} + \frac{\binom{6}{3}\binom{13}{0}}{\binom{19}{3}} = \frac{683}{969} = 0.7049$
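These classical-probability answers can be checked with `math.comb`. As an extra check, this sketch also computes the urn answer through the complement rule, $1 - P(\text{no white})$, which must agree with the direct sum.

```python
from math import comb, isclose

# Example 1: 4 candles drawn from 5 defective + 15 non-defective
total = comb(20, 4)
p_all_defective = comb(5, 4) * comb(15, 0) / total
p_three_good = comb(5, 1) * comb(15, 3) / total

# Example 2: 3 balls from 6 white + 13 non-white (4 red + 9 black)
p_white = sum(comb(6, w) * comb(13, 3 - w) for w in range(1, 4)) / comb(19, 3)
p_white_complement = 1 - comb(13, 3) / comb(19, 3)   # 1 - P(no white ball)

print(round(p_all_defective, 6), round(p_three_good, 4), round(p_white, 4))   # 0.001032 0.4696 0.7049
print(isclose(p_white, p_white_complement))                                   # True
```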
Independence
Two events are independent if the occurrence or non-occurrence of one event does not influence the other.
• Events A and B are said to be independent if $P(A \cap B) = P(A)P(B)$
Conditional Probability
When the occurrence or non-occurrence of one event may influence another, we use conditional probability. The conditional probabilities of events A and B are given by:
• Conditional probability of A given that event B has already occurred: $P(A/B) = \frac{P(A \cap B)}{P(B)}$, provided P(B) > 0
• Conditional probability of B given that event A has already occurred: $P(B/A) = \frac{P(A \cap B)}{P(A)}$, provided P(A) > 0
• For mutually exclusive events $A_1$ and $A_2$: $P(A_1 \cup A_2 / B) = P(A_1/B) + P(A_2/B)$
• For pairwise mutually exclusive events $A_i$: $P\left(\bigcup_{i=1}^{n} A_i / B\right) = \sum_{i=1}^{n} P(A_i/B)$
Examples
• The probability that a research project will be well planned is 0.6, and the probability that it will be well planned and well executed is 0.54. What is the probability that it will be
(a) well executed given that it is well planned? Let D and E be the events that the research project is well planned and well executed, respectively. Then $P(D) = 0.6$ and $P(D \cap E) = 0.54$, so
$P(E/D) = \frac{P(E \cap D)}{P(D)} = \frac{0.54}{0.6} = 0.9$
• (b) not well executed given that it is well planned?
$P(E^c/D) = \frac{P(E^c \cap D)}{P(D)} = \frac{P(D) - P(D \cap E)}{P(D)} = 1 - P(E/D) = 0.1$
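A minimal Python sketch of the same arithmetic, with variable names chosen here for illustration, shows that both answers come straight from the definition of conditional probability and that they sum to one.

```python
# Research-project example: D = well planned, E = well executed
p_D = 0.6          # P(D)
p_D_and_E = 0.54   # P(D ∩ E)

# Conditional probability: P(E/D) = P(E ∩ D) / P(D)
p_E_given_D = p_D_and_E / p_D

# P(E^c ∩ D) = P(D) - P(D ∩ E), hence P(E^c/D) = 1 - P(E/D)
p_notE_given_D = (p_D - p_D_and_E) / p_D

print(round(p_E_given_D, 4), round(p_notE_given_D, 4))
```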
Multiplicative rule and Law of Total Probability
• Notation: the conditional probability of event A given that event B has occurred is written $P(A/B)$
• The multiplicative rule of probability: $P(A \cap B) = P(A/B)P(B)$
• The law of total probability: let $A_1, \ldots, A_k$ be mutually exclusive and exhaustive events. Then for any event B, $P(B) = \sum_{i=1}^{k} P(B/A_i)P(A_i)$
• The events $A_1, \ldots, A_k$ are said to be exhaustive if one of them must occur, that is, $A_1 \cup A_2 \cup \cdots \cup A_k = S$
Bayes’ Theorem
• Let $A_1, \ldots, A_k$ be mutually exclusive and exhaustive events (a partition of S) with $P(A_i) > 0$ for $i = 1, 2, \ldots, k$, and let B be any event with $P(B) > 0$. Then the probability of the jth event $A_j$ given B is obtained from Bayes' theorem, which follows from the multiplicative rule and the law of total probability:
• $P(A_j/B) = \frac{P(A_j \cap B)}{P(B)} = \frac{P(B/A_j)P(A_j)}{\sum_{i=1}^{k} P(B/A_i)P(A_i)}$, for $j = 1, 2, \ldots, k$
Bayes’ Theorem Example
• E.g.: Microchips from a factory are sorted into three separate boxes. Box 1 contains 25 microchips from shift 1, box 2 contains 35 microchips from shift 2, and box 3 contains 40 microchips from shift 3. There are 5, 10 and 5 defective microchips in the first, second and third boxes, respectively. Let A denote the event that a defective microchip is obtained, and let $B_1$, $B_2$ and $B_3$ be the events of choosing box 1, box 2 and box 3, respectively. Each box is chosen with probability proportional to the number of microchips it holds (equivalently, one microchip is drawn at random from all 100), so
$P(B_1) = \frac{25}{100}$, $P(B_2) = \frac{35}{100}$, $P(B_3) = \frac{40}{100}$, $P(A/B_1) = \frac{5}{25}$, $P(A/B_2) = \frac{10}{35}$, $P(A/B_3) = \frac{5}{40}$
• What is the probability of obtaining a defective microchip?
$P(A) = \sum_{i=1}^{3} P(A/B_i)P(B_i) = \frac{5}{25} \cdot \frac{25}{100} + \frac{10}{35} \cdot \frac{35}{100} + \frac{5}{40} \cdot \frac{40}{100} = 0.2$
• If we picked a defective microchip, what is the probability that it came from box 1?
$P(B_1/A) = \frac{P(A/B_1)P(B_1)}{P(A)} = \frac{\frac{5}{25} \cdot \frac{25}{100}}{0.2} = 0.25$
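The law of total probability and Bayes' theorem can be reproduced with exact fractions. This is a sketch assuming the priors and likelihoods listed above; `fractions.Fraction` avoids rounding, and the dictionary layout is an illustrative choice.

```python
from fractions import Fraction as F

# Priors P(B_i) and likelihoods P(A/B_i) from the microchip example
prior = {1: F(25, 100), 2: F(35, 100), 3: F(40, 100)}
p_defective_given_box = {1: F(5, 25), 2: F(10, 35), 3: F(5, 40)}

# Law of total probability: P(A) = sum_i P(A/B_i) P(B_i)
p_A = sum(p_defective_given_box[i] * prior[i] for i in prior)

# Bayes' theorem: P(B_j/A) = P(A/B_j) P(B_j) / P(A)
posterior = {j: p_defective_given_box[j] * prior[j] / p_A for j in prior}

print(p_A, posterior)
```

The same run also gives the posteriors for the other boxes (1/2 for box 2 and 1/4 for box 3), which together with box 1 sum to one.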
One Dimensional Random Variable
The previous section introduced the basic concepts of probability and methods of calculating the probability of an event. This section focuses on calculating probabilities under somewhat more complex conditions.
• It shows how to define random variables over a given experiment and how to summarize their possible values using probability models
• Random variable: a variable defined over a given random experiment. A variable X that assigns a real number to every possible outcome in the sample space is called a random variable
Type of One Dimensional Random Variables
The type of a random variable depends on the nature of the possible outcomes of the experiment.
1 Discrete random variable: a variable that takes a finite or countably infinite number of values in the defined experiment
Example 1: The number of heads in the experiment of tossing a coin two times. $S = \{HH, HT, TH, TT\}$, $X \in \{0, 1, 2\}$
2 Continuous random variable: a variable that can take any real value in an interval determined by the experiment.
Examples: the lifetime of light bulbs under investigation, the time taken to recover after surgery, etc.
Probability Distribution
A probability distribution is a probability model for a random variable: a function that gives all possible values of the random variable together with their respective probabilities.
• The probability distribution of a discrete random variable is called a probability mass function (PMF)
Let X be a discrete random variable. Then $p(x_i)$ is the PMF of X if it satisfies the following conditions:
1 $0 \le p(x_i) \le 1$
2 $\sum_i p(x_i) = 1$
3 $P(X \le x) = \sum_{x_i \le x} p(x_i)$
Examples
• Example 1: Construct a probability distribution for the number of heads in an experiment of tossing a coin two times.
x_i           0      1      2
P(X = x_i)    1/4    2/4    1/4
• Based on this probability mass function, find $P(X \le 1)$
• Example 2: The probability distribution of a discrete random variable Y is given by:
$P(Y = y) = c y^2$, $y = 0, 1, 2, 3, 4$. Find the value of c.
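The two-coin PMF can also be built by enumerating the sample space and letting X count the heads in each outcome, which mirrors the definition of a random variable as a mapping from outcomes to real numbers. The sketch assumes a fair coin (all four outcomes equally likely) and also answers the two questions posed: $P(X \le 1)$, and the constant c obtained from the condition $\sum_y p(y) = 1$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of two tosses: HH, HT, TH, TT (assumed equally likely)
sample_space = ["".join(s) for s in product("HT", repeat=2)]

# The random variable X maps each outcome to its number of heads
counts = Counter(outcome.count("H") for outcome in sample_space)
pmf = {x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())}

# P(X <= 1): sum the PMF over all values x <= 1
p_le_1 = sum(p for x, p in pmf.items() if x <= 1)

# Example 2: sum of c*y^2 over y = 0..4 must be 1, so c = 1 / sum(y^2)
c = Fraction(1, sum(y * y for y in range(5)))

print(pmf, p_le_1, c)
```

Here $P(X \le 1) = 3/4$ and $c = 1/30$, since $0 + 1 + 4 + 9 + 16 = 30$.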
Probability Density function
The probability distribution of a continuous random variable is called a probability density function (PDF).
• Let X be a continuous random variable. A function f(x) is the PDF of X if it satisfies the following conditions:
1 $f(x) \ge 0$ for all x
2 $\int_{-\infty}^{\infty} f(x)\,dx = 1$
3 $P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
• Example: Let X be a continuous random variable with pdf
$f(x) = 2x$ for $0 < x < 1$ (and 0 otherwise)
• (a) Verify that f(x) is a pdf: $f(x) \ge 0$ and $\int_0^1 2x\,dx = x^2 \big|_0^1 = 1$
• (b) Find $P(0.5 < X < 0.75) = \int_{0.5}^{0.75} 2x\,dx = x^2 \big|_{0.5}^{0.75} = 0.75^2 - 0.5^2 = 0.3125$
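Both PDF conditions and the interval probability can be checked numerically. The sketch below uses a plain composite midpoint rule (an illustrative choice, not the method of the notes); because f is linear on (0, 1), the rule is exact up to floating-point error, so it should reproduce 1 and $0.75^2 - 0.5^2 = 0.3125$.

```python
def integrate(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def pdf(x):
    # f(x) = 2x on (0, 1), zero elsewhere
    return 2 * x if 0 < x < 1 else 0.0

total_area = integrate(pdf, 0, 1)    # condition 2: should equal 1
p_mid = integrate(pdf, 0.5, 0.75)    # P(0.5 < X < 0.75)
print(round(total_area, 6), round(p_mid, 6))
```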
CUMULATIVE DISTRIBUTION FUNCTION
The cumulative distribution function (CDF) accumulates the probabilities of a random variable up to a specified value.
• The cumulative distribution function of a discrete random variable X, with values ordered $x_1 < x_2 < \cdots$, is defined as $F_X(x) = P(X \le x) = \sum_{x_i \le x} p(x_i)$; conversely, $p(x_i) = F(x_i) - F(x_{i-1})$ for $i = 2, 3, \ldots$ and $p(x_1) = F(x_1)$
Properties of cumulative distribution function
1 In the limiting cases, $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$
2 $F_X$ is non-decreasing, that is, $a < b \Rightarrow F(a) \le F(b)$
3 For $a < b$, $P(a < X \le b) = F_X(b) - F_X(a)$
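The relationship between a discrete PMF and its CDF is a running sum in one direction and successive differences in the other. A minimal sketch with the two-coin distribution (values assumed sorted in increasing order):

```python
from fractions import Fraction
from itertools import accumulate

# PMF of X = number of heads in two tosses, values in increasing order
xs = [0, 1, 2]
pmf = [Fraction(1, 4), Fraction(2, 4), Fraction(1, 4)]

# F(x_i) = p(x_1) + ... + p(x_i)
cdf = list(accumulate(pmf))

# Recover the PMF: p(x_1) = F(x_1), p(x_i) = F(x_i) - F(x_{i-1})
recovered = [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, len(cdf))]

# Property 3: P(a < X <= b) = F(b) - F(a); here P(0 < X <= 2)
p_0_to_2 = cdf[xs.index(2)] - cdf[xs.index(0)]
print(cdf, recovered == pmf, p_0_to_2)
```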
CUMULATIVE DISTRIBUTION FUNCTION...
• The cumulative distribution function of a continuous random variable X is defined as $F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$, and conversely $f(x) = \frac{d}{dx}F_X(x)$
Properties of cumulative distribution function
1 In the limiting cases, $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$
2 $F_X$ is non-decreasing, that is, $a < b \Rightarrow F(a) \le F(b)$
3 For $a < b$, $P(a < X \le b) = F_X(b) - F_X(a)$, and because $P(X = a) = 0$ for a continuous random variable,
$P(a < X < b) = P(a < X \le b) = P(a \le X < b) = P(a \le X \le b)$
Expectation and variance of Random Variables
The expected value is the mean of a random variable, which measures its average, and the variance measures the variability of the random variable's values about that average.
• For a discrete random variable: $E(X) = \mu = \sum_{\forall x_i} x_i\, p(x_i)$, and the variance is $Var(X) = E(X - \mu)^2 = \sum_{\forall x_i} (x_i - \mu)^2 p(x_i)$
• For a continuous random variable: $E(X) = \mu = \int_{-\infty}^{\infty} x f(x)\,dx$, and
$Var(X) = \sigma^2 = E(X - \mu)^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$
• Standard deviation: the square root of the variance, $\sigma = \sqrt{Var(X)}$
Expectation and variance of Random Variables properties
• For any constant a: $E(aX) = aE(X)$, $E(a) = a$, $E(X \pm a) = E(X) \pm a$
• If g(X) is a function of the random variable X: $E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$ when X is continuous, and $E(g(X)) = \sum_{\forall x_i} g(x_i) p(x_i)$ when X is a discrete random variable
• $Var(X) = \sigma^2 = E[(X - E(X))^2] = E(X - \mu)^2 = E(X^2) - \mu^2$
• $E(X^2) = \sum_{\forall x_i} x_i^2 p(x_i)$ in the discrete case and $E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$ in the continuous case
• $Var(aX) = a^2\sigma^2$, $Var(a) = 0$
• $Var(X \pm a) = Var(X)$
Examples
A coin is tossed two times. Let X be the number of heads. Find the mean value and the standard deviation of X.
x_i           0      1      2
P(X = x_i)    1/4    2/4    1/4
• $E(X) = \sum x_i p(x_i) = 0 \cdot \frac{1}{4} + 1 \cdot \frac{2}{4} + 2 \cdot \frac{1}{4} = 1$
• $E(X^2) = 0^2 \cdot \frac{1}{4} + 1^2 \cdot \frac{2}{4} + 2^2 \cdot \frac{1}{4} = 1.5$
$Var(X) = E(X^2) - \mu^2 = 1.5 - 1^2 = 0.5$
• $SD(X) = \sigma = \sqrt{\sigma^2} = \sqrt{0.5} = 0.707$
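The same moments can be computed directly from the table. A short Python sketch using exact fractions, which also confirms that the shortcut $E(X^2) - \mu^2$ agrees with the definition $E(X - \mu)^2$:

```python
from fractions import Fraction
from math import sqrt

xs = [0, 1, 2]
ps = [Fraction(1, 4), Fraction(2, 4), Fraction(1, 4)]

mean = sum(x * p for x, p in zip(xs, ps))                  # E(X)
second_moment = sum(x**2 * p for x, p in zip(xs, ps))      # E(X^2)
variance = second_moment - mean**2                         # shortcut formula
variance_def = sum((x - mean)**2 * p for x, p in zip(xs, ps))  # definition E(X - mu)^2
sd = sqrt(variance)

print(mean, second_moment, variance, round(sd, 3))
```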
Examples
• Suppose that X is a continuous random variable with pdf
$f(x) = \begin{cases} 1 + x, & -1 < x < 0 \\ 1 - x, & 0 < x < 1 \end{cases}$ (and 0 otherwise).
• Find the mean value and variance of X.
• $E(X) = \int_{-1}^{0} x(1 + x)\,dx + \int_{0}^{1} x(1 - x)\,dx = -\frac{1}{6} + \frac{1}{6} = 0$
• $E(X^2) = \int_{-1}^{0} x^2(1 + x)\,dx + \int_{0}^{1} x^2(1 - x)\,dx = \frac{1}{12} + \frac{1}{12} = \frac{1}{6} \approx 0.167$
• $\sigma^2 = E(X^2) - \mu^2 = 0.167 - 0^2 = 0.167$
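For the continuous case, the integrals can be approximated numerically. This sketch again uses a composite midpoint rule (an illustrative choice); the grid puts a node at the kink of f at 0, and the results should be close to $E(X) = 0$ and $E(X^2) = 1/6$.

```python
def integrate(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def f(x):
    # Triangular pdf: 1 + x on (-1, 0), 1 - x on [0, 1), zero elsewhere
    if -1 < x < 0:
        return 1 + x
    if 0 <= x < 1:
        return 1 - x
    return 0.0

total = integrate(f, -1, 1)                        # should be 1
mean = integrate(lambda x: x * f(x), -1, 1)        # E(X)
second = integrate(lambda x: x * x * f(x), -1, 1)  # E(X^2)
variance = second - mean**2                        # Var(X) = E(X^2) - mu^2
print(round(total, 6), round(mean, 6), round(second, 4), round(variance, 4))
```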
Two Dimensional Random variables
There are situations in which two random variables are defined over a given experiment. Suppose X and Y are random variables on the probability space $(\Omega, \mathcal{A}, P(\cdot))$. Their joint probability distribution describes their behavior relative to each other; it is defined over $\mathbb{R}^2$, with each variable taking values in $\mathbb{R}$.
• Their joint probability distribution is given by $P(X = x_i, Y = y_j) = p(x_i, y_j)$ when they are discrete random variables, and by the joint density $f_{X,Y}(x, y)$ when they form a continuous two-dimensional random variable
• Their joint cumulative distribution function is $F_{X,Y}(x, y) = P(X \le x, Y \le y)$, which equals $\sum_{x_i \le x}\sum_{y_j \le y} p(x_i, y_j)$ in the discrete case and $\int_{-\infty}^{x}\int_{-\infty}^{y} f_{X,Y}(s, t)\,dt\,ds$ in the continuous case
Joint and marginal probability mass function
• Let (X, Y) be a two-dimensional discrete random variable. $P(X = x_i, Y = y_j) = p(x_i, y_j)$ is their joint probability distribution iff:
1 $\sum_{x_i}\sum_{y_j} P(X = x_i, Y = y_j) = 1$
2 $0 \le P(X = x_i, Y = y_j) \le 1$ for all $x_i, y_j$
The marginal probability distributions of X and Y are given by:
• $p(x_i) = \sum_{y_j} P(X = x_i, Y = y_j)$ is the marginal of X
• $p(y_j) = \sum_{x_i} P(X = x_i, Y = y_j)$ is the marginal of Y
Joint and marginal density functions
• Let (X, Y) be a two-dimensional continuous random variable. A function $f_{X,Y}(x, y)$ is their joint probability density function iff:
1 $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
2 $f(x, y) \ge 0$ for all x, y
The marginal density functions of X and Y are given by:
• $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$ is the marginal of X
• $f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$ is the marginal of Y
Conditional Distributions
• The conditional probability mass functions of X and Y are given by:
$P(x_i/y_j) = \frac{p(x_i, y_j)}{p(y_j)}$, $P(y_j/x_i) = \frac{p(x_i, y_j)}{p(x_i)}$
• The conditional density functions of X and Y are given by:
$f(x/y) = \frac{f(x, y)}{f_Y(y)}$, $f(y/x) = \frac{f(x, y)}{f_X(x)}$
• The two-dimensional random variable (X, Y) is said to be independent iff:
$p(x_i, y_j) = p(x_i)p(y_j)$ for all $x_i, y_j$ in the discrete case
$f(x, y) = f_X(x)f_Y(y)$ for all x, y in the continuous case
Examples
• A company produces two types of compressors, grade A and grade B. Let X denote the number of grade A compressors produced on a given day, and let Y denote the number of grade B compressors produced on the same day. Suppose that the joint probability mass function is given by:
p(x_i, y_j)   Y = 0   Y = 1   p(x_i)
X = 0         0.1     0.3     0.4
X = 1         0.2     0.1     0.3
X = 2         0.2     0.1     0.3
p(y_j)        0.5     0.5     1
• Find $P(X < 1, Y \le 1) = P(X = 0, Y = 0) + P(X = 0, Y = 1) = 0.1 + 0.3 = 0.4$
• Find $P(X \le 1 / Y < 1) = \frac{P(X \le 1, Y < 1)}{P(Y < 1)} = \frac{P(X = 0, Y = 0) + P(X = 1, Y = 0)}{P(Y = 0)} = \frac{0.1 + 0.2}{0.5} = \frac{3}{5}$
• Find the marginal probability distributions
• Find the conditional distributions
Examples
• Conditional probability mass function of X given Y:
p(x_i/y_j)    Y = 0   Y = 1
X = 0         0.2     0.6
X = 1         0.4     0.2
X = 2         0.4     0.2
• Conditional probability mass function of Y given X:
p(y_j/x_i)    Y = 0   Y = 1
X = 0         0.25    0.75
X = 1         0.67    0.33
X = 2         0.67    0.33
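Marginal and conditional tables follow mechanically from the joint table. A minimal sketch, storing the joint PMF as a dictionary keyed by (x, y) (an illustrative layout):

```python
# Joint PMF p(x, y) for the compressor example, keyed by (x, y)
joint = {
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.2, (1, 1): 0.1,
    (2, 0): 0.2, (2, 1): 0.1,
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginals: sum the joint PMF over the other variable
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Conditionals: p(x/y) = p(x, y) / p(y) and p(y/x) = p(x, y) / p(x)
p_x_given_y = {(x, y): joint[(x, y)] / p_y[y] for (x, y) in joint}
p_y_given_x = {(x, y): joint[(x, y)] / p_x[x] for (x, y) in joint}

# P(X <= 1 / Y < 1) = [p(0, 0) + p(1, 0)] / P(Y = 0)
p_cond = sum(joint[(x, 0)] for x in xs if x <= 1) / p_y[0]
print(p_x, p_y, round(p_cond, 4))
```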
Example 2
Suppose an electronic circuit contains two transistors. Let X be the time to failure of transistor 1 and let Y be the time to failure of transistor 2, with joint probability density function
$f(x, y) = \begin{cases} 4e^{-2(x+y)}, & x > 0,\ y > 0 \\ 0, & \text{otherwise} \end{cases}$
• Find the marginal distributions of X and Y:
$f_X(x) = \int_0^\infty 4e^{-2(x+y)}\,dy = 4e^{-2x}\int_0^\infty e^{-2y}\,dy = 2e^{-2x}$, $x > 0$
$f_Y(y) = \int_0^\infty 4e^{-2(x+y)}\,dx = 2e^{-2y}$, $y > 0$
• Find the conditional densities of X and Y:
$f(y/x) = \frac{4e^{-2(x+y)}}{2e^{-2x}} = 2e^{-2y}$
$f(x/y) = \frac{4e^{-2(x+y)}}{2e^{-2y}} = 2e^{-2x}$
Since $f(x, y) = f_X(x)f_Y(y)$, X and Y are independent.
• $P(X > 1 / X + Y \le 2) = \frac{P(X > 1, X + Y \le 2)}{P(X + Y \le 2)} = \frac{\int_{1}^{2}\int_{0}^{2-x} 4e^{-2(x+y)}\,dy\,dx}{\int_{0}^{2}\int_{0}^{2-x} 4e^{-2(x+y)}\,dy\,dx} = \frac{e^{-2} - 3e^{-4}}{1 - 5e^{-4}} \approx 0.0885$
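Because the joint density factors as $(2e^{-2x})(2e^{-2y})$, X and Y can be simulated as independent exponential variables with rate 2. The Monte Carlo sketch below (seed and sample size are arbitrary choices) estimates $P(X > 1 / X + Y \le 2)$ and compares it with the closed form $(e^{-2} - 3e^{-4})/(1 - 5e^{-4})$, obtained by integrating the numerator over $1 < x < 2,\ 0 < y < 2 - x$ and the denominator over $0 < x < 2,\ 0 < y < 2 - x$.

```python
import math
import random

rng = random.Random(12345)  # fixed seed so the run is reproducible
n = 200_000

# f(x, y) = (2e^{-2x})(2e^{-2y}): X and Y are independent Exponential(rate = 2)
hits_condition = 0  # samples with X + Y <= 2
hits_both = 0       # samples with X > 1 and X + Y <= 2
for _ in range(n):
    x = rng.expovariate(2.0)  # expovariate takes the rate parameter
    y = rng.expovariate(2.0)
    if x + y <= 2:
        hits_condition += 1
        if x > 1:
            hits_both += 1

estimate = hits_both / hits_condition
exact = (math.exp(-2) - 3 * math.exp(-4)) / (1 - 5 * math.exp(-4))
print(round(estimate, 3), round(exact, 4))
```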
Covariance and Correlation between Random Variables
Covariance measures the direction of the (linear) association between two random variables, whereas correlation measures the degree (strength) of that association.
• The covariance between two random variables is given by:
$Cov(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X\mu_Y$,
where $E(XY) = \sum_{x_i}\sum_{y_j} x_i y_j\, p(x_i, y_j)$ and $E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy f(x, y)\,dx\,dy$ for two-dimensional discrete and continuous random variables, respectively
• A positive covariance implies a positive association, a negative covariance indicates a negative association, and a zero covariance implies no linear association between the random variables X and Y
• The correlation between random variables X and Y is given by:
$\rho_{XY} = \frac{Cov(X, Y)}{\sqrt{Var(X) \cdot Var(Y)}} = \frac{\sigma_{XY}}{\sigma_X\sigma_Y}$
• $-1 \le \rho_{XY} \le 1$
• $\rho_{XY} \approx 0$ indicates weak or no linear association, $\rho_{XY} \approx \pm 0.5$ indicates moderate positive or negative association, and $\rho_{XY} \approx \pm 1$ indicates strong positive or negative association
Examples
• Consider the compressor example, whose joint probability distribution is given below, and calculate the correlation between the numbers of grade A and grade B compressors.
p(x_i, y_j)   Y = 0   Y = 1   p(x_i)
X = 0         0.1     0.3     0.4
X = 1         0.2     0.1     0.3
X = 2         0.2     0.1     0.3
p(y_j)        0.5     0.5     1
• $E(XY) = 0 \cdot 0 \cdot 0.1 + 0 \cdot 1 \cdot 0.3 + 1 \cdot 0 \cdot 0.2 + 1 \cdot 1 \cdot 0.1 + 2 \cdot 0 \cdot 0.2 + 2 \cdot 1 \cdot 0.1 = 0.3$
• $\mu_X = 0 \cdot 0.4 + 1 \cdot 0.3 + 2 \cdot 0.3 = 0.9$, $\mu_Y = 0 \cdot 0.5 + 1 \cdot 0.5 = 0.5$
• $\sigma_{XY} = E(XY) - \mu_X\mu_Y = 0.3 - 0.9 \cdot 0.5 = -0.15$
• $Var(X) = E(X^2) - \mu_X^2$, with $E(X^2) = 0^2 \cdot 0.4 + 1^2 \cdot 0.3 + 2^2 \cdot 0.3 = 1.5$, so $\sigma_X^2 = 1.5 - 0.9^2 = 0.69$; $Var(Y) = \sigma_Y^2 = E(Y^2) - \mu_Y^2$, with $E(Y^2) = 0^2 \cdot 0.5 + 1^2 \cdot 0.5 = 0.5$, so $\sigma_Y^2 = 0.5 - 0.5^2 = 0.25$
• $\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X\sigma_Y} = \frac{-0.15}{\sqrt{0.69 \cdot 0.25}} = -0.36$
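The covariance and correlation computations can be expressed with a single expectation helper over the joint table. The helper name `expect` is hypothetical, introduced only for this sketch:

```python
from math import sqrt

# Joint PMF of the compressor example, keyed by (x, y)
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1, (2, 0): 0.2, (2, 1): 0.1}

def expect(g):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - mu_x * mu_y   # E(XY) - mu_X mu_Y
var_x = expect(lambda x, y: x * x) - mu_x**2
var_y = expect(lambda x, y: y * y) - mu_y**2
rho = cov / sqrt(var_x * var_y)
print(round(cov, 4), round(var_x, 4), round(var_y, 4), round(rho, 4))
```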
Examples
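• Consider the following joint density function of the random variable (X, Y) and calculate its correlation coefficient:
$f(x, y) = \begin{cases} \frac{3}{2}x^2 + y, & 0 < x < 1,\ 0 < y < 1 \\ 0, & \text{otherwise} \end{cases}$
• $E(XY) = \int_0^1\int_0^1 xy\left(\frac{3}{2}x^2 + y\right)dx\,dy = \frac{34}{96}$
• The marginals are $f_X(x) = \frac{3}{2}x^2 + \frac{1}{2}$ and $f_Y(y) = \frac{1}{2} + y$, so $E(X) = \int_0^1 x\left(\frac{3}{2}x^2 + \frac{1}{2}\right)dx = \frac{5}{8}$ and $E(Y) = \int_0^1 y\left(\frac{1}{2} + y\right)dy = \frac{7}{12}$
• $\sigma_{XY} = \frac{34}{96} - \frac{5}{8} \cdot \frac{7}{12} = \frac{34}{96} - \frac{35}{96} = -\frac{1}{96} = -0.01042$
• $E(X^2) = \int_0^1 x^2\left(\frac{3}{2}x^2 + \frac{1}{2}\right)dx = \frac{3}{10} + \frac{1}{6} = \frac{7}{15}$, $E(Y^2) = \int_0^1 y^2\left(\frac{1}{2} + y\right)dy = \frac{5}{12}$
• $\sigma_X^2 = \frac{7}{15} - \left(\frac{5}{8}\right)^2 = 0.0760$, $\sigma_Y^2 = \frac{5}{12} - \left(\frac{7}{12}\right)^2 = 0.0764$
• $\rho_{XY} = \frac{-0.01042}{\sqrt{0.0760 \cdot 0.0764}} = -0.137$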
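Since the density is a polynomial on the unit square, every moment can be computed exactly from $\int_0^1\int_0^1 x^m y^n\,dx\,dy = \frac{1}{(m+1)(n+1)}$. The sketch below (the helper name `moment` is our own) uses exact fractions and gives $E(X^2) = 7/15$, $Cov(X, Y) = -1/96$ and $\rho_{XY} \approx -0.137$.

```python
from fractions import Fraction
from math import sqrt

def moment(a, b):
    """E[X^a Y^b] for f(x, y) = (3/2)x^2 + y on the unit square.

    Uses: integral of x^m y^n over (0,1)x(0,1) = 1 / ((m+1)(n+1)).
    """
    term_x2 = Fraction(3, 2) / ((a + 3) * (b + 1))  # from (3/2) x^2 * x^a y^b
    term_y = Fraction(1, (a + 1) * (b + 2))         # from y * x^a y^b
    return term_x2 + term_y

assert moment(0, 0) == 1  # f integrates to one, so it is a valid density

mu_x, mu_y = moment(1, 0), moment(0, 1)
cov = moment(1, 1) - mu_x * mu_y
var_x = moment(2, 0) - mu_x**2
var_y = moment(0, 2) - mu_y**2
rho = float(cov) / sqrt(var_x * var_y)
print(mu_x, mu_y, cov, var_x, var_y, round(rho, 3))
```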
Common Discrete Probability Distributions
Though there are several discrete probability distributions used to model discrete random variables, this course focuses on the binomial and Poisson distributions.
• Binomial probability distribution: a discrete probability distribution used to model the number of successes in a fixed number n of trials, where each trial has two possible outcomes. To apply the binomial distribution, the experiment should have the following four properties:
1 The experiment has a fixed number of trials (n)
2 The trials are independent
3 Each trial results in one of two outcomes (success or failure), defined by the event of interest
4 The probability of a success (p) is the same for all trials
Binomial Distribution..
The probability distribution of a random variable X defined over such an experiment is given by:
$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$, $x = 0, 1, \ldots, n$, where x is the number of successes, p is the probability of a success on one trial, $q = 1 - p$ is the probability of a failure on one trial, and n is the number of trials
• $E(X) = np$, $Var(X) = \sigma^2 = npq$, and $SD(X) = \sqrt{npq}$
• Example: A university found that 10% of its students withdraw without completing the sophomore course. Assume that 20 students registered for the course. Compute the probability that
(a) exactly four will withdraw. Let X be the number of students who will withdraw without completing the course.
$P(X = 4) = \binom{20}{4} 0.1^4\, 0.9^{20-4} = 0.0898$
(b) at most two will withdraw. $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = \binom{20}{0}0.1^0\, 0.9^{20} + \binom{20}{1}0.1^1\, 0.9^{19} + \binom{20}{2}0.1^2\, 0.9^{18} \approx 0.6769$
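The binomial probabilities and moments can be reproduced with `math.comb` (Python 3.8+). The helper `binom_pmf` is a name introduced for illustration; the sketch also confirms that the PMF sums to one over $x = 0, \ldots, 20$.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.1
p_exactly_4 = binom_pmf(4, n, p)
p_at_most_2 = sum(binom_pmf(x, n, p) for x in range(3))  # x = 0, 1, 2
mean, var = n * p, n * p * (1 - p)                       # np and npq
print(round(p_exactly_4, 4), round(p_at_most_2, 4), mean, round(var, 2), round(sqrt(var), 4))
```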
Poisson Distribution
There are also cases in which the experiment deals with the number of accidents, the number of arrivals in a specified period of time and the like, where the number of trials is unknown. In such circumstances we use the Poisson distribution, under the following assumptions:
1 n is indefinite
2 The event of interest is rare
3 The trials are independent, and there is a fixed average number of events per given interval
If X is a random variable defined over a Poisson experiment, its PMF is given by:
$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$, $x = 0, 1, 2, \ldots$, where x counts the number of occurrences of the event and λ is the average number of events per given interval
$E(X) = Var(X) = \lambda$
Examples on Poisson Distribution
A student finds that the average number of amoebas in 10 ml of pond water is 4. Find the probability that in 10 ml of water from that pond there are
1 exactly 5 amoebas. Let Y be the number of amoebas found in 10 ml of pond water. $P(Y = 5) = \frac{e^{-4}4^5}{5!} = 0.156$
2 no amoebas. $P(Y = 0) = \frac{e^{-4}4^0}{0!} = 0.0183$
3 at least one amoeba. $P(Y \ge 1) = \sum_{y=1}^{\infty} P(Y = y) = 1 - P(Y < 1) = 1 - P(Y = 0) = 0.9817$
4 Find the mean and standard deviation of the number of amoebas in 10 ml of pond water
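A sketch of the Poisson calculations, including item 4 ($E(Y) = \lambda = 4$ and $SD = \sqrt{\lambda} = 2$). The helper `poisson_pmf` is a hypothetical name:

```python
from math import exp, factorial, sqrt

def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 4                        # average number of amoebas per 10 ml
p5 = poisson_pmf(5, lam)
p0 = poisson_pmf(0, lam)
p_at_least_1 = 1 - p0          # complement rule
mean, sd = lam, sqrt(lam)      # E(Y) = Var(Y) = lambda
print(round(p5, 3), round(p0, 4), round(p_at_least_1, 4), mean, sd)
```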
Common Continuous Distributions
• Though there are several continuous distributions, this section focuses on the normal distribution, which plays a key role in modeling continuous random variables and in statistical inference. Its density is bell-shaped and is characterized by its mean and variance
Figure: Normal probability distribution Curve
Normal probability distribution
The probability density function of X having a normal distribution is given by:
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, $-\infty < x < \infty$, where μ is the mean and σ is the standard deviation
• The distribution is characterized by its mean and variance
• It is symmetric about its mean
• It has its maximum point at its mean
• The standard deviation determines the flatness and width of the normal curve
Normal probability distribution...
To get a probability under the normal density one has to integrate
$\int_a^b \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\,dx$, which has no closed-form antiderivative.
To avoid this, the normal random variable is transformed to the standard normal distribution so that the standard normal table can be used to find the probability.
• Standardizing means subtracting the mean from the given value of the random variable and dividing by the standard deviation: $z = \frac{x - \mu}{\sigma}$
• After the transformation, areas under the original normal curve equal the corresponding areas under the standard normal curve
Properties of Standardized Normal Distribution
• It has mean zero and standard deviation 1
• The standard normal curve is symmetric about its mean, zero
• The total area under the curve is one
• By symmetry, the area to the left of $-z$ equals the area to the right of $z$
• The area from $-\infty$ to 0 equals the area from 0 to $\infty$, each being half of the total (0.5)
Properties of Standardized Normal Distribution..
Figure: Standard normal distribution curve
• $P(0 < z < 2.5) = 0.4938$
• $P(z > 1) = P(z > 0) - P(0 < z < 1) = 0.5 - 0.3413 = 0.1587$
• $P(z < 1) = P(z < 0) + P(0 < z < 1) = 0.5 + 0.3413 = 0.8413$
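Instead of reading a table, the standard normal CDF can be evaluated with the identity $\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. This is a sketch using Python's `math.erf`; the results agree with the table values above to four decimal places.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, Phi(z) = P(Z <= z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p_0_to_2_5 = phi(2.5) - phi(0)   # P(0 < z < 2.5)
p_gt_1 = 1 - phi(1)              # P(z > 1)
p_lt_1 = phi(1)                  # P(z < 1)
print(round(p_0_to_2_5, 4), round(p_gt_1, 4), round(p_lt_1, 4))
```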
Standardized Normal Distribution Table
Figure: Standard normal distribution table
Examples
The college boards, which are administered each year to many thousands of high school students, are scored so as to yield a mean of 500 and a standard deviation of 100. These scores are close to being normally distributed. What percentage of the scores can be expected to satisfy each condition?
• Greater than 600. Let X be the score of a student.
$P(X > 600) = P\left(\frac{X - 500}{100} > \frac{600 - 500}{100}\right) = P(Z > 1)$
$P(Z > 1) = P(Z > 0) - P(0 < Z < 1) = 0.5 - 0.3413 = 0.1587$, i.e. about 15.87% of the scores
• Between 450 and 600. $P(450 < X < 600) = P\left(\frac{450 - 500}{100} < \frac{X - 500}{100} < \frac{600 - 500}{100}\right)$
$P(-0.5 < Z < 1) = P(-0.5 < Z < 0) + P(0 < Z < 1) = P(0 < Z < 0.5) + P(0 < Z < 1)$
$= 0.1915 + 0.3413 = 0.5328$, i.e. about 53.28% of the scores
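Python's standard library (3.8+) provides `statistics.NormalDist`, which can reproduce both answers without a table. The sketch also shows that standardizing first gives the same result:

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)

p_above_600 = 1 - scores.cdf(600)
p_450_to_600 = scores.cdf(600) - scores.cdf(450)

# Same result through the z-transformation z = (x - mu) / sigma
z = NormalDist()  # standard normal: mu = 0, sigma = 1
p_450_to_600_z = z.cdf((600 - 500) / 100) - z.cdf((450 - 500) / 100)
print(f"{p_above_600:.2%}", f"{p_450_to_600:.2%}")
```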
Simple Linear Regression Analysis
• $\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$
• $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$
• The slope tells us the magnitude of the effect of the independent variable on the response, that is, the average change in the response for a one-unit change in the independent variable
• Coefficient of determination: measures the proportion (percentage) of the variation in the dependent variable explained by the independent variable
Simple Linear Regression Analysis Examples
• Example: A researcher wants to find out if there is a relationship between the heights of sons and the heights of their fathers. In other words, do taller fathers have taller sons? The researcher took a random sample of 6 fathers and their 6 sons. Their heights in inches are given below in an ordered array.
Father (x)   63   65   66   67   67   68
Son (y)      66   68   65   67   69   70
• $\sum_{i=1}^{n} x_i y_i = 26740$, $\sum_{i=1}^{n} x_i = 396$, $\sum_{i=1}^{n} y_i = 405$, $\sum_{i=1}^{n} y_i^2 = 27355$, $\sum_{i=1}^{n} x_i^2 = 26152$
• $r = \frac{6 \cdot 26740 - 396 \cdot 405}{\sqrt{6 \cdot 26152 - 396^2}\,\sqrt{6 \cdot 27355 - 405^2}} = \frac{60}{\sqrt{96}\sqrt{105}} = 0.5976$, $\hat{\beta}_1 = \frac{6 \cdot 26740 - 396 \cdot 405}{6 \cdot 26152 - 396^2} = \frac{60}{96} = 0.625$,
$\hat{\beta}_0 = 67.5 - 0.625 \cdot 66 = 26.25$
• Estimated model: $\hat{y} = 26.25 + 0.625x_i$
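The least-squares formulas can be applied directly to the sample. A minimal sketch computing the sums, slope, intercept and correlation from the raw heights:

```python
from math import sqrt

father = [63, 65, 66, 67, 67, 68]  # x, in inches
son = [66, 68, 65, 67, 69, 70]     # y, in inches
n = len(father)

sx, sy = sum(father), sum(son)
sxy = sum(x * y for x, y in zip(father, son))
sxx = sum(x * x for x in father)
syy = sum(y * y for y in son)

num = n * sxy - sx * sy                       # n*sum(xy) - sum(x)*sum(y)
beta1 = num / (n * sxx - sx**2)               # slope
beta0 = sy / n - beta1 * sx / n               # intercept: y_bar - beta1 * x_bar
r = num / (sqrt(n * sxx - sx**2) * sqrt(n * syy - sy**2))
print(beta0, beta1, round(r, 4), round(r**2, 3))
```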
Simple Linear Regression Analysis Examples
• The estimated correlation indicates a moderate positive association between fathers' and sons' heights
• The estimated slope indicates that for each one-inch increase in the father's height, the average height of the son increases by 0.625 inches
• Coefficient of determination: $R^2 = r^2 = 0.5976^2 = 0.357$, which shows that 35.7% of the variation in the dependent variable (son's height) is accounted for by the variation in the independent variable (father's height)