Biostatistics Lecture Notes Part One

Introduction to Biostatistics
Wondimu Ayele (MSc, PhD fellow)

SP, AAU
January 2019
Objective
– Define statistics and its importance in different
disciplines
– Define variable and data
– Describe types of data and measurement scales
– Organize and display data
– Define and calculate measures of central tendency
and measures of spread
Biostatistics -Notes WA , SPH AAU ,2016
• References
1. M. Pagano & K. Gauvreau: Principles of Biostatistics
2. Colton T.: Statistics in Medicine
3. Bland M.: An Introduction to Medical Statistics
4. Daniel W.: Biostatistics: A Foundation for Analysis in the Health
Sciences
5. David S. Moore & G. P. McCabe: Introduction to the Practice of
Statistics
6. Kleinbaum & K. Muller: Applied Regression Analysis and Other
Multivariate Methods
7. L. D. Fisher & G. Van Belle: Biostatistics
8. Kirkwood B.: Essentials of Medical Statistics
9. A. R. Feinstein: Principles of Medical Statistics
10. R. G. Knapp & M. C. Miller: Clinical Epidemiology and
Biostatistics
11. D. J. Sheskin: Handbook of Parametric and
Nonparametric Statistical Procedures
12. Armitage P. & Berry G.: Statistical Methods in Medical
Research
13. P. S. R. S. Rao: Sampling Methodologies with Applications
14. R. N. Forthofer & E. S. Lee: Introduction to Biostatistics

Introduction
• What is Statistics?
• Methods for collecting, organizing, presenting,
analyzing, & drawing of inferences about a body
of the data when only a part of the data is
observed.

WHY WE NEED STATISTICS
• To present the data in a concise and definite form.
Statistics helps in classifying and tabulating raw data for
processing and further tabulation for end users.
• To make it easy to understand complex and large data

• This is done by presenting the data in the form of tables,
graphs, diagrams etc., or by condensing the data with the
help of means, measures of dispersion etc.
• For comparison: tables, means and measures of
dispersion help in comparing different sets of data.

WHY WE NEED STATISTICS
• In measuring the magnitude of a phenomenon.
• Statistics has made it possible to count the population of a
country, the industrial growth, the Agricultural growth, the
educational level, Health status.
• Everything in medicine, be it research, diagnosis or
treatment depends on counting/measurement.
– High/ Low B.P??
– Pulse rate.
– Incidence of disease.
– Death rate.
– Enlargement of liver/ spleen

Statistics and Health
• Biostatistics
• Health statistics
• Medical Statistics
• Vital Statistics
• Not Mutually Exclusive terms
• What is Biostatistics?
• An application of statistical method to biological
phenomena.

Why do we need biostatistics?
1. Main reason: handling variations
– Biological variation
• Attributes differ not only among individuals but also
within the same individual over time
• Example: height, weight, blood pressure, eye color.
– Sample variation
• Biomedical research projects are usually carried out
on small numbers of study subjects

Why do we need to learn biostatistics?
2. Essential for scientific method of investigation
– Formulate hypothesis
– Design study to objectively test hypothesis
– Collect reliable and unbiased data
– Process and evaluate data rigorously
– Interpret and draw appropriate conclusions
3. Essential for understanding, appraisal and critique of
scientific literature

Examples of uses of biostatistics
• To define what is normal/healthy in a population (setting
limits of normality).
• To compare drug action: potency/efficacy.
• To confirm an association between two attributes, e.g. cancer and
smoking, or socioeconomic status and malnutrition.
• To assess the usefulness of vaccines.

Uses in Public Health Planning
• Recording of vital events
• Incidence/prevalence of disease.
• Leading causes of death/ morbidity in the community
• Demographic characteristics of a community.
• Health system research.

Application of Biostatistics
1. Statistical genetics
2. Numerical Taxonomy
3. Statistical Ecology
4. Statistical Ethnology
5. Forest mensuration
6. Forest and Agricultural yield table
7. Biomass estimation
8. Statistical environment management
9. Demography
10. Medical sciences
11. Biological variation and uncertainties
Limitation of statistics
• Statistics does not deal with individual measurements.
Since statistics deals with aggregates of facts, it cannot
be used to study the changes that have taken place in
individual cases.
• Statistics cannot be used directly to study qualitative
phenomena such as morality, intelligence or beauty, because
these cannot be measured as they stand. However, such
problems can be analyzed statistically once they are
expressed numerically.
• Statistical results are true only on average. The
conclusions obtained statistically are not universal truths;
they are true only under certain conditions. This is
because statistics as a science is less exact than
the natural sciences.

Limitation of statistics
• Statistical data should be treated as approximations or
estimates, not as precise measurements.
• Statistical results might lead to fallacious conclusions.
• Only someone with a sound knowledge of statistical
methods can handle statistical data efficiently.

Types of Statistics
Statistics (with probability and sampling theory)
• Descriptive statistics
– Tabular representation
– Diagrammatic representation
– Measures of central tendency
– Measures of variability
• Inferential statistics
– Test of hypothesis: parametric tests and nonparametric tests
– Estimation theory: point estimation and interval estimation

Population & Sample
• Target population: A collection of items that have
something in common for which we wish to draw
conclusions at a particular time.
• Study Population: The specific population from which data
are collected
• Sample: A subset of a study population, about which
information is actually obtained.
• Generalizability is a two‐stage procedure: we want to be able
to generalize from the sample to the study population and
then from the study population to the target population

Population and sample
• E.g. In a study of the prevalence of HIV among students of
Addis Ababa University, a random sample of
pharmacy students in the College of Health Sciences of AAU
was included.
• Target population: all students in Addis Ababa University
• Study population: all students in the College of Health Sciences
of AAU
• Sample: the pharmacy students in the College of Health Sciences
of AAU who were selected.
[Figure: nested circles showing the sample inside the study
population inside the target population]
Parameter and Statistic
• Parameter: A descriptive measure computed
from the data of a population.
• Statistic: A descriptive measure computed from
the data of a sample.

Scales of measurement
• Clearly not all measurements are the same.
• Measuring an individual's weight is qualitatively different
from measuring their response to some treatment on a
three-category scale: “improved”, “stable”, “not improved”.
• Measuring scales are different according to the degree of
precision involved.
• There are four types of scales of measurement.

1. Nominal scale: uses names, labels, or symbols to assign
each measurement to one of a limited number of categories
that cannot be ordered.
Examples: Blood type, sex, race, marital status, Adolescence
stage, Color of cars.
2. Ordinal scale: assigns each measurement to one of a
limited number of categories that are ranked in terms of a
graded order.
• Examples: Patient status, Cancer stages, Socioeconomic
status, IQ of children.

3. Interval scale: assigns each measurement to one of an
unlimited number of categories that are equally spaced.
It has no true zero point.
Example: Temperature measured in Celsius or Fahrenheit
4. Ratio scale: measurement begins at a true zero point
and the scale has equal intervals.
Examples: Height, weight, blood pressure

• Data: A collection of information about either
individuals or groups.
• Variable: A characteristic which takes different
values in different persons, places, or things.
Example:
Animals of the same species may differ in their length,
weight, age, sex, diastolic BP, heart rate, etc.

Types of variable
Qualitative/categorical variable: records which
group or category an individual/observation belongs in;
it classifies rather than measures.
• It doesn't make sense to perform arithmetic on this type of
variable.
Example: gender, ethnic group, type of diagnosis recorded as present or
absent, etc.
Quantitative variable: Variable that has magnitude.
• A true numerical value; it indicates an amount; often
obtained from a measuring instrument.
• It makes sense to perform arithmetic on these types of
variables. E.g. weight, length, age etc.
Types of Variable
Discrete variable: It can only have a finite number
of values in any given interval.
– Indivisible units
– Restricted to whole numbers
– Can be counted
• Example.
– # of children in a family
– # of houses in a neighborhood
– # of patients discharged from the hospital on a given day

SUMMARY
Types of variables (measurement scales)
• Qualitative or categorical
– Nominal (not ordered), e.g. ethnic group
– Ordinal (ordered), e.g. response to treatment
• Quantitative or measurement
– Discrete (count data), e.g. # of admissions
– Continuous (real-valued), e.g. height
Types of variable
Continuous variable: It can have an infinite number
of possible values in any given interval.
• Unlimited number of possible values
• Infinite number of values can fall b/n any 2
observed values
• No gaps between units
Example: time taken to solve a problem;
height, weight or temperature of patients

Sources of data
Routinely kept records
– Hospital medical records, accounting records
Survey
– Mode of transportation used by patients to visit the
clinic.
Experiments
– Best strategies for maximizing patient compliance.
External sources.
– Already published data

Types of data
1. Primary source data: data collected by the investigator himself (herself)
for the purpose of a specific goal or study.
Example: data gathered from interviews, questionnaires, or field observation by the
investigator or researcher.
2. Secondary source data: data used by an investigator that have already
been collected by others. Secondary sources can be individuals or agencies
that supply data originally collected for other purposes by them or others.
• They are less expensive in time and cost than primary data.
• Usually they are published or unpublished materials, records, reports,
etc.

Descriptive Statistics
• Techniques used to organize and summarize a
set of data in a concise way.
–Organization of data
–Summarization of data
–Presentation of data
• Numbers that have not been summarized
and organized are called raw data.

Descriptive cont...
• Statistics is used to organize and interpret
research observations and findings
• Before interpretation & communication of the
findings, the raw data must be organized and
presented in a clear and understandable way

Descriptive cont….
Ordered array: A simple arrangement of individual
observations in order of magnitude.
Frequency distribution: A table which involves a listing
of all observed values of the variable being studied and
how many times each value is observed.
a) Qualitative variable: Count the number of cases in each
category.
b) Quantitative variable: Select a set of continuous, non-
overlapping intervals such that each value in the set of
observations can be placed in one, and only one of the
intervals
Descriptive cont…
Frequency distribution:
• The actual summarization and organization of
data starts from frequency distribution.
• The distribution condenses the raw data into a
more useful form and allows for a quick visual
interpretation of the data.

Frequency distributions for
categorical variables
• Summarizing categorical variables (nominal &
ordinal) is simple
• Count the number of observations (frequency)
in each category and present them as relative
frequencies (percentages)
• Often presented in the form of tables, bar charts and
pie charts
Frequency , categorical cont...
• A relative frequency distribution: shows the
proportion of counts that fall into each class or
category
• A relative frequency value for any category is
obtained by dividing the number of
observations in that category by the total
number of observations.
• This can be reported as a percentage by
multiplying the resulting fraction by 100.
Cumulative frequency distribution
• Cumulative frequency: The sum of the frequencies of
a class and all the classes below it.
• Cumulative relative frequency: The percentage
of the total number of observations that have a
value either in that interval or below it.
• Mid-point: The value of the interval which lies
midway between the lower and the upper limits of
a class.

Cumulative frequency cont…
True limits (class boundaries): the limits
that make an interval of a continuous variable
continuous in both directions.
• Used for smoothing the class intervals
• Subtract 0.5 from the lower limit and add 0.5 to the
upper limit
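As a minimal sketch of the mid-point and true-limit definitions above, the short example below uses hypothetical class limits (10 to 19, 20 to 29, 30 to 39) and assumes values recorded to the nearest whole number, so that half a unit is 0.5. The function names are illustrative only.

```python
# Hypothetical class limits; values are assumed to be recorded as whole numbers.
class_limits = [(10, 19), (20, 29), (30, 39)]

def true_limits(lower, upper, half_unit=0.5):
    """Subtract half a unit from the lower limit and add it to the upper limit."""
    return lower - half_unit, upper + half_unit

def mid_point(lower, upper):
    """Value midway between the lower and upper limits of a class."""
    return (lower + upper) / 2

boundaries = [true_limits(lo, hi) for lo, hi in class_limits]
mid_points = [mid_point(lo, hi) for lo, hi in class_limits]
print(boundaries)
print(mid_points)
```

Note that adjacent classes now share a boundary (19.5, 29.5), which is what makes the intervals continuous.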

Frequency distributions
• Data contain information, and summarization is a way
of making it easier to determine the nature of that
information.
• Relative frequency distributions are most often used in
scientific publications to describe quantitative data sets.
They are better suited to the description of large data sets
and they permit greater flexibility in the choice of class
widths.
– A frequency distribution is a table that organizes data
into classes.
– The classes are non-overlapping, i.e. classes without common
items.

Guidelines for constructing tables
• Keep them simple
• Limit the number of variables to three or less
• All tables should be self-explanatory
• Include clear title telling what, when and where
• Clearly label the rows and columns
• State clearly the unit of measurement used
• Explain codes and abbreviations in the foot-note
• Show totals
• If data is not original, indicate the source in foot-note
• Example 1: The classification of the students of a group by
their score on the subject “Statistical analysis” is presented
in Table 2.0a. The table of frequencies for the data set
generated by computer using the software SPSS is shown
in Figure 2.1.

            Frequency   Percent   Valid percent   Cumulative percent
Bad              6        13.3        13.3               13.3
Excellent       18        40.0        40.0               53.3
Good            15        33.3        33.3               86.7
Medium           6        13.3        13.3              100.0
Total           45       100.0       100.0
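The percentages in a frequency table follow directly from the counts. The sketch below is illustrative rather than a reproduction of the SPSS procedure: it takes the four counts from the table above and computes the relative frequency and the cumulative relative frequency, each as a percentage rounded to one decimal place.

```python
# Counts taken from the table above (45 students in total).
counts = {"Bad": 6, "Excellent": 18, "Good": 15, "Medium": 6}

total = sum(counts.values())
running = 0
rows = []
for category, freq in counts.items():
    running += freq
    percent = 100 * freq / total          # relative frequency as a percentage
    cum_percent = 100 * running / total   # cumulative relative frequency
    rows.append((category, freq, round(percent, 1), round(cum_percent, 1)))

for row in rows:
    print(row)
```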

Steps to follow to construct a grouped frequency
distribution.
1. Make sure that you have a quantitative data
2. Find the range of the data
3. Determine the number of classes that you wish to have or
use Sturges' rule
4. Determine the width of the class
5. Determine the first lower class limit of the first class and
all the subsequent lower class limits
6. Write all the upper class limits of the classes
7. Finally, for each class, count the number of observation
and construct the freq. distribution, accordingly
• Example 3.6: Construct a frequency table for the data set of
the above example on the age of 189 subjects.
K = 1 + 3.322 log10(n) ≈ 9 (use 6 for simplicity)
W = R/K ≈ 5.788 (use 10 for simplicity)
• where
• K = number of class intervals
• n = number of observations
• W = width of the class interval
• R = range, where R = L - S
• L = the largest value and S = the smallest value in
the set of observations.
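The two formulas can be sketched as follows. Rounding K up to a whole number is an assumption made here; the largest and smallest values passed to `class_width` are hypothetical and are not the ages of the 189 subjects.

```python
import math

def sturges_classes(n):
    """Number of classes K = 1 + 3.322 * log10(n), rounded up (an assumption)."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_width(largest, smallest, k):
    """Class width W = R / K, where R = L - S is the range."""
    return (largest - smallest) / k

k = sturges_classes(189)           # n = 189 as in Example 3.6
print(k)

# Hypothetical largest and smallest values with 6 classes chosen by hand.
print(class_width(80, 20, 6))
```

In practice K and W are usually rounded to convenient values, as the example does when it uses 6 classes of width 10.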

Remarks:
• All classes of a frequency table must be mutually exclusive.
• Classes may be open-ended when either the lower or the
upper end of a quantitative classification scheme is
limitless.
For example, for the class variable age:
– birth to 7, 8 to 15, ..., 64 to 71, 72 and older
• Classification schemes can be either discrete or continuous.

Diagrammatic Representation
It is the pictorial or graphic
presentation of numerical data

Graphical description of quantitative data:
Histogram and Polygon:
• There is an old saying that “one picture is worth a
thousand words”.
• Indeed, statisticians employ graphical techniques
to describe sets of data more vividly.
• Bar charts and pie charts are used to
describe qualitative data.
• When quantitative data are summarized into frequency or
relative frequency tables, however, histograms and
polygons are used to describe the data.

Importance of diagrammatic
representation
• More attractive than bare figures
• The required information can be obtained in
less time and without mental strain
• Facilitates comparison
• Patterns of change in the data can be detected
easily
• Stays in memory for a longer time
• Used to understand patterns and trends

Limitations of diagrams
• Cannot be used as a substitute for data
• Not an alternative to tabulation
• Accuracy is not ensured; a diagram gives only an approximate
idea
• When graphs are poorly designed, they not
only fail to convey your message effectively,
they often mislead and confuse.

Diagrammatic……
Specific types of graphs include:
• For nominal and ordinal data:
– Bar graph
– Pie chart
• For quantitative data:
– Histogram
– Stem-and-leaf plot
– Box plot
– Scatter plot
– Line graph
• Others

Graphical description of qualitative data
• Bar graphs and pie charts are two of the most widely used
graphical methods for describing qualitative data sets.
• Bar graphs give the frequency (or relative frequency) of
each category
• Example 1.3a (Bar Graph)
[Figure 1.3: Bar graph showing the number of students in each
category (Bad, Excellent, Good, Medium); vertical axis from 0 to 45]

Two-way table (Cross tabulation):
• This table shows two characteristics and is formed when either of the two
variables (the caption or the stub) is divided into two or more parts.
• For instance , the marital status and cervical cancer status can be presented
in the following two way table.
Marital status   Cervical cancer status
                 Positive   Negative
Single              49         47
Married            216        108
Widowed             87         86
Div/sep             15         45

Graphical (Diagrammatic) Presentation of Data.
• I. Bar Graph
• The bar graph is very commonly used and is best suited to representing
qualitative data. The bars are vertical, their lengths are
proportional to the corresponding numerical values, and the bars should be
equally spaced.
• Example: the following data give the number of clinical nurses in a given
woreda; they can be presented using different diagrams.

           Degree   Diploma   Certificate
Private      45       66          21
Gov't        48       46          12
NGO          12       24           4

Graph 2.1: Bar graph of the number of clinical nurses in a given woreda
(horizontal axis: Private, Gov't, NGO; bars: Degree, Diploma, Certificate;
vertical axis from 0 to 70)

Multiple bar graph
Sub-divided bar graph

III. Pie diagram (Pie chart)
• Pie chart enables us to show the partitioning of a total in to its component parts.
• The diagram is in the form of circle and component as slices of the circle.
• The size of the slice represents the proportion of the component out of the total.
• The angle of a component (X) is calculated as:
\[ \text{Degree of } X = \frac{\text{value of component } X}{\text{total value of the components}} \times 360^\circ \]
Example: The following data indicate the marital status of 40 women who came for
contraceptive services to St. Paul HMMC. Present the data using a pie diagram.

Marital status   Married   Widowed   Separated   Single
Frequency           8        12         16          4

• The degree of the slice for married women is calculated as:
\[ \text{Degree of married women} = \frac{\text{number of married women}}{\text{total women}} \times 360^\circ = \frac{8}{40} \times 360^\circ = 72^\circ \]
Likewise, the slices for widowed, separated and single women are
108°, 144° and 36°, respectively.
Graph 2.3: Pie diagram of the marital status of 40 women who came for
contraceptive service to St. Paul HMMC (Married 20%, Widowed 30%,
Separated 40%, Single 10%).
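The slice angles and percentages of the marital status example can be checked with a few lines of code. This is only a sketch of the formula for the angle of a component; the variable names are illustrative.

```python
# Frequencies from the marital status example (40 women in total).
frequency = {"Married": 8, "Widowed": 12, "Separated": 16, "Single": 4}

total = sum(frequency.values())
# Angle of each slice: (value of component / total) * 360 degrees.
angles = {status: count * 360 / total for status, count in frequency.items()}
# Share of each slice as a percentage of the whole pie.
percentages = {status: count * 100 / total for status, count in frequency.items()}
print(angles)
print(percentages)
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic.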
Pie charts
Divide a complete circle (a pie) into slices, each
corresponding to a category, with the central angle and
hence the area of the slice proportional to the category
relative frequency.
Example 1.4b (Pie Chart)
[Figure 1.3: Pie chart showing the number of students of each
category]
Graphical description of quantitative data:
• Stem and Leaf displays
• Widely used in exploratory data analysis when the data set
is small.
• In order to explain what a stem is and what a leaf is, we
consider the data from Table 1.4.1 (Daniel, Biostatistics:
A Foundation for Analysis in the Health Sciences)
• Steps to follow in constructing a Stem and Leaf Display
– Divide each observation in the data set into two parts,
the Stem and the Leaf.
– List the stems in order in a column, starting with the
smallest stem and ending with the largest.
– Proceed through the data set, placing the leaf for each
observation in the appropriate stem row.
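The three steps above can be sketched as follows. The ages are hypothetical (they are not the data of Table 1.4.1), the stem is taken to be the tens digit and the leaf the units digit, and stems with no observations are simply omitted in this simplified version.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} for non-negative integers, using tens as the stem."""
    display = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # e.g. 47 -> stem 4, leaf 7
        display[stem].append(leaf)
    return dict(display)

# Hypothetical ages, not the data of Table 1.4.1.
ages = [34, 47, 52, 38, 41, 59, 45, 30, 52]
for stem, leaves in sorted(stem_and_leaf(ages).items()):
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Because every leaf is kept, the original observations can be read back from the display.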
Example 1.5
Table 1.4.1 contains a list of the ages of subjects who
participated in the study on smoking cessation discussed
in Example 1.4.1. As can be seen, this unordered table
requires considerable searching for us to ascertain such
elementary information as the age of the youngest and
oldest subjects.

The stem and leaf display of Figure 2.3.8 partitions the data
set into 11 classes corresponding to 11 stems; here
two-line stems are used. The number of leaves in each
class gives the class frequency.
Advantages of a stem and leaf display over a frequency
distribution (considered in the next section):
1. the original data are preserved.
2. a stem and leaf display arranges the data in an orderly
fashion and makes it easy to determine certain numerical
characteristics to be discussed in the coming topics.
3. the classes and numbers falling in them are quickly
determined once we have selected the digits that we want
to use for the stems and leaves.
Histogram
• When plotting histograms, the phenomenon of interest is
plotted along the horizontal axis, while the vertical axis
represents the number, proportion or percentage of
observations per class interval, depending on whether
the histogram is a frequency histogram, a relative
frequency histogram or a percentage histogram, respectively.
• Histograms are essentially vertical bar charts in which the
rectangular bars are constructed at the midpoints of the classes.
• Example 3.7: Below we present the frequency histogram
for the data set considered above, for which the
frequency table is constructed in Table 2.3.2.

• Remark: When comparing two or more sets of data, the
various histograms cannot be constructed on the same
graph, because superimposing the vertical bars of one on
another would make interpretation difficult.
• For such cases it is necessary to construct relative
frequency or percentage polygons.

Polygons
• As with histograms, when plotting polygons the
phenomenon of interest is plotted along the horizontal
axis, while the vertical axis represents the number,
proportion or percentage of observations per class interval,
depending on whether the polygon is a frequency polygon,
a relative frequency polygon or a percentage polygon,
respectively.
• For example, the frequency polygon is a line graph
connecting the midpoints of each class interval in a data
set, plotted at a height corresponding to the frequency of
the class.

• Example 3.8: Figure 2.3.4 is a frequency
polygon constructed from the data in Table 2.3.2.

Cumulative distributions and cumulative polygons
• Other useful methods of presentation which facilitate
data analysis and interpretation are the construction of
cumulative distribution tables and the plotting of
cumulative polygons.
• A cumulative frequency distribution enables us to see
how many observations lie above or below certain
values, rather than merely recording the number of items
within intervals.

Ogive curve
• We may, for example, be interested in knowing
the number of patients whose weight is less than
50 Kg or more than say 60 Kg.
• To get this information it is necessary to change the form
of the frequency distribution from a ‘simple’ to a
‘cumulative’ distribution.
• The ogive curve turns a cumulative frequency distribution
into a graph.
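As a minimal sketch of the change from a simple to a cumulative distribution, the example below uses hypothetical grouped weights (the class boundaries and frequencies are invented for illustration). Each pair it produces is one point of a "less than" ogive.

```python
from itertools import accumulate

# Hypothetical grouped weights (kg): upper class boundaries and class frequencies.
upper_boundaries = [49.5, 59.5, 69.5, 79.5]
frequencies = [4, 10, 8, 3]

# "Less than" cumulative frequencies: the heights of the ogive at each boundary.
cumulative = list(accumulate(frequencies))
ogive_points = list(zip(upper_boundaries, cumulative))
print(ogive_points)
```

Reading the first point, 4 of these hypothetical patients weigh less than 49.5 kg; the last cumulative frequency always equals the total number of observations.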

• Example: Heart rate of patients admitted to
hospital Y, 2013.

Box and Whisker plot
• It is another way to display information when the
objective is to illustrate certain locations in the
distribution.
• A box is drawn with the top of the box at the third
quartile and the bottom at the first quartile.
• The location of the mid‐point of the distribution is
indicated with a horizontal line in the box.
• Finally, straight lines, or whiskers, are drawn from the
centre of the top of the box to the largest observation and
from the centre of the bottom of the box to the smallest
observation.
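The five values needed to draw the box and whiskers described above can be computed as in the sketch below. The observations are hypothetical, and quartile conventions differ between textbooks and software; the "inclusive" method of Python's `statistics.quantiles` is only one common choice and is an assumption here.

```python
import statistics

# Hypothetical observations used only for illustration.
data = [1, 2, 4, 6, 7, 9, 12, 13, 15]

# Quartile conventions differ between textbooks and software; the "inclusive"
# method of statistics.quantiles is one common choice and is an assumption here.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

summary = {
    "smallest": min(data),   # end of the lower whisker
    "Q1": q1,                # bottom of the box
    "median": median,        # line inside the box
    "Q3": q3,                # top of the box
    "largest": max(data),    # end of the upper whisker
}
print(summary)
```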

[Figure: A box and whisker diagram]

Scatter plot
• Most studies in medicine involve measuring more than
one characteristic, and graphs displaying the relationship
between two characteristics are common in the literature.
• When both the variables are qualitative then we can use a
multiple bar graph.
• When one of the characteristics is qualitative and the other
is quantitative, the data can be displayed in box and
whisker plots.
• To illustrate the relationship between two characteristics
when both are quantitative variables we use bivariate
plots (also called scatter plots or scatter diagrams).

Scatter plot

Line graph
• Useful for assessing the trend of a particular situation over time.
• Helps in monitoring the trend of epidemics.
• The time, in weeks, months or years, is marked along the
horizontal axis.
• Values of the quantity being studied are marked on the vertical
axis.
• Values for each category are connected by a continuous line.
• Sometimes two or more lines are drawn on the same graph
using the same scale so that the plotted lines are
comparable.
Example: Infant and under-five mortality rates in Ethiopia, 1970-2005 (Tefera Darge
2011; EDHS, 2000, 2005)

        1970-75   1975-80   1980-85   1985-90   1990-95   1995-2000   2000-05
IMR       160      138.8      127      104.8       95         83         77
U5MR      239      219.4     199.5      190       165        141        123

Graph 2.4: Infant and under-five mortality rates in Ethiopia, 1970-2005
(Tefera Darge 2011; EDHS, 2000, 2005)

[Figure: Line graph of the number of microscopically confirmed malaria cases
by species (Positive, P. falciparum, P. vivax) and month at Zeway malaria
control unit, 2003. Horizontal axis: months (Jan to Dec); vertical axis:
number of confirmed malaria cases (0 to 2100)]
Line graph cont..
• A line graph can also be used to depict the
relationship between two continuous
variables, like a scatter diagram.
• The following graph shows the level of zidovudine
(AZT) in the blood of AIDS patients at several
times after administration of the drug, for patients with
normal fat absorption and for patients with fat
malabsorption.
Line graph cont…..
[Figure: Response to administration of zidovudine in two groups of AIDS
patients in hospital X, 1999. Horizontal axis: time since administration
(min), 10 to 360; vertical axis: blood zidovudine concentration, 0 to 8;
lines: fat malabsorption and normal fat absorption]

Descriptive summary statistics
• Introduction
• In the previous section data were collected and
appropriately summarized into tables and charts.
• Next a variety of descriptive summary measures will be
developed.
• These descriptive measures are useful for analyzing and
interpreting quantitative data, whether collected in raw
form (ungrouped data) or summarized into frequency
distributions (grouped data)

Types of numerical descriptive measures
Four types of characteristics which describe a
data set pertaining to some numerical variable or
phenomenon of interest are:
1. Location
2. Dispersion
3. Relative standing
4. Shape

numerical descriptive measures
• In any analysis and/or interpretation of numerical data, a
variety of descriptive measures representing the properties
of location, variation, relative standing and shape may be
used to extract and summarize the salient features of the
data set.
• If these descriptive measures are computed from a sample
of data, they are called statistics. In contrast, if these
descriptive measures are computed from an entire
population of data, they are called parameters.
• Since statisticians usually take samples rather than use
entire populations, our primary emphasis deals with
statistics rather than parameters.

Measures of central tendency (MCT)
• On the scale of values of a variable there is a
certain point around which the largest number of items
tend to cluster.
• Since this point is usually in the centre of the
distribution, the tendency of statistical data to
concentrate at certain values is called “central
tendency”.
• The various methods of determining the actual
value at which the data tend to concentrate are
called measures of central tendency.

Measures of central tendency (MCT)
• The most important objective of calculating measure of
central tendency is to determine a single figure which may
be used to represent a whole series involving magnitude of
the same variable.
• In that sense it is an even more compact description of the
statistical data than the frequency distribution.
• Since a measure of central tendency represents the entire
data set, it facilitates comparison within one group or between
groups of data.

Position

Characteristics of a good measure of central tendency
A measure of central tendency is good or satisfactory if it
possesses the following characteristics.
1. It should be based on all the observations
2. It should not be affected by the extreme values
3. It should be as close to the maximum number of values
as possible
4. It should have a definite value
5. It should not be subjected to complicated and boring
calculations
6. It should be capable of further algebraic treatment
7. It should be stable with regard to sampling

Measures of location ( or central tendency)
A. Arithmetic Mean
a) Ungrouped data
Sample mean:
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Population mean:
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{\text{Sum of the values of all observations in population}}{\text{Total number of observations in population}} \]

Arithmetic Mean
b) Grouped data
• In calculating the mean from grouped data, we assume
that all values falling into a particular class interval are
located at the mid-point of the interval. It is calculated as
follows:
\[ \bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i} \]
where,
k = the number of class intervals
m_i = the mid-point of the ith class interval
f_i = the frequency of the ith class interval
Example 1: If the marks of five medical students are 80, 75, 60, 50 and 90, the mean
mark of the students is calculated as:
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{\sum_{i=1}^{5} X_i}{5} = \frac{X_1 + X_2 + X_3 + \dots + X_5}{5} \]
\[ \bar{X} = \frac{80 + 75 + 60 + 50 + 90}{5} = 71 \]
• Therefore, the mean mark of the students was 71.
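The same calculation in code, as a minimal illustration of the sample mean formula using the five marks of Example 1:

```python
marks = [80, 75, 60, 50, 90]

# Sample mean: sum of the observations divided by the number of observations.
mean_mark = sum(marks) / len(marks)
print(mean_mark)
```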
Exercise: You measure the body lengths (in inches) of 10 full-term infants at
birth and record the following: 17.5, 19.5, 17.5, 19, 20, 21, 18,
19.5, 18, 10.75. Compute the mean length of the infants for these
data.

Example 2: The ages of the patients diagnosed on a given day are given below.
Compute the mean age of the patients diagnosed that day.

Age of patients (years)    54   64   74   84   94   104
Number of patients          2    3    5   10    3     2

The mean age of the patients can be calculated as:
\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_n f_n}{f_1 + f_2 + \dots + f_n} \]
\[ \bar{X} = \frac{54 \times 2 + 64 \times 3 + 74 \times 5 + 84 \times 10 + 94 \times 3 + 104 \times 2}{2 + 3 + 5 + 10 + 3 + 2} = \frac{2000}{25} = 80 \]
Hence, the mean age of the patients that were diagnosed
on that day was 80 years.
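The weighted sum in Example 2 is easy to get wrong by hand, so it is worth evaluating step by step. The sketch below uses the ages and patient counts from the table of Example 2; the variable names are illustrative.

```python
ages = [54, 64, 74, 84, 94, 104]
patients = [2, 3, 5, 10, 3, 2]   # frequency of each age

# Numerator: each age multiplied by its frequency, then summed.
weighted_sum = sum(x * f for x, f in zip(ages, patients))
# Denominator: total number of patients.
n = sum(patients)
mean_age = weighted_sum / n
print(weighted_sum, n, mean_age)
```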

The arithmetic mean properties.
• For given set of data there is one and only one arithmetic
mean.
• The arithmetic mean is easily understood and easy to
compute.
• Algebraic sum of the deviations of the given values from
their arithmetic mean is always zero.
• The arithmetic mean possesses all the characteristics of a
good measure of central tendency except No. 2: it is greatly
affected by extreme values.
• In the case of grouped data, if any class interval is open-ended,
the arithmetic mean cannot be calculated.
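The property that the deviations from the arithmetic mean sum to zero follows in one line from the definition of the sample mean, since \(\sum_{i=1}^{n} x_i = n\bar{x}\):

\[ \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0 \]

For instance, for the marks 80, 75, 60, 50, 90 with mean 71, the deviations are 9, 4, -11, -21 and 19, which add up to 0.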

Mean Sensitive to Outliers
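[Figure: Distribution of nights of stay (horizontal axis: nights of stay,
0 to 50; vertical axis: frequency, 0 to 6), annotated with Mean = 12.0 and
Mean = 15.3 to show how an outlier shifts the mean]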

Advantage and disadvantage of Mean
Advantages
• Mathematical center of a distribution.
• Just as far from scores above it as it is from scores below it.
• Good for interval and ratio data.
• Includes all the values of the data set and is unique.
• Inferential statistics is based on mathematical properties of the
mean.
Disadvantages
• It is affected by extreme values and skewed distributions that are
not representative of the rest of the data.
• May not exist in the data.

Measures of location ( or central tendency)
• Example: Consider the ages of 189 subjects: 48, 35, …, 66.
By definition, the mean is calculated as:
$\bar{x}$ = (48 + 35 + … + 66)/189 = 55.032

Median
Definition 3.2
• The median m of a sample of n observations arranged in
ascending or descending order is the middle number that
divides the data set into two equal half: one half of the
items lie above this point, and the other half lie below it.
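a) Ungrouped data
With the observations arranged in order as x1 ≤ x2 ≤ ... ≤ xn, the median ($\tilde{x}$) is
$$\tilde{x} = \begin{cases} x_k & \text{if } n = 2k - 1 \ (n \text{ is odd}) \\ \frac{1}{2}\left(x_k + x_{k+1}\right) & \text{if } n = 2k \ (n \text{ is even}) \end{cases}$$
Equivalently,
$$\text{median}(X) = \begin{cases} \left(\frac{n+1}{2}\right)^{\text{th}} \text{ largest value} & \text{when } n \text{ (size of the data) is odd} \\ \frac{1}{2}\left[\left(\frac{n}{2}\right)^{\text{th}} + \left(\frac{n+2}{2}\right)^{\text{th}}\right] \text{ value} & \text{when } n \text{ is even} \end{cases}$$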
Example: To find the median of 6, 2, 7, 13, 4, 9, 15, 1, 12.
Arrange the data in increasing order: 1, 2, 4, 6, 7, 9, 12, 13, 15.
The sample size is n = 9 (odd), so the median is the ((n+1)/2)th largest value.
The median of the data becomes the ((9+1)/2)th largest value = the 5th value, which is 7.
Exercise: Compute the sample median for the birth weight data:
3265, 3314, 2581, 2759, 2834, 2838, 2841, 3031, 3200, 3245, 3260, 3323,
4146, 3609, 3484, 3101, 3248, 2069, 3649, 3541.

Median
• In calculating the median from grouped data, we assume that the values within a class‐interval are evenly distributed through the interval.
• The first step is to locate the class interval in which the median is located. We use the following procedure.
• Find n/2 and identify the first class interval whose cumulative frequency reaches n/2.
• To find a unique median value, use the following interpolation formula:
$$\text{Median} = L_m + \left(\frac{n/2 - F_c}{f_m}\right) W$$
• where,
Lm = lower true class boundary of the interval containing the median
Fc = cumulative frequency of the interval just before (listed above) the median class interval
fm = frequency of the interval containing the median
W = class interval width
n = total number of observations

Properties of the median
• There is only one median for a given set of data
• The median is easy to calculate
• Extreme values in data set do not affect the median as
strongly as they do the mean.
• Median can be calculated even in the case of open end
intervals
• It is not a good representative of data if the number of
items is small

Advantage and Disadvantage of Median
Advantages
• Not influenced by extreme scores or skewed distributions.
• Good for ordinal data.
• Easier to compute than the mean.
Disadvantages
• May not exist in the data.
• Doesn't take the actual values into account.

• Example 1 Find the median of the data set consisting of the observations
7, 4, 3, 5, 6, 8, 10.
Solution First, we arrange the data set in ascending order
3 4 5 6 7 8 10.
Since the number of observations is odd, n = 2 x 4 - 1, the median is m = x4
= 6. We see that half of the observations, namely 3, 4, 5, lie below the
value 6 and the other half, namely 7, 8 and 10, lie
above the value 6.
Example 2 Suppose we have an even number of the observations 7, 4, 3,
5, 6, 8, 10, 1. Find the median of this data set.
Solution First, we arrange the data set in ascending order
1 3 4 5 6 7 8 10.
Since the number of observations is even, n = 2 x 4, then by Definition 3.2
Median = (x4+x5)/2 = (5+6)/2 = 5.5
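As a check on the two examples above, the definition of the median can be sketched in a few lines. The function name `median_by_definition` is hypothetical; Python's standard `statistics.median` gives the same results.

```python
from statistics import median

def median_by_definition(values):
    """Median per Definition 3.2: middle value (n odd) or mean of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    k = (n + 1) // 2              # 1-based position: n = 2k - 1 (odd) or n = 2k (even)
    if n % 2 == 1:
        return ordered[k - 1]     # x_k
    return (ordered[k - 1] + ordered[k]) / 2   # (x_k + x_{k+1}) / 2

odd_sample = [7, 4, 3, 5, 6, 8, 10]
even_sample = [7, 4, 3, 5, 6, 8, 10, 1]

print(median_by_definition(odd_sample))   # 6
print(median_by_definition(even_sample))  # 5.5
```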

Mode
a) Ungrouped data
• The mode of a data set is the value that occurs with the greatest frequency, i.e., is repeated most often in the data set.
• If all the values are different there is no mode; on the other hand, a set of values may have more than one mode.
b) Grouped data
• In designating the mode of grouped data, we usually refer to the modal class, where the modal class is the class interval with the highest frequency.
• Mode = lm + [A1/(A1 + A2)] W, where
lm = lower boundary of the modal class
A1 = difference between the frequency of the modal class and the frequency of the class immediately above it
A2 = difference between the frequency of the modal class and the frequency of the class immediately below it
W = width of the class interval
Properties of mode
• The mode can be used as a summary measure for
nominal, ordinal, discrete and continuous data, in
general however, it is more appropriate for nominal
and ordinal data.
• It is not affected by extreme values
• It can be calculated for distributions with open end classes
• Often its value is not unique
• The main drawback of mode is that often it does not exist

Advantage and Disadvantage of Mode
Advantages
• Good for nominal data.
• Like the median, the mode is not unduly affected by extreme values.
• We can use the mode no matter how large, how small, or how spread out the values in the data set happen to be.
• Easiest to compute and understand.
Disadvantages
• The mode is not used as often to measure central tendency as are the mean and the median.
• Too often, there is no modal value because the data set contains no values that occur more than once.
• Ignores most of the information in a distribution.
• When data sets contain two, three, or many modes, they are difficult to interpret and compare.
Example: Find the mode of the data set given below.

Geometric mean (GM)
• If x1, x2, ..., xn are n positive observed values, then
$$GM = \sqrt[n]{x_1 x_2 \cdots x_n}$$
and
$$\log GM = \frac{1}{n} \sum_{i=1}^{n} \log x_i$$
• The geometric mean is generally used with data measured on a logarithmic scale.

Harmonic mean (HM)
• Just as the geometric mean is based on an arithmetic
mean of logarithms, so is the harmonic mean based on
arithmetic mean of the reciprocals.
• We define it as the reciprocal of the arithmetic mean of
the reciprocal of the given numbers.
• If the given numbers are x1, x2, ..., xn, then
$$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Weighted mean (WM)
• If the given numbers are x1, x2, ..., xk and have known weights w1, w2, ..., wk, then
$$WM = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$$
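A small sketch of the geometric, harmonic and weighted means, computed directly from their definitions. The data values and weights below are hypothetical and chosen only for illustration; `statistics.geometric_mean` and `statistics.harmonic_mean` from the Python standard library are used as a cross-check.

```python
import math
from statistics import geometric_mean, harmonic_mean

values = [1, 2, 4]  # hypothetical positive observations

# Geometric mean: antilog of the arithmetic mean of the logarithms
gm = math.exp(sum(math.log(x) for x in values) / len(values))

# Harmonic mean: reciprocal of the arithmetic mean of the reciprocals
hm = len(values) / sum(1 / x for x in values)

# Weighted mean: sum(w_i * x_i) / sum(w_i), with hypothetical scores and weights
scores = [70, 80, 90]
weights = [1, 2, 1]
wm = sum(w * x for w, x in zip(weights, scores)) / sum(weights)

print(gm, hm, wm)
```

For these values the geometric mean is 2 (up to floating-point rounding), the harmonic mean is 12/7, and the weighted mean is 80.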

Comparing the Mean, Median and Mode
• In general, the three measures of central tendency of a data set (the mean, the median and the mode) are different. For example, for the data set on age of 189 subjects, mean = 55.032, median = 54 and mode = 53.
• If all observations in a data set are arranged symmetrically about an observation, then this observation is the mean, the median and the mode.
• Which of these three measures of central tendency is
better? The best measure of central tendency for a data set
depends on the type of descriptive information you want.

Percentiles and Quartiles
• The quartiles are sets of values which divide the distribution

into four parts such that there are an equal number of
observations in each part.
– Q1 = [(n+1)/4]th
– Q2 = [2(n+1)/4]th
– Q3 = [3(n+1)/4]th

Percentiles and Quartiles
• Percentiles divide the data into 100 parts with an equal number of observations in each part.
• It follows that the 25th percentile is the first quartile, the 50th percentile is the median and the 75th percentile is the third quartile.
• Position of a percentile = p(n+1), where p = the required percentile expressed as a proportion

Percentile Cont....
The pth percentile is a value that is ≥ p% of the observations and ≤ the remaining (100 - p)%.
The pth percentile is:
– The observation corresponding to the p(n+1)th position if p(n+1) is an integer
– The average of the (k)th and (k+1)th observations if p(n+1) is not an integer, where k is the largest integer less than p(n+1).
• If p(n+1) = 3.6, take the average of the 3rd and 4th observations.
• P50 = 50th percentile = Q2, P25 = 25th percentile = Q1
Example
Given a sample of size n = 60, find the 10th
percentile of the data set.
p(n+1) = 0.10(60+1) = 6.1
= Average of 6th and 7th
10% of the observations are less than or equal to this
value and 90% of them are greater than or equal to
the value

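Exercise: Birth weight (gm) data for 20 infants
2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101,
3200, 3245, 3248, 3260, 3265, 3314, 3323, 3484,
3541, 3609, 3649, 4146
Question
1. Compute the Q3, 10th and 90th percentiles
Answer
Q3 = 3(20+1)/4 = 15.75 = Average of 15th and 16th value =
(3323+3484)/2 = 3403.5gm
10th percentile = 0.1(20+1) = 2.1 = Average of 2nd and 3rd value =
(2581+2759)/2 = 2670gm
90th percentile = 0.9(20+1) = 18.9 = Average of 18th and 19th value
= (3609+3649)/2 = 3629gm
We estimate that 80% of the birth weights fall between 2670 and 3629gm
Note: interpolating between the two neighbouring values instead of averaging them gives slightly different results:
Q3 = 15th + 0.75(16th - 15th) = 3323 + 0.75(3484 - 3323) = 3443.75gm
10th percentile = 2nd + 0.1(3rd - 2nd) = 2581 + 0.1(2759 - 2581) = 2598.8gm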
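The p(n+1) rule described above can be sketched as a small function. The name `percentile_notes_rule` is hypothetical, and the sketch assumes 1 ≤ p(n+1) < n so that both neighbouring observations exist. Statistical packages often use interpolation rules instead, so their results may differ slightly.

```python
import math

def percentile_notes_rule(values, p):
    """Percentile using the p(n+1) position rule; p is a proportion such as 0.10."""
    ordered = sorted(values)
    n = len(ordered)
    pos = p * (n + 1)                      # position of the percentile
    if math.isclose(pos, round(pos)):      # integer position: take that observation
        return ordered[round(pos) - 1]
    k = math.floor(pos)                    # otherwise average the kth and (k+1)th
    return (ordered[k - 1] + ordered[k]) / 2

birth_weights = [2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245,
                 3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146]

p10 = percentile_notes_rule(birth_weights, 0.10)   # average of 2nd and 3rd values
p90 = percentile_notes_rule(birth_weights, 0.90)   # average of 18th and 19th values
q3 = percentile_notes_rule(birth_weights, 0.75)    # average of 15th and 16th values
print(p10, p90, q3)  # 2670.0 3629.0 3403.5
```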

Percentiles
• Simply divide the data into 100 pieces.
• Percentiles are not dependent on the distribution of the
data.

Using measures of central tendency
• Given a set of observations, an investigator may naturally
ask which measure of central tendency is best to use with
the data.
• Two factors are important in making this decision:
1. The scale of measurement
2. The shape of the distribution of observations

1. The arithmetic mean is used for interval and ratio data
and for symmetric distribution.
2. The median and quartiles are used for ordinal, interval
and ratio data whose distribution is skewed.
3. For nominal data mode is the appropriate MCT.
4. The geometric mean is used primarily for observations
measured on a logarithmic scale.
5. Harmonic mean is a suitable MCT when the data
pertains to rates and time.
6. Weighted mean is commonly used in the construction of
index number.

Measures of variability
• The measure of central tendency alone is not
enough to have a clear idea about the distribution
of the data.
• Moreover, two or more sets may have the same
mean and/or median but they may be quite
different.
• Thus, to have a clear picture of the data, one needs to
have a measure of dispersion or variability
(scatter) among the observations in the set.

Measures of variability
• Reporting only an average without accompanying measure
of variability may misrepresent a set of data.
– Two datasets can have the same average but very different variability.

[Figure: a non-statistician drowning in a river of average depth 0.3 meter. Variation is important.]
Objectives of Measuring Variation
1. To judge the reliability of a measure of central tendency
2. To compare two or more sets of data with regard to their
variability
3. To control variability itself, as in quality control, body temperature, etc.
4. To make further statistical analysis or to facilitate the use of other statistical measures.
Range (R)
• R = xmax – xmin, where
xmax is the largest value and xmin is the smallest value.
Example: for the given data set: 100, 95, 125, 45, 70, the range is calculated
as:
R= xmax – xmin
R= 125 – 45
Range = 80.
Properties of Range
• Range and relative range are easy to calculate and simple to understand.
• Both cannot be computed for grouped data with open ended classes.
• They do not tell us anything about the distribution of values in the
series.
Exercise1: Find the range for the monthly salary of ten workers in a certain
health center given below. 462, 480, 534, 624, 498, 552,606, 588, 516,
570.
Interquartile range (IQR)
• IQR = Q3 ‐ Q1, where
Q3 is the third quartile and Q1 is the first quartile.
Example: Suppose the first and third quartiles for weights of
girls 12 months of age are 8.8 Kg and 10.2 Kg respectively.
The interquartile range is therefore
IQR = 10.2 Kg – 8.8 Kg = 1.4 Kg,
i.e., 50% of infant girls at 12 months weigh between
8.8 and 10.2 Kg.

Properties
• It is a simple and versatile measure
• It encloses the central 50% of the observations
• It is not based on all observations but only on two specific
values
• It is important in selecting cut‐off points in the formulation
of clinical standards
• Since it excludes the lowest and highest 25% values, it is
not affected by extreme values
• It is not capable of further algebraic treatment

Quartile deviation (QD)
• QD = (Q3 - Q1)/2
Coefficient of quartile deviation (CQD)
• CQD = (Q3 - Q1)/(Q3 + Q1)
• CQD is an absolute quantity (unitless) and is useful to
compare the variability among the middle 50% of
observations

Variance and standard deviation
• A measure of dispersion relative to the scatter of the values
about their mean.
The population variance of a population of N observations x1, ..., xN with mean μ is defined by the formula
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
• The variance is the average of the squares of the deviations taken from the mean.
• That is, the sum of squared deviations from the mean divided by the number of deviations gives the average squared deviation, known as the variance.

Sample Variance
• The sum of squared deviations divided by the number of
deviations from the mean gives us the variance
Why divide by n‐1

• Samples give us estimates of population parameters
(population mean and variance)
• Dividing by n underestimates the population variance and
this is easily demonstrated

Another feature about n‐1
• In many statistical tests we sum variances from groups and
we lose a data point or what is sometimes referred to as
degrees of freedom.
• As noted already in order to make estimates from samples
to a population certain conditions have to be met.
• An additional one being that the sum of the deviation
scores around the mean must add up to zero.
• For each sample estimate we therefore lose a degree of
freedom – all numbers on which the estimate is based are
free to vary except one.

Variance and standard deviation
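(A) Ungrouped data
• Let X1, X2, ..., XN be the measurements on N population units; then
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
• The sample variance of the set x1, x2, ..., xn of n observations is
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
(B) Grouped data
$$S^2 = \frac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{\sum_{i=1}^{k} f_i - 1}$$
• Where
mi = the mid‐point of the ith class interval
fi = the frequency of the ith class interval
$\bar{x}$ = the sample mean
k = the number of class intervals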

Example: If the blood sugar level of a small population is: 80, 70, 95, 100, 125.
Calculate the variance and standard deviation of the data.
Solution: As the data is collected from the population, the variance is calculated
using:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
• But first the mean is calculated as:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{80 + 70 + 95 + 100 + 125}{5} = \frac{470}{5} = 94$$
• To calculate the variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{(80 - 94)^2 + (70 - 94)^2 + (95 - 94)^2 + (100 - 94)^2 + (125 - 94)^2}{5} = \frac{1770}{5} = 354$$
• The standard deviation will be:
$$S.D = \sigma = \sqrt{\text{variance}} = \sqrt{354} = 18.8$$

For grouped data with frequency, the population variance is
calculated as:
$$\sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N}$$
The standard deviation is the square root of the variance,
i.e. $S.D = \sigma = \sqrt{\text{Variance}}$
• Example: In a study, the weights of six newborn babies were recorded as
below. Find the variance and S.D.
Weight (Kg)   1.5   2.5   3
Frequency      2     3    1

Solution: Before calculating the variance, the mean weight of the
babies is calculated as:
$$\mu = \frac{\sum X_i f_i}{\sum f_i} = \frac{1.5(2) + 2.5(3) + 3(1)}{2 + 3 + 1} = \frac{13.5}{6} = 2.25$$
$$\sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N} = \frac{2(1.5 - 2.25)^2 + 3(2.5 - 2.25)^2 + 1(3 - 2.25)^2}{6} = \frac{1.125 + 0.1875 + 0.5625}{6} = \frac{1.875}{6} = 0.3125$$
• Hence, the variance of the weights of the newborn babies is 0.3125 Kg².

• And the standard deviation will be:
$$S.D = \sqrt{\text{variance}} = \sqrt{0.3125} \approx 0.559$$
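The two population-variance examples above can be reproduced with a few lines of code. This is a sketch with illustrative variable names; note that each deviation is squared before it is multiplied by its frequency.

```python
import math

# Ungrouped population data: blood sugar levels
sugar = [80, 70, 95, 100, 125]
mu = sum(sugar) / len(sugar)                                  # 94.0
var_sugar = sum((x - mu) ** 2 for x in sugar) / len(sugar)    # divide by N
sd_sugar = math.sqrt(var_sugar)

# Data with frequencies: weights of six newborn babies
weights = [1.5, 2.5, 3.0]
freqs = [2, 3, 1]
N = sum(freqs)
mu_w = sum(x * f for x, f in zip(weights, freqs)) / N         # 2.25
var_w = sum(f * (x - mu_w) ** 2 for x, f in zip(weights, freqs)) / N
sd_w = math.sqrt(var_w)

print(var_sugar)  # 354.0
print(var_w)      # 0.3125
```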

Sample Variance ( S2)
For ungrouped data, the sample variance is calculated using:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
where $\bar{X}$ is the sample mean and n is the total number of observations in the sample.
• Note: for sample data we divide by (n-1) instead of n as in the case of the
population variance, because this gives an unbiased estimator of the
population variance.
• Sample Standard Deviation (S)
$$S.D = \sqrt{\text{variance}} = \sqrt{S^2}$$
For grouped data the sample variance is calculated as:
$$S^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{n} f_i - 1}$$
Example: A sample of 6 children was taken from the population, with ages
of: 17, 18, 19, 20, 22, 24. Calculate:
A) the variance B) the standard deviation
• First the sample mean is calculated as:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{17 + 18 + 19 + 20 + 22 + 24}{6} = \frac{120}{6} = 20$$
• As a sample is considered, the variance can be calculated as:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(17 - 20)^2 + (18 - 20)^2 + (19 - 20)^2 + (20 - 20)^2 + (22 - 20)^2 + (24 - 20)^2}{6 - 1} = \frac{9 + 4 + 1 + 0 + 4 + 16}{5} = \frac{34}{5} = 6.8$$
• The S.D can be calculated as:
$$S.D = \sqrt{\text{variance}} = \sqrt{6.8} = 2.61$$

Exercise: Calculate the variance and standard deviation for the following data.
1) 19, 20, 24, 12, 17, 22, 18, 20, 23, 17.
2)
Age   Frequency
22    3
23    2
24    4
26    1
Answers
           Q1         Q2
Mean       19.2       23.4
SD         3.489667   1.264911
Variance   12.17778   1.6
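The sample results above can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n − 1. This is a sketch; expanding the frequency table into individual values is just one convenient way to handle the grouped exercise.

```python
from statistics import mean, variance, stdev

# Worked example: ages of 6 sampled children
children = [17, 18, 19, 20, 22, 24]
var_children = variance(children)   # 34 / 5
sd_children = stdev(children)

# Exercise 1: raw data
q1 = [19, 20, 24, 12, 17, 22, 18, 20, 23, 17]

# Exercise 2: expand the frequency table (age -> frequency) into raw values
table = {22: 3, 23: 2, 24: 4, 26: 1}
q2 = [age for age, freq in table.items() for _ in range(freq)]

for data in (q1, q2):
    print(round(mean(data), 1), round(variance(data), 5), round(stdev(data), 6))
```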

Properties
• The main demerit of the variance is that its unit is the
square of the unit of measurement of the variate values.
• The variance gives more weight to extreme
values than to those near the mean
value, because the differences are squared.
• The drawbacks of variance are overcome by the standard
deviation.

Standard deviation (σ, S)
• It is the positive square root of the variance.

Properties
• Standard deviation is considered to be the best measure
of dispersion and is used widely because of the
properties of the theoretical normal curve.
• There is, however, one difficulty with it. If the units of
measurement of the variables of two series are not the
same, then their variability cannot be compared by
comparing the values of the standard deviation.

Coefficient of variation (CV)
• In situations where either two series have different units of
measurements, or their means differ sufficiently in size, the
coefficient of variation should be used as a measure of
dispersion.
• It is the best measure to compare the variability of two
series or sets of observations.
• A series with a smaller coefficient of variation is considered
more consistent.
• The coefficient of variation of a series of variate values is the
ratio of the standard deviation to the mean multiplied by
100: CV = (S / x̄) × 100.

• Example 3.6 Suppose that each day laboratory technician
A completes 40 analyses with a standard deviation of 5.
Technician B completes 160 analyses per day with a
standard deviation of 15. Which employee shows less
variability?
• At first glance, it appears that technician B has three times
more variation in the output rate than technician A. But B
completes analyses at a rate 4 times faster than A. Taking all
this information into account, we compute the coefficient of
variation for both technicians:
CV(A) = (5/40) × 100 = 12.5%
CV(B) = (15/160) × 100 ≈ 9.4%
Relative to the mean output, technician B shows less variability.

Example: In counts of red blood cells (RBC) per ml of plasma,
Abebe and Alemu get the following results. Which of the two lab technicians
performs the more reliable (consistent) measurement?
Laboratory technician   Abebe   Alemu
Mean count              79      64
Standard deviation      23      11
Solution:
Abebe: CV = (S / x̄) × 100 = (23/79) × 100 = 29.11%
Alemu: CV = (S / x̄) × 100 = (11/64) × 100 = 17.19%
• Interpretation: the measurement of Abebe has more variability (less
consistency) than Alemu’s measurement.
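The coefficient of variation comparisons above can be sketched as follows; the function name `coefficient_of_variation` is chosen here for illustration only.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) x 100, expressed in percent."""
    return sd / mean * 100

# RBC counts: Abebe (mean 79, SD 23) and Alemu (mean 64, SD 11)
cv_abebe = coefficient_of_variation(23, 79)
cv_alemu = coefficient_of_variation(11, 64)

# Example 3.6: technician A (mean 40, SD 5) and technician B (mean 160, SD 15)
cv_a = coefficient_of_variation(5, 40)
cv_b = coefficient_of_variation(15, 160)

# The smaller CV indicates the more consistent series
print("More consistent RBC counts:", "Alemu" if cv_alemu < cv_abebe else "Abebe")
print("Less variable output:", "B" if cv_b < cv_a else "A")
```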

Characteristics of a distribution
• The measure of central tendency and variation discussed before do not
reveal the entire story about frequency distributions.
• Two distributions may have the same mean and standard deviation but they
may differ in their shape of the distribution.
• Further description of their characteristics is necessary that is provided by
Skewness.
• In a symmetrical distribution the values of mean, median and mode are

alike. The term ‘Skewness’ refers to lack of symmetry or departure from
the symmetry.
• If extremely low or extremely high observations are present in a

distribution, then the mean tends to shift towards those scores.

Skewness
• The skewness of a distribution is measured by comparing the
relative positions of the mean, median and mode.
– Distribution is symmetrical:
Mean = Median = Mode
– Distribution skewed right:
Median lies between mode and mean, and mode is less than mean
– Distribution skewed left:
Median lies between mode and mean, and mode is greater than
mean

• Based on the type of skewness, distributions can be:
• a) Negatively skewed distribution: occurs when majority of scores are at
the right end of the curve and a few small scores are scattered at the left
end.
• b) Positively skewed distribution: Occurs when the majority of scores are

at the left end of the curve and a few extreme large scores are scattered at
the right end.
• c) Symmetrical distribution: It is neither positively nor negatively

skewed. A curve is symmetrical if one half of the curve is the mirror image
of the other half.

Introduction to Probability

Objective
• To provide an understanding of probability and its applications
• To calculate probabilities using frequency distributions
• To explain probability distributions and set the ground for the development of statistical inference

Introduction to sets
• A set is a collection of objects. Sets are usually designated
by capital letters A, B, . . . etc.
Example: A = {a, b, c, d}. In this set, “a” is a member of set “A”, which is denoted as a ∈ A.
• Universal set (U): the set of all objects under consideration.
• Empty/null set (∅): a set that contains no members.
• Given two sets A and B: if being a member of A implies being a member of B, then A is a subset of B, denoted as A ⊂ B.

Introduction to sets
• Two sets A and B are equal if A and B have the same members.
• If A ∪ B = C, then set C is A union B and contains the elements which are in A or in B or in both.
• If D = A ∩ B, then set D is A intersection B and consists of the elements which are in both A and B.
• Example: A = {1, 2, 3, 4, 5}, B = {a, b, 1, 2, 5, c, 6}
• A ∪ B = {1, 2, 3, 4, 5, 6, a, b, c}
• A ∩ B = {1, 2, 5}
Basic characteristics of Set
1. A ∪ ∅ = A, A ∩ ∅ = ∅, A ∪ U = U, A ∩ U = A
2. A ∪ A = A, A ∩ A = A
3. A ∪ B = B ∪ A; A ∩ B = B ∩ A
4. (A ∪ B) ∪ C = A ∪ (B ∪ C); (A ∩ B) ∩ C = A ∩ (B ∩ C)
5. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C); A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
6. (Ac)c = A
7. (A ∪ B)c = Ac ∩ Bc; (A ∩ B)c = Ac ∪ Bc
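The set operations and identities above can be tried out with Python's built-in `set` type. In this sketch the universal set `U` is assumed to be the union of the example sets, and the third set `C` and the helper `complement` are hypothetical, introduced only to illustrate the distributive and De Morgan laws.

```python
A = {1, 2, 3, 4, 5}
B = {"a", "b", 1, 2, 5, "c", 6}
C = {2, 6, "a"}            # hypothetical third set
U = A | B | C              # universal set assumed for this illustration

def complement(s):
    """Complement of s relative to the assumed universal set U."""
    return U - s

union = A | B              # elements in A or in B or in both
intersection = A & B       # elements in both A and B

# Distributive laws
dist_1 = (A | (B & C)) == ((A | B) & (A | C))
dist_2 = (A & (B | C)) == ((A & B) | (A & C))

# De Morgan's laws
de_morgan_1 = complement(A | B) == (complement(A) & complement(B))
de_morgan_2 = complement(A & B) == (complement(A) | complement(B))

print(sorted(intersection))                          # [1, 2, 5]
print(dist_1, dist_2, de_morgan_1, de_morgan_2)      # True True True True
```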

Probability
• Probability is the language of chance. The deliberate use of
chance is the central idea of statistical designs for producing data.
• Probability provides the necessary tools to capture the uncertain state of our knowledge.
• A probabilistic experiment is any process that produces outcomes which are not predictable in advance.

Probability
• Probabilities are used in everyday communication.
– A patient has a 50 – 50 chance of surviving a certain
operation
– The chance of a 30 year old woman to celebrate her 70th
birthday is 30%
• Because medicine is an inexact science, physicians seldom can
predict an outcome with absolute certainty.
• Example1
• To formulate a diagnosis, a physician must rely on available
diagnostic information about a patient;
– History and physical examination
– Laboratory studies, X‐ray findings, ECG, etc.
Probability
• Because no test result is absolutely accurate, it does affect
the probability of the presence (or absence) of a disease.
Example2
– We may hear a physician say that a patient has a 50—50 chance
of surviving a certain operation .
– Another physician may say that she is 95 percent certain that a

patient has a particular disease.

Probability
• understanding of probability is fundamental for
quantifying the uncertainty that is inherent in the decision
making process.
• Probability theory also allows us to draw conclusions

about a population of patients based on known information
about a sample of patients drawn from that population.

Basic terms
• A random experiment is an experiment for which the
outcome cannot be predicted with certainty, but all
possible outcomes can be identified prior to its
performance, and it may be repeated under the same
conditions.
• We call a phenomenon random if:-

– The exact outcome is not predictable in advance.
– however, there is a predictable long term pattern that can be

described by the distribution of outcomes of very many trials
Basic terms
• Sample space is the set of all possible outcomes of a
random experiment. It is denoted by S, and P(S) = 1.
• In tossing a single six-sided die once the sample space is
S = {1, 2, 3, 4, 5, 6} .
• Equally likely: A set of events is equally likely if one of
them cannot be expected to happen in preference to
another.
– E.g. When a fair coin is tossed, heads and tails are
equally likely outcomes.

Basic terms
• Mutually exclusive events: events are mutually exclusive if the occurrence of one of
them precludes the occurrence of all the others.
Two events A and B are mutually exclusive if they cannot
occur at the same time
P (A ∩ B) = 0
Example:
o A coin toss cannot produce heads and tails
simultaneously.
o Weight of an individual can’t be classified
simultaneously as “underweight”, “normal”,
“overweight”
Basic terms
• Independent Events: Two events A and B are
independent
– if the probability of the first one happening is the same no
matter how the second one turns out.
– The outcome of one event has no effect on the occurrence
or non-occurrence of the other.
Example:
– The outcomes on the first and second coin tosses are
independent

Basic terms
• Experiment = any process with an uncertain outcome
– When an experiment is performed, one and only one

outcome is obtained.
• Event = something that may happen or not when the

experiment is performed
– An event either occurs or it does not occur.

– Events are represented by uppercase letters such as A, B, & C

Examples
1. Experiment is blood test to determine HIV status. Possible
outcomes are {HIV +} and {HIV -}.
– A1 could be the event that a test comes out positive.
– A2 could be the event that a test comes out negative.
2. Experiment is blood test and further screening to determine

HIV status (HIV+ or HIV-) and AIDS status (D+ or D-).
Events are:
– {(HIV +;D+)}; {(HIV +;D-)}; {(HIV -;D+)}; {(HIV -;D-)}

3. Experiment is to record the number of people that get tested for
HIV in one week at a given clinic. Suppose 500 is the maximum
possible number of tests given in a week. Then any non-negative
integer less than or equal to 500 is a conceivable outcome.
Events are:{0}; {1}; {2}; … ; {500}
• Note that unions and intersections of events are events.
A1 is the event that greater than 100 people get tested.
A2 is the event that fewer than 220 people get tested.
A3 is the event that greater than 100 people but fewer than 220
get tested.
• The probability of an event A, denoted by P(A), is in general
the chance that A will happen. But how do we measure the chance of
occurrence, i.e., how do we determine the probability of an event?
4. Consider a box containing 100 marbles, 90 of them red and
the other 10 blue.
– If the question is: ‘‘Are there red marbles in the box?’’,
someone who saw the box’s contents would answer
‘‘90%.’’
– But if the question is: ‘‘If I take one marble at random,
do you think I would have a red one?’’, the answer
would be ‘‘90% chance.’’
– The first 90% represents a proportion; the second 90%
indicates the probability.

Approaches to probability
1. Subjective Probability: Definitions of probability as a
quantitative measure of the “degree of certainty” of the
observer of experiment.
2. Classical definition: Definitions that reduce the concept

of probability to the more primitive notion of “equal
likelihood”
3. Statistical definition: Definitions that take as their point

of departure the “relative frequency” of occurrence of the
event in a large number of trials.
1. Subjective probability: measures the confidence or a wish
that a particular individual has in the truth of a particular
proposition.
– E.g. If some one says that he is 95% certain that a cure for
AIDS will be discovered within 5 years, then he means that
Pr(discovery of cure of AIDS within 5 years) = 95%.
• Although the subjective view of probability has enjoyed
increased attention over the years, it has not been fully
accepted by scientists.

2. The classical definition of probability:
• The probability P(A) of an event A is equal to the number
of possible simple events (outcomes) favorable to A
divided by the total number of possible simple events of
the experiment, i.e.,
P(A) = m/N
where m = number of the simple events into which the event A can be decomposed
and N = total number of possible simple events of the experiment.
Example 1. Consider the experiment of tossing a

balanced coin. P(H)=P(T)=1/2.

Example 2. Consider the experiment of tossing a
balanced die. Let Dk be the event that k dots (k = 1, 2, 3, 4, 5, 6) are observed on the upper
face of the die. Therefore, P(Dk) = 1/6 (k = 1, 2, 3, 4, 5, 6).
• Let Dodd be the event that an odd number of dots is observed, and
Deven the event that an even number of dots is observed.
– We have P(Dodd) = 3/6 = 1/2, P(Deven) = 3/6 = 1/2.
– Let A be the event that fewer than 6 dots are observed; then P(A) = 5/6.
3. The statistical/relative frequency probability:
The absolute frequency f(A) of an event A in n trials is the
number of times A occurs, and the relative frequency of A in
these trials is:
P(A) = f(A)/n
Example 1. Suppose that of 158 people who attended a

dinner party, 99 were ill due to food poisoning. The
probability of illness for a person selected at random is
Pr (illness) = 99/158 = 0.63 or 63%

Example 2. The records of a certain health center showed
that out of 10000 smokers, 2940 developed lung cancer.
If one smoker is randomly selected from this group,
what is the probability that he will develop lung cancer?
Let L = the smoker develops lung cancer
P(L) = 2940/10000 = 0.294
• Note: We will adopt the relative frequency interpretation
of probability, which says that the probability that an
event A occurs is equal to the proportion of the time that
A occurs if we repeat the random experiment again and
again to infinity:
$$P(A) = \lim_{n \to \infty} \frac{f(A)}{n}$$
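The classical and relative frequency definitions above can be contrasted in a short sketch. The helper names `classical_probability` and `relative_frequency` are hypothetical; exact fractions are used for the die so that the results match the hand calculations.

```python
from fractions import Fraction

def classical_probability(sample_space, event):
    """P(A) = m / N: favourable outcomes over all equally likely outcomes."""
    outcomes = list(sample_space)
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

def relative_frequency(occurrences, trials):
    """P(A) estimated as f(A) / n from observed data."""
    return occurrences / trials

die = range(1, 7)                                             # S = {1, ..., 6}
p_odd = classical_probability(die, lambda k: k % 2 == 1)      # 3/6
p_less_than_6 = classical_probability(die, lambda k: k < 6)   # 5/6

p_illness = relative_frequency(99, 158)         # dinner party example
p_lung_cancer = relative_frequency(2940, 10000) # smokers example

print(p_odd, p_less_than_6)           # 1/2 5/6
print(round(p_illness, 2), p_lung_cancer)
```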

Properties of probability
• The mathematical development of probability starts with
three basic rules or axioms:
1. The numerical value of a probability always lies between
0 and 1, inclusive: 0 ≤ P(E) ≤ 1
– A value 0 means the event can not occur
– A value 1 means the event definitely will occur
– A value of 0.5 means that the probability that the event
will occur is the same as the probability that it will not
occur.

2. The sum of the probabilities of all mutually
exclusive and exhaustive outcomes is equal to 1.
– P(E1) + P(E2) + .... + P(En) = 1.
3. For any two events A and B P(A or B) is:

– P(A or B) = P(A) + P(B) - P(A and B) (Addition rule)
– For two mutually exclusive events A and B,
P(A or B ) = P(A) + P(B).

4. For any two independent events A and B:
P(A and B) =P(A) P(B) (Multiplication rule)
5. The complement of an event A, denoted by Ā or Ac, is

the event that A does not occur then P(Ac) = 1 ‐P(A)
(complementary events)

Basic Probability Rules
1. Addition rule
A. If events A and B are mutually exclusive:
• P(A or B) = P(A) + P(B)
• P(A and B) = 0
• If not mutually exclusive:
P(A or B) = P(A) + P(B) - P(A and B),
i.e., P(event A or event B occurs or they both occur)

Example: The probabilities below represent years of
schooling completed by mothers of newborn infants
1. What is the probability that a

mother has completed < 12
years of schooling?
2. What is the probability that a
mother has completed 12 or
more years of schooling?

Class work
The probability that at least three individuals
among the five develop hepatitis B is

• What is the probability that a mother has
completed < 12 years of schooling?
P(≤ 8 years) = 0.056 and
P(9-11 years) = 0.159
• Since these two events are mutually exclusive,
P(≤ 8 or 9-11) = P(≤ 8 U 9-11)
= P(≤ 8) + P(9-11) = 0.056+0.159
= 0.215
• What is the probability that a mother has completed 12 or
more years of schooling?
P(≥ 12) = P(12 or 13-15 or ≥ 16) = P(12 U 13-15 U ≥ 16)
= P(12)+P(13-15)+P(≥ 16)
= 0.321+0.218+0.230
= 0.769
B. If A and B are not mutually exclusive events,
then subtract the overlapping:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

2. Multiplication rule
 If A and B are independent events, then
P(A ∩ B) = P(A) × P(B)
More generally, for any two events (dependent or not):
P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
P(A and B) denotes the probability that A and B both
occur at the same time.

Conditional Probability
 Refers to the probability of an event, given that another
event is known to have occurred.
 “What happened first is assumed”
 Hint - When thinking about conditional probabilities,

think in stages. Think of the two events A and B occurring
chronologically, one after the other, either in time or
space.
• Conditional probabilities, probabilities based on the
knowledge that some event has occurred.

• Conditional probabilities are denoted by P(B/A) or
P(Event/conditioning event).
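• The formula for calculating a sample conditional
probability is:
P(B|A) = (number of observations in which both A and B occur) / (number of observations in which A occurs)

The conditional probability that event B occurs
given that event A has already occurred is denoted
P(B|A) and is defined as
P(B|A) = P(A ∩ B) / P(A),
provided that P(A) ≠ 0.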

Example 1.
Table 1. A study investigating the effect of prolonged exposure to
bright light on retinal damage in premature infants.
                 Retinopathy YES   Retinopathy NO   TOTAL
Bright light           18                 3            21
Reduced light          21                18            39
TOTAL                  39                21            60

• Pr(D+ | reduced light) = Pr(D+ and reduced light) / Pr(reduced light)
= (21/60) / (39/60) = 21/39 ≈ 54%
• Pr(D+ | bright light) = Pr(D+ and bright light) / Pr(bright light)
= (18/60) / (21/60) = 18/21 ≈ 86%

• We want to know whether the probability of retinopathy
for the bright-light infants differs from the probability of
retinopathy for the reduced-light infants.
These probabilities are conditional probabilities.
• We want to compare the probability of retinopathy, given
that the infant was exposed to bright light, with the probability
of retinopathy, given that the infant was exposed to reduced light.
• Exposure to bright light and exposure to reduced light are
conditioning events: events we want to take into account
when calculating conditional probabilities.
• For the retinopathy data, the conditional probability of
retinopathy, given exposure to bright light, is
P(Retinopathy | exposure to bright light)
= (No. of infants with retinopathy exposed to bright light) / (No. of infants exposed to bright light)
= 18/21 = 0.86
• P(Retinopathy | exposure to reduced light)
= (No. of infants with retinopathy exposed to reduced light) / (No. of infants exposed to reduced light)
= 21/39 = 0.54
• The conditional probabilities suggest that premature infants
exposed to bright light have a higher risk of retinopathy than
premature infants exposed to reduced light.
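The two conditional probabilities can be reproduced directly from the counts in Table 1. The sketch below is illustrative only; the dictionary layout and the function name `conditional_probability` are assumptions, not part of the original notes.

```python
# Counts from Table 1: rows are exposure groups, columns are retinopathy status.
table = {
    "bright light": {"yes": 18, "no": 3},
    "reduced light": {"yes": 21, "no": 18},
}


def conditional_probability(table, exposure, outcome="yes"):
    """Estimate P(outcome | exposure) as a row proportion of the 2x2 table."""
    row = table[exposure]
    return row[outcome] / sum(row.values())


for exposure in table:
    p = conditional_probability(table, exposure)
    print(f"P(retinopathy | {exposure}) = {p:.2f}")
```

Dividing within a row is the same as dividing the joint proportion by the marginal proportion, because the grand total of 60 cancels.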
Class work
Table 2 shows the frequency of cocaine use by gender
among adult cocaine users.
Lifetime frequency of cocaine use   Male   Female   Total
1-19 times                           32       7       39
20-99 times                          18      20       38
more than 100 times                  25       9       34
Total                                75      36      111
1. What is the probability that a randomly picked person is male?
2. What is the probability that a randomly picked person has used cocaine more than 100
times?
3. Given that the selected person is male, what is the probability that he has used
cocaine more than 100 times?
4. Given that the person has used cocaine fewer than 100 times, what is the
probability that the person is female?
5. What is the probability that a randomly picked person is male and has used cocaine
more than 100 times?
1. For independent events A and B,
P(A|B) = P(A).
2. For non-independent events A and B,
P(A and B) = P(A|B) P(B) (General Multiplication Rule)
3. Bayes' theorem:
P(A|B) = P(B|A) P(A) / P(B)

Home work
From a city population, the probability of selecting a male or a smoker
is 7/10, a male smoker is 2/5, and a male, if a smoker is already
selected, is 2/3. Find the probability of selecting (a) a non-smoker, (b)
a male, and (c) a smoker, if a male is first selected.
Let A: a male is selected
B: a smoker is selected. We are given
P(A ∪ B) = 7/10, P(A ∩ B) = 2/5, P(A|B) = 2/3
The probability of selecting a non-smoker is
P(Bc) = 1 − P(B) = 1 − P(A ∩ B)/P(A|B)   [since P(A|B) = P(A ∩ B)/P(B)]
= 1 − (2/5)/(2/3) = 1 − 3/5 = 2/5
The probability of selecting a male (by the addition theorem) is:
P(A) = P(A ∪ B) + P(A ∩ B) − P(B)
= (7/10) + (2/5) − (3/5) = 1/2
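Parts (a) and (b) can be verified with exact fractions. This is a minimal sketch using Python's standard `fractions` module; the variable names are chosen here for illustration.

```python
from fractions import Fraction

p_a_or_b = Fraction(7, 10)    # P(A ∪ B): male or smoker
p_a_and_b = Fraction(2, 5)    # P(A ∩ B): male smoker
p_a_given_b = Fraction(2, 3)  # P(A | B): male, given a smoker was selected

# P(A | B) = P(A ∩ B) / P(B), so P(B) = P(A ∩ B) / P(A | B)
p_b = p_a_and_b / p_a_given_b
p_not_b = 1 - p_b

# Addition rule rearranged: P(A) = P(A ∪ B) + P(A ∩ B) - P(B)
p_a = p_a_or_b + p_a_and_b - p_b

print(p_b, p_not_b, p_a)  # 3/5 2/5 1/2
```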
Class work
Find the probability of selecting a smoker if a male is first selected,
that is, P(B|A).
Home work
1. Consider the experiment of tossing a fair die and
define the following events:
A = {Observe an even number of dots}
B = {Observe a number of dots less than or equal to 4}.
Are events A and B independent?
2. Suppose that three programmers are designing computer code for a
project: Mr. A has designed 60% of the code, Mr. B 30% and Mr. C
10%. Suppose further that Mr. A has a bug in 3% of his work, Mr. B
in 7% of his work, and Mr. C in 5% of his.
A. What percentage of the code written has a bug?
B. Given that you find a bug in a line of code, who is most likely to
have written it? Who is least likely?
C. How does the ordering compare to the unconditional probabilities,
and why does this relationship make sense?
Bayes' Theorem
• In the health sciences field a widely used application of probability
laws and concepts is found in the evaluation of screening tests and
diagnostic criteria.
• Of interest to clinicians is an enhanced ability to correctly predict
the presence or absence of a particular disease from a knowledge of
test results (positive or negative) and/ or the status of presenting
symptoms (present or absent).

Bayes' Theorem
• Also of interest is information regarding the likelihood of
positive and negative test results and the likelihood of the
presence or absence of a particular symptom in patients
with and without a particular disease.
• In our consideration of screening tests, we must be aware

of the fact that they are not always perfect. That is, a
testing procedure may yield a false positive or a false
negative.

Bayes' Theorem
Total probability
If the event B may occur together with one and only one
of n mutually exclusive events A1, A2, ..., An then
P(B)= P(A1)P(B|A1)+P(A2)P(B|A2)+ ...+P(An)P(B|An).
Bayes’s Formula
If the event B may occur together with one and only one
of n mutually exclusive events A1, A2, ..., An then
$$P(A_k \mid B) = \frac{P(A_k)\,P(B \mid A_k)}{P(B)} = \frac{P(A_k)\,P(B \mid A_k)}{\sum_{j=1}^{n} P(A_j)\,P(B \mid A_j)}$$
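The total probability rule and Bayes' formula translate almost line for line into code. The sketch below uses hypothetical priors and likelihoods chosen only for illustration; the function names are likewise assumptions.

```python
def total_probability(priors, likelihoods):
    """P(B) = sum over j of P(A_j) * P(B | A_j), for mutually exclusive A_j."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def bayes_posteriors(priors, likelihoods):
    """Return the list of P(A_k | B) for every k, using Bayes' formula."""
    p_b = total_probability(priors, likelihoods)
    return [p * l / p_b for p, l in zip(priors, likelihoods)]


# Hypothetical numbers, for illustration only.
priors = [0.5, 0.3, 0.2]          # P(A_1), P(A_2), P(A_3); must sum to 1
likelihoods = [0.10, 0.20, 0.40]  # P(B | A_1), P(B | A_2), P(B | A_3)

p_b = total_probability(priors, likelihoods)
posteriors = bayes_posteriors(priors, likelihoods)

print(round(p_b, 2))                      # 0.19
print([round(p, 3) for p in posteriors])  # [0.263, 0.316, 0.421]
```

Note how the event with the smallest prior ends up with the largest posterior here, because its likelihood is much higher than the others.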

Sensitivity and Specificity
• Data for assessing the sensitivity and specificity of a test are usually
of the form
                        Disease Category
Test result   Diseased (+)   Nondiseased (-)   Total
+                  A               B            A+B
-                  C               D            C+D
Total             A+C             B+D         A+B+C+D
Sensitivity: the proportion of diseased people who would
be correctly classified,
estimated by Sens = A/(A + C).
Specificity: the proportion of non-diseased people who
would be correctly classified,
estimated by Spec = D/(B + D).
Sensitivity and Specificity
• The prevalence of a disease is the percent of the population
with the disease estimated by R = (A + C)/(A + B + C + D).
Note that a random sample is required to estimate prevalence.
• Positive Predictive Value: is the proportion of people who
tested positive that truly are positive.
estimated by PPV =A/(A + B).
• Negative Predictive Value: is the proportion of people who
tested negative that truly are negative.
estimated by NPV =D/(C + D).
• False Negative: The probability of a false negative is the
probability of testing negative given a truly positive condition.
• False Positive: The probability of a false positive is the
probability of testing positive given a truly negative condition.
Example1
Consider the following data for assessing the sensitivity and specificity of a test:
                        Disease Category
Test result   Diseased (+)   Nondiseased (-)    Total
+                10000            5000          15000
-                 1000           84000          85000
Total            11000           89000         100000
• The estimated sensitivity is Sens = A/(A + C) = 10000/11000 = 90.9%
• The estimated specificity is Spec = D/(B + D) = 84000/89000 = 94.4%
• The estimated prevalence is R = (A + C)/(A + B + C + D) = 11000/100000 = 11.0%
• The estimated PPV is PPV = A/(A + B) = 10000/15000 = 66.7%
• The estimated NPV is NPV = D/(C + D) = 84000/85000 = 98.8%
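The five estimates can be recomputed from the four cell counts. The following is a small sketch; the function name `screening_metrics` and the returned dictionary keys are illustrative choices, with a, b, c, d playing the roles of A, B, C, D in the table layout.

```python
def screening_metrics(a, b, c, d):
    """a = test+/diseased, b = test+/nondiseased, c = test-/diseased, d = test-/nondiseased."""
    total = a + b + c + d
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "prevalence": (a + c) / total,
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


metrics = screening_metrics(a=10000, b=5000, c=1000, d=84000)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The estimates of prevalence, PPV and NPV are only meaningful when the table comes from a random sample of the population, as the notes point out for prevalence.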
PROBABILITY DISTRIBUTION

Probability distribution
• Every random variable has a corresponding probability
distribution.
• A probability distribution applies the theory of probability
to describe the behavior of the random variable.
• The term probability distribution, or just distribution, refers
to the way data are distributed, in order to draw
conclusions about a set of data.

• A probability distribution is a listing of all the possible values
that a random variable can take along with their
probabilities.
• A probability distribution of a random variable can be
displayed by a table, a graph or a mathematical formula.
• A random variable is any quantity or characteristic that is
able to assume a number of different values such that any
particular outcome is determined by chance.
• Random variables can be either discrete or continuous.

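• Example: the eight equally likely outcomes of tossing a coin three times are
HHH HHT HTH THH
TTT TTH THT HTT
• If X counts the number of heads, its probability distribution is
x          0     1     2     3
P(X = x)   1/8   3/8   3/8   1/8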

• The random variable domain is the sample space and its
range is the set of real numbers.
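A random variable maps each outcome in the sample space to a number, and its distribution follows by counting. The sketch below enumerates three tosses of a coin; it assumes a fair coin and takes X to be the number of heads, neither of which is stated explicitly in the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of three coin tosses: 8 equally likely outcomes (fair coin assumed).
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# The random variable X maps each outcome to its number of heads.
counts = Counter(o.count("H") for o in outcomes)
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
```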
Example 1: The number of HIV-positive results when a single
blood test is taken to determine HIV status.
Example 2: Observe 100 babies born in a clinic. The
number of boys born is a random
variable. It may take values from 0 to 100.
Example 3: Select one student from a university,
measure his/her height and record it as x. Then x
is a random variable, assuming values from, say, 100
cm to 250 cm, depending on the specific student.
Basic definition
 A discrete random variable is able to assume only a finite or
countable number of outcomes
 A continuous random variable can take on any value in a specified
interval.
Example 1 Experiment is surgery on two people. Outcomes are {ss,sf,fs,ff}.
Example2 Experiment is to observe the number of people that get tested for
HIV in one week at a given clinic. Suppose 500 is the maximum
possible number of tests given in a week. Then any non-negative
integer less than or equal to 500 is a conceivable outcome.
X = number of tests in a given week.
Example3 Experiment is to record the number of places that a person has
lived in his or her lifetime. Possible outcomes are {1, 2, 3, …}
X = number of places a person has lived.
Example4 . Experiment is to record the sex of a person. Outcomes {m, f}

Discrete Probability distributions
• For a discrete random variable X, a probability
distribution is a function that assigns to any possible value
x of X the probability P(X = x).
Two Requirements for a Probability Distribution:
1. The sum of the probabilities of all the events in the
sample space must equal 1; that is
ΣP(X)=1.
2. The probabilities of each event in the sample space must
be between or equal to 0 and 1. That is, 0≤P(X)≤1.

Example1:
• Consider again the experiment of taking a single blood
test to determine HIV status. Let the random variable X
denote the number of positive tests.
• Then X(HIV+)=1, X(HIV-)=0
If we knew that the prevalence of HIV was 0.11, then
P(X = 1) = 0.11 and P(X = 0) = 0.89
• These two equations completely describe the probability
distribution of the discrete (dichotomous) random
variable X.

Example 2 Consider the value on the face showing
up from tossing a die.
• The probability distribution of this variable is
Value on Face 1 2 3 4 5 6
Probability 1/6 1/6 1/6 1/6 1/6 1/6
• Notice that the total probability is 1.

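• Example 3
The table shows the probability distribution of the number of
diagnostic services a patient receives:
Number of services (x)   0       1       2       3       4       5
P(X = x)                 0.671   0.229   0.053   0.031   0.010   0.006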

• What is the probability that a patient receives exactly 3
diagnostic services?
P(X=3) = 0.031
• What is the probability that a patient receives at most one
diagnostic service?
P (X≤1) = P(X = 0) + P(X = 1)
= 0.671 + 0.229
= 0.900
• What is the probability that a patient receives at least four
diagnostic services?
P (X≥4) = P(X = 4) + P(X = 5)
= 0.010 + 0.006
= 0.016
Expected Value of a Discrete Random variable
• The average value assumed by a random variable is called
its expected value, or the population mean.
• It is denoted by E(X) or µ and is computed as µ = E(X) = Σ x P(X = x).
Example: expected value for the diagnostic service data:
E(X) = 0(0.671) + 1(0.229) + 2(0.053) + 3(0.031) + 4(0.010)
+ 5(0.006)
= 0.498 ≈ 0.5
• We would expect an average of about 0.5 services per visit.

Variance of a Discrete Random Variable
• The variance of a random variable X is called the
population variance and is denoted by Var(X) or σ²:
σ² = Σ (xi − µ)² P(X = xi)
The variance for the diagnostic service data above is
σ² = Σ (xi − µ)² P(X = xi) = (0 − 0.5)²(0.671) + (1 − 0.5)²(0.229)
+ (2 − 0.5)²(0.053) + (3 − 0.5)²(0.031) + (4 − 0.5)²(0.010)
+ (5 − 0.5)²(0.006) = 0.782
Standard deviation = σ = √0.782 = 0.884
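The expected value, variance and standard deviation of the diagnostic service data can be recomputed from the probabilities used in the calculations. This is a minimal sketch; it uses the unrounded mean (0.498) rather than 0.5, which gives the same results to three decimals.

```python
from math import sqrt

# P(X = x) for the number of diagnostic services, as used in the worked example.
pmf = {0: 0.671, 1: 0.229, 2: 0.053, 3: 0.031, 4: 0.010, 5: 0.006}

mean = sum(x * p for x, p in pmf.items())                    # E(X) = sum of x * P(X = x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # sum of (x - mean)^2 * P(X = x)
sd = sqrt(variance)

print(round(mean, 3), round(variance, 3), round(sd, 3))  # 0.498 0.782 0.884
```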

Factorials
• Given the positive integer n, the product of all the whole
numbers from n down through 1 is called n factorial and is
written n!.
• n! = n × (n−1) × (n−2) × … × 2 × 1 = n × (n−1)!
• By definition, 0! = 1.

Factorials
• Permutation: an ordered arrangement of objects.
• Combination: an arrangement of objects without
regard to order.
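Factorials, permutations and combinations are available in Python's standard `math` module (Python 3.8 or later). The sketch below uses the standard counting formulas n!/(n − r)! for permutations and n!/(r!(n − r)!) for combinations; the values n = 5 and r = 2 are hypothetical.

```python
from math import comb, factorial, perm

n, r = 5, 2  # hypothetical values, for illustration only

print(factorial(n))  # 120: 5 x 4 x 3 x 2 x 1
print(perm(n, r))    # 20: ordered arrangements of 2 objects out of 5, n! / (n - r)!
print(comb(n, r))    # 10: unordered selections of 2 objects out of 5, n! / (r! (n - r)!)

# Each unordered selection of r objects corresponds to r! ordered arrangements.
assert perm(n, r) == comb(n, r) * factorial(r)
```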

Binomial distribution
• It is one of the most widely encountered discrete
distributions.
• The origin of binomial distribution lies in Bernoulli’s trials.
• When a single trial of some experiment can result in only
one of two mutually exclusive outcomes (success or
failure; dead or alive; sick or well; male or female), the trial
is called a Bernoulli trial.
Example 1.
– Let X represent smoking status: X=1 for a smoker and X=0 for a
non-smoker. The two outcomes are mutually exclusive.
– Take the case of the USA: in 1987, 29% of adults in the USA
were smokers, therefore Pr (X=1) = 0.29 and Pr (X=0) =
1-0.29 = 0.71.

• Suppose an event can have only binary outcomes A and B.
• Pr (X=success) = Pr (X=1) = p
• Pr (X=failure) = Pr (X=0) = 1-p
• If an experiment is repeated n times and the outcome is
independent from one trial to another, the probability
P(X=x) that the outcome of interest occurs exactly x times is
$$\Pr(X = x) = \frac{n!}{x!\,(n-x)!}\; p^{x} (1-p)^{n-x}$$
where n (the number of trials) and p (the probability of success on each trial)
are the parameters of the binomial distribution, x is the number of
successes, and n! (read as “n factorial” or “factorial n”) is the
product of all integers from 1 to n inclusive. By definition
1! = 0! = 1.

 Example 2
 Suppose now we randomly select two individuals in USA, see the
smoking status of the two persons,
 What is the probability
– That both are non smokers?
– one is a smoker?
– both are smokers?
 If Pr (X=1) = p and pr (X=0) = 1- p, then the above can be calculated
using the multiplicative rule.
_______________________________________________________________
Outcome of X
Person 1   Person 2   Probability                             No. of smokers
_______________________________________________________________
0          0          (1 - p)(1 - p) = 0.71 × 0.71 = 0.50     0
0          1          (1 - p) p = 0.71 × 0.29 = 0.21          1
1          0          p (1 - p) = 0.29 × 0.71 = 0.21          1
1          1          p p = 0.29 × 0.29 = 0.08                2
_______________________________________________________________
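The same probabilities follow from the binomial formula with n = 2 and p = 0.29. The sketch below is illustrative; `binomial_pmf` is a name chosen here, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): n! / (x! (n - x)!) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


p_smoker = 0.29  # proportion of adult smokers quoted in the example
for x in range(3):
    print(x, round(binomial_pmf(x, 2, p_smoker), 4))
```

The probability of exactly one smoker combines the two middle rows of the table, since the smoker can be either person 1 or person 2.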
Characteristics of a Binomial Distribution
1. The experiment consists of n identical trials. There are
only two possible mutually exclusive outcomes (A and B) on each
trial.
2. The probability of A remains the same from trial to trial.
This probability is denoted by p, and the probability of B
is denoted by q. Note that q=1‐ p.
3. The trials are independent.
4. The binomial random variable X is the number of A’s in n
trials. n and p are the parameters of the binomial
distribution.
5. The mean is np and the variance is np(1‐ p)

 The general form of the Binomial pmf is given by:
$$b(x; n, p) = {}^{n}C_{x}\; p^{x} q^{\,n-x}, \quad \text{where } q = 1 - p,$$
and its cumulative distribution function (cdf) is given by:
$$F(x) = B(x; n, p) = \sum_{i=0}^{x} b(i; n, p) = \sum_{i=0}^{x} {}^{n}C_{i}\; p^{i} q^{\,n-i}$$
It is important to observe that the binomial random variable
X is the sum of n independent Bernoulli random variables Xi,
i.e., X = X1 + X2 + ... + Xn,
where Xi represents the Bernoulli random variable at the ith trial, whose value is
equal to 0 or 1 (0 for failure and 1 for success), so that the range of X is Rx =
{0, 1, 2, ..., n}.
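The cumulative distribution function is a running sum of the pmf, and the mean np and variance np(1 − p) can be checked numerically. This is a sketch with hypothetical parameters n = 10 and p = 0.3; the function names are assumptions.

```python
from math import comb


def binomial_pmf(x, n, p):
    """b(x; n, p) = nCx * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """F(x) = P(X <= x) = sum of b(i; n, p) for i = 0, ..., x."""
    return sum(binomial_pmf(i, n, p) for i in range(x + 1))


n, p = 10, 0.3  # hypothetical parameters, for illustration only

mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(round(binomial_cdf(n, n, p), 6))                 # the pmf sums to 1 over 0..n
print(round(mean, 6), round(n * p, 6))                 # mean equals n * p
print(round(variance, 6), round(n * p * (1 - p), 6))   # variance equals n * p * (1 - p)
```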

 Class work 1
1. Each child born to a particular set of parents has a probability
of 0.25 of having blood type O. If these parents have 5
children, what is the probability that
a. Exactly two of them have blood type O
b. At most 2 have blood type O
c. At least 4 have blood type O
d. 2 do not have blood type O.

Class work 2
2. Suppose you take a sample of N independent biologists
to determine how many of them use valid statistical
methods.
• In particular, you have a sample of N independent,
identically distributed random variables Yi, with p = P(Yi = 1).
• What is the distribution of the number of successes
Y = Y1 + Y2 + ... + YN in N trials? Y ~ Bin(y; N, p)
• Calculate the probability that 0 out of 10 biologists use valid
statistical methods when the probability of using valid statistical
methods is 0.8

The Poisson distribution
• The Poisson distribution is a discrete probability distribution
used to model the number of occurrences of an event that takes place
infrequently in time or space.
• Applicable for counts of events over a given interval of
time, for example:
– number of patients arriving at an emergency
department in a day
– number of new cases of HIV diagnosed at a clinic in a
month
– Daily number of new cases of breast cancer notified
to a cancer registry
– Number of abnormal cells in a fixed area of
histological slides from a series of liver biopsies
• The theoretical situation giving rise to data of this type is
easier to describe in relation to events occurring over
time (or space) at a fixed rate on average, but where each
event occurs independently and at random.
• Such data will have a Poisson distribution
• Suppose events happen randomly and independently in
time at a constant rate. If events happen at a rate of λ
events per unit time, the probability of x events
happening in unit time is:
$$P(X = x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$$
• where x = 0, 1, 2, ... is a potential outcome of X
• t is the length of the time segment of interest; over a segment of
length t the expected number of events is λt
• The constant λ (lambda) represents the rate at which
the event occurs, or the expected number of events
per unit time
• e = 2.71828...
• The distribution depends on just one parameter, λ

Three assumptions of Poisson distribution
1. The probability that a single event occurs within a
given small subinterval is proportional to the
length of the subinterval
2. The rate at which the event occurs is constant over
the entire interval t
3. Events occurring in consecutive subintervals are
independent of each other

Example 1
The daily number of new registrations of cancer is 2.2 on average.
• What is the probability of
a) Getting no new cases
b) Getting 1 case
c) Getting 2 cases
d) Getting 3 cases
e) Getting 4 cases
solution
• a) P(X=0)= 0 .111
• b) P(X=1) = 0.244
• c) P(X=2) = 0.268
• d) P(X=3) = 0.197
• e) P(X=4) = 0.108
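These values follow from the Poisson probability function P(X = x) = e^(−λ) λ^x / x! with λ = 2.2. The sketch below reproduces them; `poisson_pmf` is a name chosen for illustration.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = exp(-lam) * lam**x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)


lam = 2.2  # average daily number of new cancer registrations
for x in range(5):
    print(x, round(poisson_pmf(x, lam), 3))
```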
• Characteristics;
• The Poisson distribution is very asymmetric when its mean
is small
• With large means it becomes nearly symmetric
• It has no theoretical maximum value, but the probabilities
tail off towards zero very quickly
• λ is the parameter of the Poisson distribution
• The mean is λ and the variance is also λ.

Probability distribution of continuous variables
• Under different circumstances, the outcome of a random
variable may not be limited to categories or counts.
Example 1
– Suppose, X represents the continuous variable
‘Height’; rarely is an individual exactly equal to 170cm
tall
– X can assume an infinite number of intermediate
values 170.1, 170.2, 170.3 etc.
• Because a continuous random variable X can take on an
uncountably infinite number of values, the probability
associated with any one particular value is equal to
zero.

Probability distribution of continuous variables
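• However, the probability that X will assume
some value in the interval bounded by two
values, say x1 and x2, is greater than zero and is
given by the area under the probability density curve f(x) between x1 and x2:
$$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx$$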
• As a continuous variable can take an infinite

number of values, it helps to visualize the
probability distribution as a curve and
probabilities as ‘area under the curve’.
• The most important such curve is the normal distribution.

Normal Distribution
• The Normal Distribution is by far the most important
probability distribution in statistics.
• It is also sometimes known as the Gaussian distribution,
after the mathematician Gauss.
• The distributions of many medical measurements in
populations follow a normal distribution (eg. Serum uric
acid levels, cholesterol levels, blood pressure, height and
weight)
• The normal distribution is a theoretical, continuous
probability distribution whose equation is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < +\infty$$

Normal Distribution
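• For a normal variable X, the probability of any given interval
between a and b is:
$$P(a \le X \le b) = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$$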

Characteristics of the Normal Distribution
1. It is a probability distribution of a continuous variable. It
extends from minus infinity( -∞) to plus infinity (+∞).
2. It is unimodal, bell-shaped and symmetrical about x = μ.
3. It is determined by two parameters, referred to as the mean μ
(read as ‘mu’) and the standard deviation σ (read as ‘sigma’).
– Changing μ alone shifts the entire normal curve to the left or
right.
– Changing σ alone changes the degree to which the distribution
is spread out.
– The mean μ can be any number (negative, positive or zero).
– The standard deviation σ must be a positive number.
Characteristics of the Normal Distribution
4. The height of the frequency curve, which is called the
probability density, cannot be taken as the probability of a
particular value.
– This is because for a continuous variable there are infinitely
many possible values so that the probability of any specific
value is zero.
5. An observation from a normal distribution can be related to a
standard normal distribution: (SND) which has a published
table.
– Thus an observation x from a normal distribution with
mean μ and standard deviation σ can be related to a
Standard normal distribution by calculating :
SND = Z = (x - μ ) / σ
6. Perpendiculars erected at set distances from the mean enclose
fixed proportions of the area under the curve:
– μ ± 1 SD contains about 68%;
– μ ± 2 SD contains about 95%;
– μ ± 3 SD contains about 99.7%
7. The distribution is completely determined by
the parameters μ and σ.
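The areas within 1, 2 and 3 standard deviations, and the standardisation Z = (x − μ)/σ, can be checked with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). This is a sketch; the height values at the end are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution


def area_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))  # 0.6827, 0.9545, 0.9973

# Standardising an observation: Z = (x - mu) / sigma
mu, sigma, x = 170.0, 10.0, 185.0  # hypothetical heights in cm
z = (x - mu) / sigma
print(z, round(std_normal.cdf(z), 4))  # 1.5 0.9332
```

The value returned by `cdf(z)` is the area to the left of z, which is the quantity tabulated in many published standard normal tables.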
Normal curve

Normal probability
• Normal curve area for Z value of 1.95 in the table
