




Introduction to Biostatistics

Learning Objectives

After completing this lesson, the student will be able to:

• Define statistics and biostatistics

• Enumerate the importance and limitations of statistics

• Define and identify the different types of data

• Classify variables

• Differentiate between descriptive and inferential statistics

Introduction to Statistics

The field of statistics encompasses the study and application of theory and
methods for data analysis arising from random processes or phenomena.

Simply put, it is the study or application of techniques and methods for:

1. Collecting (gathering), organizing (classifying), summarizing, analyzing and interpreting data. (Descriptive Statistics)

2. Drawing inferences and conclusions about a body of data (population) when only a part of the data (sample) is observed. (Inferential Statistics)

The term statistics can refer either to statistical data or to statistical methods.

• Statistical data: This refers to numerical descriptions of things. These descriptions may take the form of counts or measurements.

• E.g. statistics of malaria cases in Yamfo within a particular period, broken down by age and sex.

• Note: Although statistical data are always expressed in figures (numerical descriptions), not all numerical descriptions are statistical data.

Characteristics of statistical data
For numerical descriptions to be called statistics, they must meet the following characteristics:

• Be in aggregates – This means that statistics are a 'number of facts.' A single fact, even though numerically stated, cannot be called statistics.

• Be affected to a large extent by a multiplicity of causes or borne out of a 'variety of


• E.g. a malaria outbreak is attributable to a number of factors, such as human, parasite, mosquito and environmental factors.

• All these factors acting jointly determine the severity of the outbreak, and it is very difficult to assess the individual contribution of any one of them in isolation.

• They must be enumerated or estimated according to a reasonable standard of accuracy. If aggregates of numerical facts are to be called 'statistics', they must be reasonably accurate. This is necessary because statistical data serve as the basis for statistical investigations; if the basis is incorrect, the results are bound to be misleading.

• They must have been collected in a systematic manner for a predetermined purpose. Numerical data can be called statistics only if they have been compiled in a properly planned manner and for a purpose about which the enumerator had a definite idea. Facts collected in an unsystematic manner, without complete awareness of the objective, will be confusing and cannot be made the basis of valid conclusions.

• They must be placed in relation to each other; that is, they must be comparable. Numerical facts may be compared with respect to time, space or condition.

• Statistical methods: When the term 'statistics' is used to mean 'statistical methods', it refers to a body of methods used for collecting, organising, analyzing and interpreting numerical data in order to understand a phenomenon or make wise decisions. In this sense it is a branch of scientific method and helps us to know the object under study better.

• The branch of modern statistics that is most relevant to public health and clinical medicine is statistical inference. This branch of statistics deals with techniques for drawing conclusions about the population.

• Inferential statistics builds upon descriptive statistics. The inferences are drawn from particular properties of the sample to particular properties of the population.

• These are the types of statistics most commonly found in research.
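
As a minimal sketch of the distinction, the Python example below first summarizes a sample (descriptive statistics) and then uses it to estimate a range for the population mean (inferential statistics). The haemoglobin readings are invented for illustration, and the multiplier 1.96 is a normal-approximation assumption made to keep the example short; for a sample this small, a t-based interval would be somewhat wider.

```python
import math
import statistics

# Hypothetical haemoglobin readings (g/dL) from a sample of 10 patients.
sample = [12.1, 13.4, 11.8, 14.0, 12.9, 13.1, 12.5, 13.8, 12.2, 13.2]

# Descriptive statistics: summarize the observed sample itself.
n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: use the sample to say something about the population.
standard_error = sd / math.sqrt(n)
z = 1.96  # normal-approximation multiplier for roughly 95% confidence (assumption)
ci_low = mean - z * standard_error
ci_high = mean + z * standard_error

print(f"n = {n}, mean = {mean:.2f}, sd = {sd:.2f}")
print(f"Approximate 95% CI for the population mean: {ci_low:.2f} to {ci_high:.2f}")
```

The first print line describes only the ten patients observed; the second makes a hedged statement about the wider population from which they were drawn.
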

Significance of studying Statistics

• Statistics provides a way of organizing information on a wider and more formal basis than relying on the exchange of anecdotes and personal experience.

• Statistics is essential for researchers. That is, basic statistics is useful in conducting clinical studies and field surveys.

• Knowledge of statistics also aids researchers in effectively presenting their findings in reports, journals and at professional meetings.

• There is a great deal of intrinsic (inherent) variation in most biological processes.

• Public health and medicine are becoming increasingly quantitative. As technology progresses, the physician encounters more and more quantitative rather than descriptive information.

• In one sense, statistics is the language of assembling and handling quantitative material. Even if one’s concern is only with the results of other people’s manipulation and assemblage of data, it is important to achieve some understanding of this language in order to interpret their results properly.

• The planning, conduct and interpretation of medical research are becoming increasingly reliant on statistical technology.

• The use of statistics helps to minimize bias on the part of researchers by helping to decide whether claims are valid or invalid.

• A person with appropriate knowledge of statistics is better able to decide whether a colleague is using statistics to highlight or merely to support personal biases.

• The application of statistical methods allows uncertainties or variations in medicine to be understood better.

Limitations of statistics:
• It deals only with those subjects of inquiry that can be quantitatively measured and numerically expressed.

• It deals with aggregates of facts and attaches no importance to individual items, so it is suited only to cases where group characteristics are to be studied.

• Statistical data are only approximately, not mathematically, correct.

Forms of statistics: Roughly speaking, the field of statistics can be divided into:

• Mathematical Statistics: the study and development of statistical theory and methods in the abstract.

• Applied Statistics: the application of statistical methods to solve real problems involving randomly generated data, and the development of new statistical methodology motivated by real problems.

When the different statistical methods are applied to biological, medical and public health data, they constitute the discipline of Biostatistics.

• Biostatistics can therefore be defined as the application of the mathematical tools used in statistics to the fields of biological sciences and medicine.

• Biostatistics is a growing field with applications in many areas of biology, including epidemiology, medical sciences, health sciences, educational research and environmental sciences.

• Biostatistics is also referred to as BIOMETRY (Francis Galton, 1822-1911, Father of Biostatistics).

Concerns of Biostatistics

• Biostatistics is concerned with the collection, organization, summarization and analysis of data (descriptive statistics).

• We seek to draw inferences about a body of data when only a part of the data is observed (inferential statistics).

• Biostatistics is concerned with the interpretation of data and the communication of information about the data.

Some questions are normally asked when there is a need for statistical analysis:

• Is this new drug or procedure better than the one commonly in use?

• How much better? What, if any, are the risks of side effects associated with its use?

• In testing a new drug, how many patients must be treated, and in what manner, in order to demonstrate its worth?

• What is the normal variation in some clinical measurement?

• How reliable and valid is the measurement?

• What is the magnitude and effect of laboratory and technical error?

• How does one interpret abnormal values?

• Data are numbers that can be measurements or can be obtained by counting. Data are the raw material of statistics.

• A variable is an object, characteristic or property that can take different values.

• Before information is gathered, we normally observe various phenomena or occurrences that draw our attention. Examples include characteristics such as age, sex, weight, occupation and marital status. These characteristics are referred to as VARIABLES. The values of the observations recorded for the variables are what we refer to as DATA.

There are two types of data:

• Primary data: data gathered by a particular person or organization from primary sources for his or her own use.

• Secondary data: data gathered by one person or group but used by a different person or group. This means that the same data can be primary for one person and secondary for another.


• Quantitative data/variables (numbers: weights, ages, …).

• Qualitative data/variables (words or attributes: nationalities, occupations, …).

• Variables that yield observations by which individuals can be grouped according to some feature or quality are referred to as QUALITATIVE variables. In other words, a qualitative variable cannot be measured, but it can be sorted into categories.

• E.g. sex, marital status, occupation, education level etc.

A. Nominal Qualitative Variables:

• A nominal variable classifies the observations into mutually exclusive categories that have no ranking. The values of a nominal variable are names or attributes that cannot be ordered or ranked.

• Examples:
  - Blood type (O, AB, A, B)
  - Nationality (Saudi, Egyptian, British, …)
  - Sex (male, female)

B. Ordinal Qualitative Variables:

• An ordinal variable classifies the observations into mutually exclusive categories that are ranked. The values of an ordinal variable are categories that can be ordered, sorted, or ranked by some criterion.

• Examples:
  - Educational level (elementary, intermediate, …)
  - Student's grade (A, B, C, D, F)
  - Military rank
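
The practical difference between nominal and ordinal variables can be sketched in Python as follows. The observations are invented, and the names `grade_order` and `grade_rank` are hypothetical helpers introduced only for this illustration. Nominal values can be counted but not sorted meaningfully, whereas ordinal values can also be ranked once their order is stated.

```python
from collections import Counter

# Nominal variable: categories are labels only, so counting is the natural summary.
blood_types = ["O", "A", "B", "O", "AB", "A", "O"]  # hypothetical observations
blood_type_counts = Counter(blood_types)
most_common_type = blood_type_counts.most_common(1)[0][0]

# Ordinal variable: categories have a rank, which we must state explicitly.
grade_order = ["F", "D", "C", "B", "A"]  # lowest to highest
grade_rank = {grade: rank for rank, grade in enumerate(grade_order)}

grades = ["B", "A", "C", "B", "F", "A", "D"]  # hypothetical observations
sorted_grades = sorted(grades, key=grade_rank.get)
median_grade = sorted_grades[len(sorted_grades) // 2]  # middle value of 7 items

print("Blood type counts:", dict(blood_type_counts))
print("Most common blood type:", most_common_type)
print("Grades from lowest to highest:", sorted_grades)
print("Median grade:", median_grade)
```

A median is meaningful for the grades because they are ranked; for blood types only the counts and the most frequent category make sense.
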
• Variables that yield observations that can be measured are considered QUANTITATIVE variables. A quantitative variable can be measured in some way.

• E.g. height, weight, blood pressure, body temperature etc. (how much or how many).


• Quantitative variables may further be classified as discrete or continuous.

• Discrete variables are recorded in whole numbers, e.g. number of children in the family, pulse rate, etc.
• There are jumps or gaps between the values.
• Examples:
  - Family size (x = 1, 2, 3, …)
  - Number of patients (x = 0, 1, 2, 3, …)
• Continuous variables can take fractional values, e.g. height, weight, temperature etc.

• Continuous Variables:

• There are no gaps between the values.

• A continuous variable can have any value within a certain interval of values.

• Examples:
  - Height (140 < x < 190)
  - Blood sugar level (10 < x < 15)
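
The contrast between discrete and continuous variables can be illustrated with a short Python sketch. The records are invented for illustration; the height interval 140 < x < 190 is the one used in the example above.

```python
# Hypothetical records for five patients.
number_of_children = [0, 2, 1, 3, 2]              # discrete: whole-number counts
height_cm = [152.4, 167.0, 171.3, 158.85, 180.2]  # continuous: any value in a range

# Discrete values fall on separate points, so every value is a whole number.
all_whole = all(isinstance(x, int) for x in number_of_children)

# Continuous values can be checked against an interval such as 140 < x < 190.
all_in_range = all(140 < x < 190 for x in height_cm)

# Between any two distinct heights there is always another possible value.
midpoint = (height_cm[0] + height_cm[1]) / 2

print("All child counts are whole numbers:", all_whole)
print("All heights lie in (140, 190):", all_in_range)
print("A possible height between the first two readings:", midpoint)
```

No family can have 1.5 children, which is the "gap" in a discrete variable, whereas a height halfway between two readings is still a valid height.
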

Data are obtained from
• Analysis of records (Routinely kept records)
• Surveys
• Counting
• Experiments
• External sources (reports, data banks, …)
Examples of health care data
• Deaths
• Births
• In-patient hospital activity - cause, age, sex, address, treatment, type of
• A&E – attendances, treatment, age, sex
• Ambulance – cause, seriousness, location
• Prevention - vaccination rates, quit smoking
• Morbidity – asthma & diabetes prevalence, communicable diseases
• Child Health Surveillance – breastfeeding, behaviour, immunisations,
• Prescribing – by GP practice
• Workforce – numbers, qualifications, skills
• Patient and Staff Surveys
• Population – GP Practice
• A population is the collection or set of all of the values that a variable may have.

• It is the totality of the individual observations about which inferences are to be made.

• A population may refer to the values of a variable for a concrete collection of objects (or creatures), e.g. the heights of all students in a class.

• Population Size (N):

• The number of elements in the population is called the population size and is
denoted by N.

• A sample is a part of a population.

• Samples may be representative (if drawn well).

• Samples may differ in size, in quality and in the technique used to draw them.

• A representative sample will have characteristics almost equal to the parameters of the entire population.

• Sample Size (n):

• The number of elements in the sample is called the sample size and is denoted by n.

Sampling and Statistical Inference
• There are several types of sampling techniques, some of which are:

• Simple random sampling: If a sample of size n is selected from a population of size N in such a way that each element in the population has the same chance of being selected, the sample is called a simple random sample.

• Stratified random sampling: In this type of sampling, the elements of the population are classified into several homogeneous groups (strata). From each group, an independent simple random sample is drawn. The sample resulting from combining these samples is called a stratified random sample.

• Statistical inference is the procedure used to reach a conclusion about a population based on the information derived from a sample that has been drawn from that population.
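
The two sampling techniques and their link to inference can be sketched in Python using only the standard library. Everything in the example is an assumption made for illustration: the population of 200 blood pressure readings is simulated, sex is used as the stratifying variable, and the strata are sampled with equal allocation purely for simplicity.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical population of N = 200 people: (sex, systolic blood pressure in mmHg).
population = [
    ("female" if i % 2 == 0 else "male", round(random.gauss(120, 15), 1))
    for i in range(200)
]
N = len(population)

# Simple random sampling: every element has the same chance of selection.
n = 20
simple_sample = random.sample(population, n)

# Stratified random sampling: split into strata, then draw an independent
# simple random sample from each stratum and combine the results.
strata = {}
for sex, pressure in population:
    strata.setdefault(sex, []).append((sex, pressure))

per_stratum = n // len(strata)  # equal allocation, chosen only for simplicity
stratified_sample = []
for members in strata.values():
    stratified_sample.extend(random.sample(members, per_stratum))

# Statistical inference: a sample mean serves as an estimate of the population mean.
population_mean = statistics.mean(p for _, p in population)
simple_mean = statistics.mean(p for _, p in simple_sample)
stratified_mean = statistics.mean(p for _, p in stratified_sample)

print(f"Population mean (N = {N}): {population_mean:.1f}")
print(f"Simple random sample mean (n = {n}): {simple_mean:.1f}")
print(f"Stratified sample mean (n = {len(stratified_sample)}): {stratified_mean:.1f}")
```

In practice the population mean is unknown; it is computed here only so that the two sample estimates can be compared against it.
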
