
Bi·o·sta·tis·tics

Learning Objectives
◼ To understand statistics
◼ To understand functions/roles/tasks of
statisticians
◼ Types of statistics
◼ Definition of biostatistics

To understand Biostatistics consider
two basic questions

1. What does the term statistics mean?


2. What do statisticians do?
Statistics
◼ The word “Statistics” comes from the Latin
word “Status”, meaning a political state.
◼ Originally used by states to gather
information about the sizes of populations
and armed forces.
◼ The word has now acquired a different
meaning.
Statistics
Numbers measured for some purpose

◼ Statistics is a collection of procedures and


principles for gathering data and
analyzing information in order to help
people make decisions when faced with
uncertainty
One way of defining statistics
is…
The science of quantifying
uncertainty, dealing with
uncertainty, and making decisions in
the face of uncertainty…
Is a science and art of:
◼ Collecting
◼ Summarizing

◼ Analyzing

◼ Interpreting

◼ Presenting

of data.
What do statisticians do?
◼ The statistician is primarily concerned with
developing and applying statistical methods

The statistician’s tasks are:


◼ To guide the design of an experiment or survey
◼ To analyze data
◼ To present and interpret results
Statistician’s tasks
◼ How should we collect data and how
much data is needed?
◼ How can we effectively summarize the

data?
◼ What decisions are possible based on the

observed data?
To call in the statistician after
the experiment is done may be
no more than asking him to
perform a post-mortem
examination: he may be able to
say what the experiment died of.
(Ronald Fisher, 1938)
Statistics Types
1. Descriptive Statistics

2. Inferential Statistics
Descriptive statistics

Deals with the


◼ Enumeration

◼ Organization

◼ Graphical presentation

of data
Descriptive statistics
◼ Are used to describe the basic features of the
data in a study.
◼ They provide simple summaries about the
sample and the measures.
◼ Together with simple graphics analysis, they
form the basis of virtually every quantitative
analysis of data.
Types of stone           Frequency   Percent
Pure calcium oxalate            92      33.1
Calcium oxalate mixed           50      18.0
Uric acid pure                  51      18.3
Uric acid mixed                 63      22.7
Struvite                        22       7.9
Total                          278     100.0
[Bar chart: frequency by type of stone. Pure calcium oxalate 92, calcium oxalate mixed 50, uric acid pure 51, uric acid mixed 63, struvite 22]
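As a minimal sketch of how the Percent column in the stone table can be derived from the Frequency column, the Python below recomputes each percentage from the counts given in the table. The names `stone_counts` and `percent_table` are illustrative assumptions, not part of the slides.

```python
# Frequencies copied from the table of stone types.
stone_counts = {
    "Pure calcium oxalate": 92,
    "Calcium oxalate mixed": 50,
    "Uric acid pure": 51,
    "Uric acid mixed": 63,
    "Struvite": 22,
}


def percent_table(counts):
    """Return {category: percent of the total}, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total = sum(stone_counts.values())
percents = percent_table(stone_counts)

# Print the frequency table: category, frequency, percent.
for name, n in stone_counts.items():
    print(f"{name:<24}{n:>6}{percents[name]:>8.1f}")
print(f"{'Total':<24}{total:>6}{100.0:>8.1f}")
```

The recomputed percentages agree with the table (for example 92 out of 278 is 33.1 percent), which is a quick check that a descriptive summary is internally consistent.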
Inferential statistics
◼ Concerned with reaching conclusions
from incomplete information
◼ We use information obtained from a
sample to say something about the
entire population

Generalization
◼ Used when we have to:
◼ Compare
◼ Test hypotheses
◼ Predict
Descriptive statistics: Collecting, Summarizing, Presenting
Inferential statistics: Collecting, Summarizing, Analyzing, Interpreting, Presenting
Statistics

Descriptive Statistics
◼ Gives numerical and graphic procedures to
summarize a collection of data in a clear and
understandable way

Inferential Statistics
◼ Provides procedures to draw inferences about a
population from a sample
◼ Biostatistics combines biology and statistics;
it is sometimes referred to as
Biometry or Biometrics
◼ The science of statistics applied to the analysis
of biological or medical data.
Numeric data on
◼ Births

◼ Deaths

◼ Diseases

◼ Injuries

◼ and other factors

affecting the general health and condition of


human populations. Also called
DATA
◼ The word data is the Latin plural of datum,
from dare (“to give”), meaning
“something given”
◼ Data are the sets of values of one or more
variables recorded on one or more
individuals.
DATA
Data are typically the results of
measurements and can be the
basis of graphs, images, or
observations of a set of variables.
VARIABLES

A variable is a characteristic of a
person, object or phenomenon
that can take different values
VARIABLES
Any quantity or quality of a subject
which can be measured and which “varies”,
i.e. is likely to have different values from one
subject to another

e.g. age, sex, height, weight


Data and Information
◼ Data are the raw materials of statistics and carry
little meaning when considered alone.
◼ Data need to be transformed into information
by reducing and summarising them.


Data and Information
◼ 110/80, 120/70, 100/60, 120/80, 120/100,
100/70, 110/90, 130/100, 150/100, 110/60 mmHg
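To illustrate turning raw data into information, here is a hedged Python sketch that reduces the ten distinct readings on this slide to a few summary numbers. It assumes each reading is a systolic/diastolic pair in mmHg; the helper name `split_reading` and the choice of summary figures are illustrative, not taken from the slides.

```python
# The ten distinct readings shown on the slide (assumed systolic/diastolic, mmHg).
readings = ["110/80", "120/70", "100/60", "120/80", "120/100",
            "100/70", "110/90", "130/100", "150/ 100", "110/60"]


def split_reading(reading):
    """Split a 'systolic/diastolic' string into two integers."""
    systolic, diastolic = reading.replace(" ", "").split("/")
    return int(systolic), int(diastolic)


pairs = [split_reading(r) for r in readings]
systolic = [s for s, _ in pairs]
diastolic = [d for _, d in pairs]

# Reduce the raw values to a handful of summary numbers.
summary = {
    "n": len(pairs),
    "mean_systolic": sum(systolic) / len(systolic),
    "mean_diastolic": sum(diastolic) / len(diastolic),
    "min_systolic": min(systolic),
    "max_systolic": max(systolic),
}
print(summary)
```

The list of raw strings says little on its own; the mean systolic and diastolic values (117 and 81 mmHg for these ten readings) and the range are the kind of reduced information the slide refers to.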
TYPES OF DATA
◼ Qualitative: Nominal, Ordinal
◼ Quantitative: Discrete, Continuous
1. Qualitative data

Variables that yield observations by
which individuals can be categorized
according to some characteristic or
quality
e.g. Sex, Occupation, Educational
status
Qualitative data is of two types
a. Nominal
Nominal data are the data that one can name;
they are not measured but simply counted
and consist of unordered type of observations
such as:
◼ Sex
◼ Hair color
◼ Treatment group
b. Ordinal
Where there is natural ordering of the
categories such as
◼ Degree of severity of disease
(mild, moderate and severe)
◼ Educational status
(primary, secondary, graduate)
◼ Occupational groups
(skilled labor and unskilled labor)
2. Quantitative data
Observations that can be
measured are considered to be
quantitative data e.g:
◼ Weight

◼ Height

◼ Serum cholesterol level


Quantitative data is of two types
a. Discrete
Taking only integral (whole number)
values
◼ Number of children in a family
◼ Number of teeth
◼ Number of deaths
b. Continuous

Data recorded on a continuous scale such as
◼ Height
◼ Weight
◼ Serum sodium level
◼ Hemoglobin level
There are different statistical
methods for different types of
data
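One way to make the classification concrete is to tag each variable with its data type and look up which measures of central tendency are usually considered suitable. The sketch below is an assumption-laden illustration: the example variables come from the slides, but the mapping in `CENTRAL_TENDENCY` follows a common textbook convention rather than anything stated here (beyond the mode being usable with nominal data), and all names are hypothetical.

```python
from enum import Enum


class DataType(Enum):
    NOMINAL = "nominal"        # qualitative, unordered categories
    ORDINAL = "ordinal"        # qualitative, naturally ordered categories
    DISCRETE = "discrete"      # quantitative, whole-number values
    CONTINUOUS = "continuous"  # quantitative, measured on a continuous scale


# Example variables taken from the slides.
VARIABLE_TYPES = {
    "sex": DataType.NOMINAL,
    "hair color": DataType.NOMINAL,
    "severity of disease": DataType.ORDINAL,
    "number of children in a family": DataType.DISCRETE,
    "height": DataType.CONTINUOUS,
    "serum cholesterol level": DataType.CONTINUOUS,
}

# Assumed convention (not from the slides): which summaries suit which type.
CENTRAL_TENDENCY = {
    DataType.NOMINAL: ["mode"],
    DataType.ORDINAL: ["mode", "median"],
    DataType.DISCRETE: ["mode", "median", "mean"],
    DataType.CONTINUOUS: ["mode", "median", "mean"],
}


def is_qualitative(data_type):
    """Nominal and ordinal data are qualitative; the rest are quantitative."""
    return data_type in (DataType.NOMINAL, DataType.ORDINAL)


def suggested_measures(variable):
    """Look up the variable's type and return the suggested measures."""
    return CENTRAL_TENDENCY[VARIABLE_TYPES[variable]]


for name, data_type in VARIABLE_TYPES.items():
    kind = "qualitative" if is_qualitative(data_type) else "quantitative"
    print(f"{name}: {kind}, {data_type.value}, {suggested_measures(name)}")
```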
Primary and secondary data
Primary data
PRIMARY data is data that you collect
yourself using such methods as:
◼ Direct observation

◼ Surveys

◼ Interviews
Secondary data
SECONDARY data is collected from external
sources such as:
◼ TV, radio, internet

◼ magazines, newspapers

◼ Reviews

◼ Research articles

◼ Stories told by people you know


Secondary data
◼ Is cheaper and easier to acquire than
primary data.
◼ The problem is that often the reliability,
accuracy and integrity of the data is
uncertain.
◼ Who collected it?
◼ Can they be trusted?
◼ Did they do any preprocessing of the data?
◼ Is it biased?
◼ How old is it?
◼ Where was it collected?
◼ Can the data be verified?
Primary data
◼ Primary data can be relied on because
you know where it came from and what
was done to it.
◼ It's like cooking something yourself. You
know what went into it.
“Statistical thinking will one
day be as necessary for
efficient citizenship as the
ability to read and write”
Samuel S. Wilks
Thank you
◼ Data are obtained to answer a particular
research question.
◼ Data collection methods fall into five
categories:
1. Questionnaires
2. Interviewing
3. Observations
4. Surveys
5. Experimentation
QUESTIONNAIRE DESIGN

Two Types of Questions:

1. Open format or open-ended

2. Closed format or closed-ended


Open format or Open-ended
questions
◼ Open-ended questions give your audience an opportunity to
express their opinions in a free-flowing manner.
◼ These questions don't have a predetermined set of responses
and the respondent is free to answer whatever he/she feels is
right.
◼ By including open format questions in your questionnaire,
you can get true, insightful and even unexpected
suggestions.
◼ Qualitative questions fall under this category.

Example: What is the route of transmission of HBV?


Closed format or Closed-ended
questions
◼ Multiple choice questions, where
respondents are restricted to choose among
any of the given answers
◼ There is no fixed limit as to how many
multiple choices should be given; the
number can be even or odd
How would you rate the teaching of CM Deptt
Cost-effective health services are
Summarizing and Presenting Data

Cat on Mat
Summarizing and presenting
data
There are three general ways of
organizing and presenting data
◼ Tables

◼ Graphs and charts

◼ Measures of central tendency.


FREQUENCY DISTRIBUTIONS
◼ Frequency distribution is a
description of data presented in
tabular form so the data will be more
manageable.
◼ It gives the frequency with which
particular values appear in data.
Frequency table for systolic blood pressure
recorded on 63 individuals

Class Interval   Frequency
90-109                  10
110-129                 24
130-149                 18
150-169                  9
170-189                  2
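Starting from the class intervals and frequencies in this table, a short Python sketch can add the relative frequency (percent) and cumulative frequency columns that often accompany a frequency distribution. These extra columns and the variable names are assumptions for illustration; the slide itself shows only the frequencies.

```python
# (lower limit, upper limit, frequency) for each class interval in the table.
class_intervals = [
    (90, 109, 10),
    (110, 129, 24),
    (130, 149, 18),
    (150, 169, 9),
    (170, 189, 2),
]

total = sum(freq for _, _, freq in class_intervals)

rows = []
cumulative = 0
for low, high, freq in class_intervals:
    cumulative += freq                       # running total of frequencies
    percent = round(100 * freq / total, 1)   # relative frequency in percent
    rows.append((f"{low}-{high}", freq, percent, cumulative))

print(f"{'Class':<10}{'Freq':>6}{'Percent':>9}{'Cumulative':>12}")
for label, freq, percent, cum in rows:
    print(f"{label:<10}{freq:>6}{percent:>9.1f}{cum:>12}")
print(f"{'Total':<10}{total:>6}")
```

The frequencies sum to 63, matching the number of individuals stated in the table title, and the last cumulative frequency equals that total.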
MEASURES OF CENTRAL TENDENCY

◼ For many biological characteristics, the
observed values are not all equal, but they show
a general tendency to cluster around a particular level
◼ In this situation it may be preferable to characterize
each group of observations by that level, which is
called the Central Tendency of the group
MEASURES OF CENTRAL TENDENCY

❑ The tendency of a set of data to center around
certain numerical values
❑ To find a value about which the observations tend
to cluster
❑ It is a value in the data set around which
the other values are distributed.
❑ Allows us to summarize an entire data set with a
single value (the midpoint).
Three commonly used measures of
central tendency are:

❑ MEAN
❑ MEDIAN
❑ MODE
MEAN
❑ Widely used in statistical calculations.
❑ Basic parameter of central tendency.

❑ To obtain the mean the individual observations are


first added together and then divided by the total
number of observations
Observations: 130, 110, 120, 100, 140
∑X = 130 + 110 + 120 + 100 + 140 = 600
Mean: X̄ = ∑X / n = 600 / 5 = 120
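The calculation above can be sketched in a few lines of Python. The function name `arithmetic_mean` is an illustrative choice; the five observations are the ones from the worked example.

```python
def arithmetic_mean(values):
    """Add the observations together and divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


observations = [130, 110, 120, 100, 140]
total = sum(observations)                  # sum of X
n = len(observations)                      # number of observations
mean_value = arithmetic_mean(observations)
print(total, n, mean_value)
```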
Mean is affected by the extreme
values
Set A: 24, 25, 29, 29, 30, 31

Mean= 28
Set B: 24, 25, 29, 29, 30, 131

Mean= 44.7
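A small Python sketch, using the standard `statistics` module, shows how a single extreme value moves the mean. The median (defined on the next slide) is included only as a point of comparison; that comparison is an addition for illustration, and the helper name `summarize` is hypothetical.

```python
from statistics import mean, median

set_a = [24, 25, 29, 29, 30, 31]
set_b = [24, 25, 29, 29, 30, 131]  # same as Set A except for one extreme value


def summarize(values):
    """Return the mean (rounded to one decimal place) and the median."""
    return {"mean": round(mean(values), 1), "median": median(values)}


summary_a = summarize(set_a)
summary_b = summarize(set_b)
print("Set A:", summary_a)
print("Set B:", summary_b)
```

Replacing 31 with 131 raises the mean from 28 to about 44.7, while the median stays at 29 in both sets.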
MEDIAN
❑ The median is the observation that divides the
distribution into two equal parts.
OR
❑ It is the observation that occupies the middle
position when all the observations are arranged in
order of their magnitude.
❑ It is necessary to arrange the data in ascending or
descending order
Un-arranged data: 83, 75, 81, 79, 71, 95, 75, 77, 84
Arranged data:    71, 75, 75, 77, 79, 81, 83, 84, 95
Calculate the median
If the data has an even number of observations, add
the 2 middle numbers and divide by 2.

71, 75, 75, 77, 79, 81, 83, 84, 95, 100

79 + 81 = 160
160 / 2 = 80
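The two cases (odd and even number of observations) can be sketched as one Python function. The name `median_value` is an illustrative choice; the data are the nine-value and ten-value examples from these slides.

```python
def median_value(values):
    """Sort the data, then take the middle value (odd count)
    or the average of the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Nine observations (odd count): the middle one is the median.
nine_values = [83, 75, 81, 79, 71, 95, 75, 77, 84]
# Ten observations (even count): average the two middle ones.
ten_values = [71, 75, 75, 77, 79, 81, 83, 84, 95, 100]

odd_median = median_value(nine_values)
even_median = median_value(ten_values)
print(odd_median, even_median)
```

For the nine values the median is 79; for the ten values it is (79 + 81) / 2 = 80, as in the worked example.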
MODE
❑ The mode is the most frequently occurring
observation or the commonly occurring
value in a distribution.
❑ It is not necessary to keep the data in
ascending or descending order.
❑ It is the only measure of central tendency that
can be used with Nominal data
Example
20, 10, 20, 22, 21, 20, 30, 35,
40, 45, 25

Mode is 20
1, 2, 3, 4, 5, 6, 7, 8, 9
No mode
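A minimal Python sketch of finding the mode, following the convention used in the examples above that a data set in which every value occurs only once has no mode. The function name `modes` and the choice to return a list (so that ties can be reported) are assumptions for illustration.

```python
from collections import Counter


def modes(values):
    """Return the most frequently occurring value(s).
    An empty list means there is no mode (no value repeats)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return [value for value, count in counts.items() if count == highest]


first_example = [20, 10, 20, 22, 21, 20, 30, 35, 40, 45, 25]
second_example = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(modes(first_example))   # 20 occurs three times
print(modes(second_example))  # every value occurs once, so no mode
```

Because the function only counts occurrences, the data do not need to be sorted first, and it works equally well on category labels such as nominal data.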
Experimental question:
What? How?

Sample selection: Who? How many?

Collect Data

Analysis: Is there an effect?

Conclusion: To whom?
Biostatistics
