BIOSTATISTICS
BY
HARI RAJAN.G
Definition of Statistics
• Different authors have defined statistics differently. The best definition of
statistics is given by Croxton and Cowden, according to whom statistics
may be defined as the science that deals with the
collection, presentation, analysis and
interpretation of numerical data.
• The science and art of dealing with variation in data through collection,
classification, and analysis in such a way as to obtain reliable results.
(John M. Last, A Dictionary of Epidemiology)
• The branch of mathematics that deals with the collection, organization, and
analysis of numerical data and with such problems as experiment
design and decision making. (Microsoft Encarta Premium 2009)
Definition of Biostatistics (Medical Statistics)
• Biostatistics may be defined as the application of
statistical methods to medical, biological and
public health related problems.
• It is the scientific treatment given to medical data
derived from groups of individuals or patients:
Collection of data.
Presentation of the collected data.
Analysis and interpretation of the results.
Making decisions on the basis of such analysis.
Role of Statistics in Physiotherapy
The central concept in statistics is variability.
No two individuals are the same. For example, the blood pressure of a
person may vary from time to time, as well as from person to person.
There can also be instrument variability as well as
observer variability.
Methods of statistical inference provide largely objective means for
drawing conclusions from the data about the issue under study.
Medical science is full of uncertainties, and statistics deals with
uncertainty. Statistical methods try to quantify the uncertainties
present in medical science.
Statistics helps the researcher to arrive at a scientific judgment about
a hypothesis. It has been argued that decision making is an
integral part of a physiotherapist’s work.
Frequently, decision making is probability based.
Role of Statistics in
Public Health and Community Medicine
Why do we need to study Medical Statistics?
Three reasons:
(1) Basic requirement of medical research.
Role of statisticians
To guide the design of an experiment or survey prior to
data collection
I. Basic concepts
• Homogeneity: All individuals have similar values or
belong to the same category.
Example: all individuals are Chinese, women, middle-aged (30~40
years old) and work in a computer factory ---- homogeneity in nationality,
gender, age and occupation.
• Variation: the differences between individuals in features, voice, …
• Toss a coin: the marked face may land up or down ---- variation!
• Treat patients suffering from pneumonia with the same antibiotic: some
of them recover and others do not ---- variation!
• If there is no variation, there is no need for statistics.
• Many examples of variation in the medical field: height, weight, pulse,
blood pressure, …
2. Population and Sample
Finite (limited) population and infinite (limitless) population
Estimation of Probability ---- Frequency
5. Sampling Error
Error: the difference between the observed value and
the true value.
Sampling error
• The statistics of different samples from the same
population differ from each other!
• The statistics differ from the parameter!
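The two points above can be illustrated with a small simulation. This is only a sketch: the population of 10,000 blood pressure values, its mean and spread, the sample size of 30 and the number of samples are all assumptions made for illustration, not data from the lecture.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical finite population: 10,000 systolic blood pressure values (mmHg).
population = [random.gauss(120, 15) for _ in range(10_000)]
parameter = statistics.mean(population)  # population mean (the parameter)

# Draw five random samples of 30 individuals and compute each sample mean (the statistic).
sample_means = [
    statistics.mean(random.sample(population, 30))
    for _ in range(5)
]

# Sampling error of each sample: statistic minus parameter.
sampling_errors = [m - parameter for m in sample_means]

print(f"parameter (population mean): {parameter:.2f}")
for m, e in zip(sample_means, sampling_errors):
    print(f"sample mean = {m:.2f}, sampling error = {e:+.2f}")
```

Each run of the loop gives a different sample mean, and none of them coincides exactly with the population mean; that gap is the sampling error.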
II. Types of data
1. Numerical Data (Quantitative Data)
2. Categorical Data (Enumeration Data)
-- Enumeration Data
Special case of categorical data:
Ordinal Data (rank data)
• There exists an order among all possible categories (level of
measurement).
-- Ordinal Data
• The data of an ordinal variable, which represent the order of
individuals only.
-- Rank data
Examples
Which type of data do they belong to?
• RBC (4.58 × 10^6/mcL)
• Diastolic/systolic blood pressure
(8/12 kPa) or (80/100 mmHg)
• Percentage of individuals with blood type A (20%)
(A, B, AB, O)
• Protein in urine (++) (-, ±, +, ++, +++)
• Incidence rate of breast cancer (35/100,000)
III. The Basic Steps of Statistical Work
1. Design of study
• Professional design:
Research aim
Subjects,
Measures, etc.
• Statistical design:
2. Collection of data
• Source of data
Government reporting system, such as: cholera,
plague (black death) …
Registration system, such as: birth/death
certificates …
Routine records, such as: patient case reports …
Ad hoc survey, such as: influenza A (H1N1) …
• Data collection: accurate, complete,
timely
3. Data Sorting
• Checking
By hand or with computer software
• Amending
• Missing data?
• Grouping
According to categorical variables (sex, occupation, disease…)
According to numerical variables (age, income, blood pressure …)
4. Data Analysis
• Descriptive statistics (show the sample)
mean, incidence rate …
-- Table and plot
• Inferential statistics (towards the population)
-- Estimation
-- Hypothesis testing (comparison)
About Teaching and Learning
• Aim:
Training in statistical thinking
Skill in dealing with medical data.
• Emphasis:
Essential concepts and statistical thinking
-- lectures and practice sessions
Skill with computers and statistical software
-- practice sessions (Excel and SPSS)
Sources of data
Comprehensive
Sample
Types of data
Constant
Variables
Types of variables
Quantitative: continuous, discrete
Qualitative: nominal, ordinal
Methods of presentation of data
Numerical presentation
Graphical presentation
Mathematical presentation
1- Numerical presentation
Tabular presentation (simple – complex)
Simple frequency distribution Table (S.F.D.T.)
Title
Name of variable       Frequency   %
(Units of variable)
-
- Categories
-
Total
Table (I): Distribution of 50 patients at the surgical
department of AAAAA hospital in May 2008 according
to their ABO blood groups
Blood group   Frequency   %
A             12          24
B             18          36
AB            5           10
O             15          30
Total         50          100
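As a minimal sketch of how a simple frequency distribution table such as Table (I) is built, the code below tallies a list of blood groups and converts each frequency to a percentage of the total. The raw list is rebuilt from the counts in the table (the order of the observations is an assumption and does not matter), and the variable names are illustrative only.

```python
from collections import Counter

# Raw observations rebuilt from the counts in Table (I); the ordering is arbitrary.
blood_groups = ["A"] * 12 + ["B"] * 18 + ["AB"] * 5 + ["O"] * 15

counts = Counter(blood_groups)  # frequency of each category
total = sum(counts.values())

# Percentage of each category = frequency / total * 100
table = {group: (n, 100 * n / total) for group, n in counts.items()}

for group, (n, pct) in table.items():
    print(f"{group:<6}{n:>4}{pct:>8.0f}")
print(f"{'Total':<6}{total:>4}{100:>8}")
```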
Table (II): Distribution of 50 patients at the surgical
department of AAAAA hospital in May 2008 according to
their age
Age (years)   Frequency   %
20-<30        12          24
30-           18          36
40-           5           10
50+           15          30
Total         50          100
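For a numerical variable such as age, the observations are first grouped into class intervals and then counted. The sketch below uses the class labels of Table (II); the ten ages are hypothetical values chosen for illustration (they are not the patients in the table), and the function name `age_class` is likewise an assumption.

```python
from collections import Counter

def age_class(age):
    """Return the class interval label for an age in years (lower bound included)."""
    if age < 20:
        raise ValueError("ages below 20 fall outside the classes used here")
    if age < 30:
        return "20-<30"
    if age < 40:
        return "30-"
    if age < 50:
        return "40-"
    return "50+"

# Hypothetical ages, for illustration only.
ages = [23, 35, 41, 58, 29, 30, 39, 62, 50, 27]

frequency = Counter(age_class(a) for a in ages)
for label in ["20-<30", "30-", "40-", "50+"]:
    n = frequency[label]
    print(f"{label:<8}{n:>3}{100 * n / len(ages):>7.0f}")
```

Each lower limit belongs to its own class, so an age of exactly 30 is counted in "30-" and not in "20-<30".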
Complex frequency distribution Table
Table (III): Distribution of 20 lung cancer patients at the chest department of
AAAAA hospital and 40 controls in May 2008 according to smoking
              Lung cancer
Smoking       Cases          Control        Total
              No.    %       No.    %       No.    %
Smoker        15     75      8      20      23     38.33
Non smoker    5      25      32     80      37     61.67

              Lung cancer
Smoking       positive       negative       Total
              No.    %       No.    %       No.    %
Smoker        15     65.2    8      34.8    23     100
Non smoker    5      13.5    32     86.5    37     100
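The two layouts above use the same four counts but different denominators: the first divides by the column totals (all cases, all controls), the second by the row totals (all smokers, all non-smokers). A minimal sketch of both calculations follows; the dictionary layout and the helper name `pct` are assumptions made for illustration.

```python
# Counts from the smoking and lung cancer tables: rows are exposure, columns are outcome.
counts = {
    "Smoker": {"Cases": 15, "Control": 8},
    "Non smoker": {"Cases": 5, "Control": 32},
}

def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

# Column totals (all cases, all controls) and row totals (all smokers, all non-smokers).
col_totals = {c: sum(row[c] for row in counts.values()) for c in ("Cases", "Control")}
row_totals = {r: sum(row.values()) for r, row in counts.items()}

# Column percentages: share of smokers among cases and among controls.
col_pct = {r: {c: pct(n, col_totals[c]) for c, n in row.items()} for r, row in counts.items()}

# Row percentages: share of cases among smokers and among non-smokers.
row_pct = {r: {c: pct(n, row_totals[r]) for c, n in row.items()} for r, row in counts.items()}

print(col_pct)
print(row_pct)
```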
[Figure: MMR/1000 (vertical axis, 0 to 60) plotted against year (1960 to 2000)]
Year   MMR
1960   50
1970   45
1980   26
1990   15
2000   12
Frequency polygon
Age      Sex               Mid-point (M-P)
         M        F
20-      (12%)    (10%)    25
30-      (36%)    (30%)    35
40-      (8%)     (25%)    45
50-      (16%)    (15%)    55
60-70    (8%)     (20%)    65
[Figure: frequency polygon of % (vertical axis, 0 to 40) against age mid-point (25 to 65) for males and females]
[Figure: frequency (vertical axis) against age in years (20-, 30-, 40-, 50-, 60-69) for males and females]
Histogram
[Figure: histogram of % (vertical axis, 0 to 35) against age in years (horizontal axis)]
Figure (2): Distribution of 100 cholera patients at (place), in (time)
by age
Bar chart
[Figure: bar chart of % (vertical axis, 0 to 45) by marital status: Single, Married, Divorced, Widowed]
Bar chart
[Figure: bar chart of % (vertical axis, 0 to 50) by marital status (Single, Married, Divorced, Widowed), with separate bars for males and females]
Pie chart
[Figure: pie chart with segments Deletion, Inversion and Translocation; shares shown: 3%, 18%, 79%]
Doughnut chart
[Figure: doughnut chart comparing Hospital A and Hospital B by DM, IHD and Renal]
3- Mathematical presentation
Summary statistics
Measures of location
1- Measures of central tendency
2- Measures of non-central location
(Quartiles, Percentiles)
Measures of dispersion
Summary statistics
Midrange
$\text{Midrange} = \dfrac{\text{Smallest observation} + \text{Largest observation}}{2}$
Mode
The value which occurs with the greatest
frequency, i.e. the most common value.
Summary statistics
1- Measures of central tendency (cont.)
Median
The observation which lies in the middle of the
ordered observations.
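The midrange, mode and median can be computed directly from their definitions. The sketch below uses Python's standard `statistics` module; the seven observations are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical observations, for illustration only.
observations = [3, 2, 7, 8, 13, 9, 7]

# Midrange: average of the smallest and largest observation.
midrange = (min(observations) + max(observations)) / 2

# Mode: the most frequently occurring value.
mode = statistics.mode(observations)

# Median: the middle value once the observations are ordered (2, 3, 7, 7, 8, 9, 13).
median = statistics.median(observations)

print(f"midrange = {midrange}, mode = {mode}, median = {median}")
```

With an even number of observations there is no single middle value; `statistics.median` then returns the average of the two middle observations.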
Measures of dispersion
Range
Variance
Standard deviation
Semi-interquartile range
Coefficient of variation
“Standard error”
Standard deviation (SD)
Data set 1: 7, 7, 7, 7, 7, 7      Mean = 7, SD = 0
Data set 2: 8, 7, 7, 7, 7, 6      Mean = 7, SD = 0.63
Data set 3: 3, 2, 7, 8, 13, 9     Mean = 7, SD = 4.05
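The slide's point is that three data sets can share the same mean of 7 and still differ widely in spread. The sketch below recomputes the mean and standard deviation for the three sets of six observations read from the slide (six 7s; 8, 7, 7, 7, 7, 6; and 3, 2, 7, 8, 13, 9). It assumes the sample formula with n − 1 in the denominator, which is what `statistics.stdev` uses; the dictionary keys are illustrative labels only.

```python
import statistics

# The three data sets from the slide (six observations each).
data_sets = {
    "no spread": [7, 7, 7, 7, 7, 7],
    "small spread": [8, 7, 7, 7, 7, 6],
    "large spread": [3, 2, 7, 8, 13, 9],
}

# Mean and sample standard deviation (n - 1 in the denominator) for each set.
summary = {
    name: (statistics.mean(values), statistics.stdev(values))
    for name, values in data_sets.items()
}

for name, (mean, sd) in summary.items():
    print(f"{name:<13} mean = {mean}, SD = {sd:.2f}")
```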
Standard error of mean (SE)
A measure of variability among the means of samples
selected from a given population
$SE(\text{Mean}) = \dfrac{S}{\sqrt{n}}$
where S is the sample standard deviation and n is the sample size.
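As a short worked sketch, the standard error of the mean can be computed by dividing the sample standard deviation S by the square root of the sample size n. The function name `standard_error` and the six sample values are assumptions made for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical sample of six observations.
sample = [3, 2, 7, 8, 13, 9]
se = standard_error(sample)

print(f"n = {len(sample)}, SD = {statistics.stdev(sample):.2f}, SE = {se:.2f}")
```

Because n appears under a square root, quadrupling the sample size only halves the standard error.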