
Data, epidemiology, the link to biostatistics
Part I

1
Outline:

- Introduction
- Planning the study
- Biostatistics and epidemiology
- Definition of biostatistics
- Definitions of some basic terms
- Applications, uses and limitations of statistics
- Types of variables and measurement scales
- Sources of data
- Stages in statistical investigation
- Methods of presentation of data

2
Introduction
• Epidemiologic studies need to be carefully planned
before they begin. This is true for any kind of study.
• A classical mistake that is still made from time
to time consists in saying: “Let us collect the data
first, and then start thinking about how to analyze
them”.
• The problem is that the analysis and interpretation of
the data depend crucially on the way they were obtained.

3
• The team of investigators needs to define not only
the data collection but also the core methods of
analysis and evaluation before starting the study;
they are part of the study plan.
• Systematic errors, in other words “bias”, may arise
at almost every stage of a study.
• The study plan needs to state how the various
possible forms of bias will be handled.

4
Planning the Study
1. To start an epidemiological study, the objectives must
be well defined including the target population, the
exposure factors, the outcome variables and their
relations.
2. It should also be stated why the study is required and
with what degree of urgency. This involves a survey
of the existing publications on the problem at hand.
3. The team of people to conduct the study has to be
constituted.
5
4. The type of study is part of the plan.

5. The way of selecting the study population, that is, the various
samples entering the study including their sample sizes, is to be
described in detail.

6. The plan must indicate how the values of the exposure and
outcome variables are to be measured, in other words how the
data are to be obtained.

6
7. The analysis of the data and the interpretation of the result will
require a person who knows the relevant mathematical–statistical
methods, for example a competent epidemiologist or a mathematical
statistician.

8. This person needs to be a member of the study team from the very
beginning and has to take part in drawing up all aspects of the planning.


7
9. Finally, the protocol will say what to do with the results.
What kind of report is to be submitted to which services? Is it
intended to make recommendations to the population,
health officials, politicians, etc.?
10. The plan must address the question of funding the study.
The timeframe of the study, too, needs to be defined by the plan.
11. It is reasonable to foresee a pilot study on a smaller
scale in order to test some of the methods to be employed in
the study, in particular in the collection of data.

8
12. In addition to these “obvious” components of the
study plan, there are ethical considerations that
have to be addressed and cleared in the study
protocol.

9
Biostatistics and epidemiology

10
Definition of Statistics
• The science that deals with the collection,
presentation, analysis and interpretation of
numerical data.
• The science and art of dealing with variation in data through
collection, classification, and analysis in such a way as to obtain
reliable results. (John M. Last, A Dictionary of Epidemiology)
• Branch of mathematics that deals with the collection, organization,
and analysis of numerical data and with such problems as
experiment design and decision making. (Microsoft
Encarta Premium 2009)

11
Definition of biostatistics

• Biostatistics is a growing field with
applications in many areas of biology,
including epidemiology, medical sciences,
health sciences, educational research and
environmental sciences.

12
Classification of Biostatistics

• Descriptive statistics: statistical methods concerned with
the collection, organization, summarization, and analysis
of data from a sample of a population.
• Inferential statistics: statistical methods concerned with
drawing conclusions (inferring) about a particular population
by selecting and measuring a random sample from that
population.

13
Descriptive Statistics

• Statistical procedures used to summarise,
organise, and simplify data. This process should
be carried out in such a way that it reflects the overall
findings.
► Raw data are made more manageable
► Raw data are presented in a logical form
► Patterns can be seen from organized data

14
Some statistical summaries and displays which are especially
common in descriptive analyses are:
• Measures of central tendency
• Measures of dispersion
• Measures of association
• Cross-tabulation, contingency table
• Histogram
• Quantile, Q-Q plot
• Scatter plot
• Box plot
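
As a minimal sketch, the measures of central tendency and dispersion listed above can be computed with Python's standard `statistics` module. The blood pressure readings are hypothetical values invented for illustration, not data from this course.

```python
import statistics

# Hypothetical diastolic blood pressure readings (mmHg) for ten patients.
dbp = [70, 72, 75, 78, 80, 80, 82, 85, 88, 90]

# Measures of central tendency
mean = statistics.mean(dbp)
median = statistics.median(dbp)
mode = statistics.mode(dbp)

# Measures of dispersion
value_range = max(dbp) - min(dbp)
variance = statistics.variance(dbp)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(dbp)      # sample standard deviation

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", round(variance, 2),
      "standard deviation:", round(std_dev, 2))
```

The sample versions of variance and standard deviation are used here because the readings are treated as a sample rather than a whole population.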

15
Inferential Statistics

• This branch of statistics deals with techniques for drawing conclusions about the population.
• Inferential statistics builds upon descriptive statistics.
• The inferences are drawn from particular properties of the sample to particular properties of the
population.
• Inferential statistics is used to make generalizations from a sample to a population.
• It encompasses a variety of procedures to ensure that the inferences are sound and
rational, even though they may not always be correct.
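
A hedged sketch of one common inferential procedure: estimating a population mean from a random sample, with an approximate 95% confidence interval. The simulated population, the sample size of 100 and the normal approximation (z = 1.96) are all assumptions made for illustration; the slides do not prescribe a particular method.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: systolic blood pressure (mmHg) of 10,000 people.
population = [random.gauss(120, 15) for _ in range(10_000)]

# Descriptive step: draw a simple random sample and summarise it.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Inferential step: approximate 95% confidence interval for the
# population mean (normal approximation, z = 1.96).
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"sample mean = {sample_mean:.1f}")
print(f"approximate 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```

The interval expresses the uncertainty of the generalization: a different random sample would give a different interval, and some such intervals will miss the true population mean.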

16
17
Definitions of some basic terms

• Population
• Census
• Sample
• Parameter
• Statistic, Statistics
• Sampling
• Sample size
• Variable
• Data

18
• Population: the complete set of possible measurements for
which inferences are to be made.

• Census: a complete enumeration of the population. In
most real problems it cannot be carried out, hence we take a
sample.

• Sample: the set of measurements from a population
that are actually collected in the course of an investigation.

19
• Parameter: a characteristic or measure obtained from a
population.

• Statistic: a numerical quantity computed
from sample data (e.g. the mean, the median, the maximum...).

• Data: a collection of facts, values, observations, or
measurements that the variables can assume.

20
• Statistics: a branch of mathematics dealing with data
collection, organization, analysis, interpretation and
presentation.

• Sampling: the process or method of selecting a sample
from the population.

• Sample size: the number of elements or observations to
be included in the sample.
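
The terms defined above can be tied together in a short sketch. The population of heights is simulated and every number in it is a hypothetical assumption; the point is only to show which quantity is a parameter and which is a statistic.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of 5,000 adults.
population = [round(random.gauss(165, 8), 1) for _ in range(5_000)]

# Census: every member is measured, so the resulting mean is a parameter.
parameter_mean = statistics.mean(population)

# Sampling: select sample_size members at random, without replacement.
sample_size = 50
sample = random.sample(population, sample_size)

# Statistic: the same quantity, computed from the sample data only.
statistic_mean = statistics.mean(sample)

print(f"parameter (population mean): {parameter_mean:.2f}")
print(f"statistic (sample mean, n={sample_size}): {statistic_mean:.2f}")
```

In practice the parameter is usually unknown because a census is not feasible, which is why the statistic is used to estimate it.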

21
Random
By chance!
• Random event: an event that may or may not
occur in a single experiment.
• Before an experiment, nobody can be sure whether
the event will occur.
• Examples: weather, traffic accidents, …
• Over a large number of experiments, however,
there is some regularity.
22
Probability
• Measures how likely a random event is to
occur.
• A: a random event
• P(A): the probability of the random event A
• P(A) = 1 if the event always occurs.
• P(A) = 0 if the event never occurs.
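
A small simulation, offered as a sketch, shows how a random event behaves over many experiments: the outcome of a single trial is unpredictable, but the relative frequency of the event settles near P(A) as the number of trials grows. The function name `relative_frequency` and the choice of P(A) = 0.5 are assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def relative_frequency(n_trials: int, p: float = 0.5) -> float:
    """Share of n_trials in which an event with probability p occurs."""
    occurrences = sum(random.random() < p for _ in range(n_trials))
    return occurrences / n_trials


# The relative frequency fluctuates for few trials and stabilises for many.
for n in (10, 100, 10_000):
    print(n, relative_frequency(n))

# Boundary cases: an event that always occurs and one that never occurs.
print(relative_frequency(100, p=1.0), relative_frequency(100, p=0.0))
```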

23
• Variable: an item of interest that can take on
many different values.
• Some examples of variables include:
► Diastolic blood pressure
► Heart rate
► Height
► Weight
► Stage of bladder cancer in patients

24
Applications, Uses and Limitations of Statistics

• Applications of Statistics
► In almost all fields of human endeavor
► Almost everyone deals with numerical facts in daily
life, e.g. about prices.
► Applicable in processes such as the development of
certain drugs or assessing the extent of environmental pollution.
► In industry, especially in quality control
25
• Uses of Statistics
The main function of statistics is to enlarge our knowledge of complex
phenomena. The following are some uses of statistics:
• It presents facts in a definite and precise form.
• Data reduction.
• Measuring the magnitude of variations in data.
• Furnishes a technique of comparison.
• Estimating unknown population characteristics.
• Formulating and testing hypotheses.
• Studying the relationship between two or more variables.
• Forecasting future events

26
Limitations of statistics

• As a science, statistics has its own limitations. The
following are some of them:
I. It deals only with quantitative information.
II. It deals only with aggregates of facts and not with individual
data items.
III. Statistical results are only approximately, not
mathematically, correct.
IV. Statistics can easily be misused and should therefore be
used by experts.
27
Types of Variables and Measurement Scales

• Variable: a characteristic or attribute that can
assume different values in different persons, places,
or things.
Depending on the characteristics of the measurement,
variables are classified as follows:

28
1. Qualitative (categorical) variable:
► A variable or characteristic which cannot be measured in
quantitative form but can only be identified by name or category,
► for instance place of birth, ethnic group, type of drug, stage of
breast cancer (I, II, III, or IV), degree of pain (minimal, moderate,
severe or unbearable).
► The categories should be clear-cut, non-overlapping, and cover
all the possibilities. For example, sex (male or female), vital status
(alive or dead), disease stage (depends on the disease), ever smoked (yes
or no).

29
2. Quantitative (numerical) variable:
► A variable that can be measured and expressed
numerically.
• Examples:
► survival time
► systolic blood pressure
► number of children in a family
► height, age, body mass index.

30
On the basis of scales of measurement:

There are four types of measurement scales:

• 1. Nominal Scales of Measurement
• ► Only “naming” and classifying observations is possible.
When numbers are assigned to categories, it is only for coding
purposes and they do not convey any sense of size.
• Example:
• ► Sex of a person (M, F)
• ► Eye color (e.g. brown, blue)
• ► Religion (Muslim, Christian)
• ► Place of residence (urban, rural), etc.
31
• 2. Ordinal Scales of Measurement
• ► Categorization and ranking (ordering) of observations is possible
• ► We can talk of greater than or less than, and the order conveys
meaning, but
• ► it is impossible to express the real difference between
measurements in numerical terms
• Example:
• ► Socio-economic status (very low, low, medium, high, very high)
• ► Severity (mild, moderate, severe)
• ► Blood pressure (very low, low, high, very high), etc.

32
• 3. Interval Scales of Measurement
• ► Possible to categorize, rank and tell the real
distance between any two measurements
• ► Zero is arbitrary, not absolute
• Example:
• ► Body temperature in degrees Fahrenheit or Celsius
• ► Differences between two values are meaningful,
but ratios are not
33
• 4. Ratio Scales of Measurement
• ► The highest level of measurement scale, characterized by the
fact that equality of ratios as well as equality of intervals can be
determined
• ► There is a true zero point, i.e. zero is absolute
• Example:
• ► volume
• ► height
• ► weight
• ► length
• ► time until death, etc.
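
The practical difference between the scales can be checked with a few lines of Python. This is only a sketch: the helper names, the example values and the approximate kg-to-lb factor are assumptions made for illustration.

```python
def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32


def kg_to_lb(kg: float) -> float:
    return kg * 2.20462  # approximate conversion factor


# Interval scale: the ratio of two temperatures depends on the unit,
# so "40 degrees is twice as hot as 20 degrees" is not meaningful.
ratio_c = 40 / 20                                                # 2.0
ratio_f = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)  # 104 / 68

# Ratio scale: the ratio of two weights is the same in any unit,
# so "80 kg is twice as heavy as 40 kg" is meaningful.
ratio_kg = 80 / 40
ratio_lb = kg_to_lb(80) / kg_to_lb(40)

# Ordinal scale: only the order of the categories carries information.
severity_order = ["mild", "moderate", "severe"]


def is_worse(a: str, b: str) -> bool:
    return severity_order.index(a) > severity_order.index(b)


print(ratio_c, round(ratio_f, 3))   # the two ratios differ
print(ratio_kg, round(ratio_lb, 3)) # the two ratios agree
print(is_worse("severe", "mild"))
```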

34
Types of variables

35
36
Sources of data
1. Primary data:

• ► Data generated for the first time,
originally for the study in question
• ► Requires the involvement of the researchers
themselves. Censuses and sample surveys are sources of
primary data

37
2. Secondary data:
• ► Obtained from pre-existing, previously collected sources
• ► Sources include newspapers, magazines, DHS, hospital
records and existing data such as:
• ► Mortality reports
• ► Morbidity reports
• ► Epidemic reports
• ► Reports of laboratory utilization (including laboratory test
results)

38
Sources of data

• Records
• Surveys (comprehensive or sample)
• Experiments

39
Stages in statistical investigation

• There are five stages or steps in any statistical investigation:
1. Collection of data
► The process of obtaining measurements or counts.
2. Organization of data
► Includes editing, classifying, and tabulating the data collected.
3. Presentation of data
► Gives an overall view of what the data actually look like
► Facilitates further statistical analysis
► Can be done in the form of tables and graphs or diagrams

40
4. Analysis of data
► Digs out useful information for decision making
► Involves extracting relevant information from the
data (such as the mean, median, mode, range, variance, ...)
5. Interpretation of data
► Concerned with drawing conclusions from the data
collected and analyzed, and giving meaning to the analysis results
► A difficult task that requires a high degree of skill and
experience

41
Methods of presentation of data

Numerical presentation
Graphical presentation
Mathematical presentation

42
1- Numerical presentation
Tabular presentation (simple – complex)
Simple frequency distribution Table (S.F.D.T.)
Title

Name of variable (units of variable)    Frequency    %
- Categories
- Categories
- Categories
Total
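
A simple frequency distribution table of this form can be built as in the sketch below. The marital status values are hypothetical and the helper name `frequency_table` is an assumption introduced for illustration.

```python
from collections import Counter

# Hypothetical marital status of 20 patients (a nominal variable).
marital_status = (["Single"] * 6 + ["Married"] * 9
                  + ["Divorced"] * 3 + ["Widowed"] * 2)


def frequency_table(values):
    """Return rows of (category, frequency, percent) plus a total row."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = [(category, n, 100 * n / total) for category, n in counts.items()]
    rows.append(("Total", total, 100.0))
    return rows


table = frequency_table(marital_status)

# Title, then one row per category, then the total.
print("Distribution of 20 patients by marital status")
print(f"{'Marital status':<16}{'Frequency':>10}{'%':>8}")
for category, freq, pct in table:
    print(f"{category:<16}{freq:>10}{pct:>8.1f}")
```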
43
44
Line Graph

Year    MMR/1000
1960    50
1970    45
1980    26
1990    15
2000    12

[Line graph: x-axis Year (1960 to 2000), y-axis MMR/1000 (0 to 60)]

Figure (1): Maternal mortality rate of (country), 1960-2000
45
Frequency polygon

Age (years)    Males    Females    Mid-point (M-P)
20-            (12%)    (10%)      25
30-            (36%)    (30%)      35
40-            (8%)     (25%)      45
50-            (16%)    (15%)      55
60-70          (8%)     (20%)      65

[Frequency polygon: x-axis Age (mid-points 25 to 65), y-axis % (0 to 40), one line each for males and females]

Figure (2): Distribution of 45 patients at (place), in (time) by age and sex
Frequency curve

[Frequency curve: x-axis Age in years (20-, 30-, 40-, 50-, 60-69), y-axis Frequency, one curve each for females and males]

47
Histogram

[Histogram: x-axis Age (years), y-axis % (0 to 35)]

Figure (2): Distribution of 100 cholera patients at (place), in (time) by age
Bar chart

[Bar chart: x-axis Marital status (Single, Married, Divorced, Widowed), y-axis % (0 to 45)]

49
Pie chart

[Pie chart with three slices: Deletion, Inversion and Translocation. Translocation is 79%; the other two slices are 3% and 18%]

50
Doughnut chart

[Doughnut chart comparing Hospital A and Hospital B by diagnosis: DM, IHD, Renal]

51
3- Mathematical presentation
Summary statistics

Measures of location
1- Measures of central tendency
2- Measures of non central locations
(Quartiles, Percentiles )
Measures of dispersion
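
As a sketch of the measures of non-central location, quartiles and percentiles can be obtained with `statistics.quantiles` (Python 3.8 or later). The ages are hypothetical, and the function's default "exclusive" method is only one of several conventions for computing quantiles, so other software may give slightly different values.

```python
import statistics

# Hypothetical ages (years) of 11 patients, sorted for readability.
ages = [22, 25, 31, 34, 38, 41, 45, 52, 58, 63, 67]

# Quartiles: three cut points dividing the data into four equal parts.
q1, q2, q3 = statistics.quantiles(ages, n=4)

# Interquartile range, a measure of dispersion built from the quartiles.
iqr = q3 - q1

# Percentiles: 99 cut points; index 89 holds the 90th percentile.
percentiles = statistics.quantiles(ages, n=100)
p90 = percentiles[89]

print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3)
print("IQR:", iqr)
print("90th percentile:", p90)
```

The second quartile coincides with the median, which links the measures of non-central location back to the measures of central tendency.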

52
53
Assignment :
Categorize the following variables into nominal,
ordinal, interval or ratio
► Gender
► Grade (A, B, C, D and F)
► Rating scale (poor, good, excellent)
► Eye color
► Political affiliation
► Religious affiliation
► Ranking of tennis players

54
► Major field
► Nationality
► Height
► Weight
► Time
► Age
► IQ
► Temperature

55
Questions?

• Thank you

56
