1 Introduction To Statistics 2021
Statistics
INTRODUCTION
• Statistics
• Data
• Scientific method
• Collection, organization, presentation, analysis and
interpretation of data
• Biostatistics
• The application of statistics to biological or life science
data
INTRODUCTION …
• Data
• Measurements taken on variables
• Measurement
• Assigning values to objects
INTRODUCTION …
• Descriptive statistics: Statistical procedures used
to summarize, organize, and simplify data.
• A number that conveys a particular characteristic of a
set of data.
• Summarizes a set of data with one number or graph.
Characteristics of statistical data
• They must be in aggregates: statistics are collections of facts. A
single fact, even though numerically stated, cannot be called
statistics.
• They must be affected to a marked extent by a multiplicity of
causes: only aggregates of facts that grow out of a variety of
circumstances count as statistics.
• They must be enumerated or estimated according to a
reasonable standard of accuracy
• They must have been collected in a systematic manner for a
predetermined purpose.
• They must be placed in relation to each other. That is, they
must be comparable.
Types of variables
• Variable: A characteristic which takes different
values in different persons, places, or things.
• Something that exists in more than one amount or
in more than one form.
• Any aspect of an individual or object that is
measured (e.g., BP) or recorded (e.g., age, sex) and
can take different values
• Variables can be qualitative (or categorical) or
quantitative (or numerical variables).
Qualitative variable:
• A variable or characteristic that cannot be
measured in quantitative form
• Can only be identified by name or category
• For instance: place of birth, ethnic group, type of
drug, stage of breast cancer (I, II, III, or IV), degree
of pain (minimal, moderate, severe, or unbearable).
• Quantitative variable: A variable that can be
measured (or counted) and expressed numerically.
Quantitative variables are divided into two types:
1. Discrete variable: It can take only a limited number of distinct
values (usually whole numbers).
• E.g., the number of episodes of diarrhoea a child has had in a
year. You can’t have 12.5 episodes of diarrhoea.
• Characterized by gaps or interruptions in the values
(integers).
• Both the order and magnitude of the values matter.
• The values aren’t just labels, but are actual measurable
quantities.
2. Continuous variable: It can have an infinite
number of possible values in any given interval.
• Both the magnitude and the order of the values matter.
• Does not have gaps or interruptions in its values.
• Weight is continuous since it can take any value
within a range (e.g., 34.575 kg).
Scales of measurement
1. Nominal scale:
• Measurement scale in which numbers serve only as
labels and do not indicate any quantitative
relationship.
• The simplest type of data, in which the values fall into
unordered categories or classes
• Consists of “naming” observations or classifying them
into various mutually exclusive and collectively
exhaustive categories
• Uses names, labels, or symbols to assign each
measurement.
• Examples: Blood type, sex, race, marital status, etc.
Example of nominal scale:
Race/Ethnicity:
1. Black
2. White
3. Latino
4. Other
• The numbers have NO meaning
• They are labels only
• If nominal data can take on only two possible values,
they are called dichotomous or binary.
• So sex is not just nominal, it is dichotomous (male or
female).
• Yes/no questions
• E.g., cured from TB at 6 months of Rx
2. Ordinal scale:
• Has the characteristic of the nominal scale (different numbers mean
different things) plus the characteristic of indicating greater than or
less than.
• Assigns each measurement to one of a limited number of categories
that are ranked in order.
• Measurement scale in which numbers are ranks; equal differences
between numbers do not represent equal differences between the
things measured.
• Although the categories are non-numerical, they have a natural ordering.
• Examples: Patient status, cancer stages, social class, etc.
Example of ordinal scale:
3. Interval scale:
- Has the properties of both the nominal and ordinal scales plus
the additional property that intervals between the numbers are
equal.
- Measured on a continuum, and differences between any two
numbers on the scale are of known size.
Example: Temperature in °F on 4 consecutive days
Day:          A    B    C    D
Temp. (°F):   50   55   60   65
For these data, not only is day A at 50° cooler than day D at
65°, but it is 15° cooler.
- It has no true zero point. “0” is arbitrarily chosen and
doesn’t reflect the absence of temperature.
- Because the zero point is arbitrarily defined, you may not make
simple ratio statements:
- You may not say that 100° is twice as hot as 50°, or
- that a person with an IQ of 60 is half as intelligent as a
person with an IQ of 120.
4. Ratio scale:
- Has all the characteristics of the nominal, ordinal, and
interval scales plus one other:
- It has a true zero point, which indicates a complete
absence of the thing measured.
- On a ratio scale, zero means “none.”
- Measurement begins at a true zero point and the scale
has equal intervals.
- Examples: Height, age, weight, BP, etc.
• Note on the meaningfulness of “ratio”: you can make ratio
statements.
• Someone who weighs 80 kg is twice as heavy as
someone who weighs 40 kg. This remains true if weight
is measured in other units.
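The difference between interval and ratio scales can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it uses the standard Fahrenheit-to-Celsius and kilogram-to-pound conversions. It shows that a ratio of two temperatures changes when the unit changes, while a ratio of two weights does not.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert Fahrenheit to Celsius (both are interval scales)."""
    return (temp_f - 32) * 5 / 9


def kg_to_lb(weight_kg):
    """Convert kilograms to pounds (both are ratio scales with a true zero)."""
    return weight_kg * 2.20462


# Interval scale: the ratio 100 °F / 50 °F is not preserved in °C,
# because the zero point of each scale is arbitrary.
ratio_f = 100 / 50
ratio_c = fahrenheit_to_celsius(100) / fahrenheit_to_celsius(50)

# Ratio scale: 80 kg is twice 40 kg, and the ratio is the same in pounds.
ratio_kg = 80 / 40
ratio_lb = kg_to_lb(80) / kg_to_lb(40)

print(f"Temperature ratio in °F: {ratio_f:.2f}, in °C: {ratio_c:.2f}")
print(f"Weight ratio in kg: {ratio_kg:.2f}, in lb: {ratio_lb:.2f}")
```

Differences, by contrast, remain meaningful on both scales: a 15° gap in °F always corresponds to the same gap in °C, whichever two days are compared.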
Characteristics of the four scales of measurement
[Figure: the four scales ordered by degree of precision in measuring, from nominal (lowest) through ordinal and interval to ratio (highest)]
Data
• Data are numbers which can be measurements or can
be obtained by counting
• The raw material for statistics
• Can be obtained from:
• Routinely kept records, literature
• Surveys
• Counting
• Experiments
• Reports
• Observation
• Etc
Data
• Raw score/data: A score obtained by observation or
from an experiment
Types of Data
Methods of Data collection
• Questionnaires
• Interviews
• Focus group interviews
• Observation
• Documentary source
Methods of data organization and presentation
Descriptive statistics (Describing variables)
1. Describing categorical variables
• Charts
• Bar charts
• Pie charts
Statistical Tables
• A table can be either a simple frequency table or a
cross tabulation.
• The simple frequency table
• is used when the individual observations involve only
a single variable
• The cross tabulation
• is used to obtain the frequency distribution of one
variable within the categories of another variable.
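As a rough illustration of the two table types, the sketch below builds a simple frequency table and a cross tabulation with Python's standard `collections.Counter`. The eight records are invented for the example and are not taken from the slides.

```python
from collections import Counter

# Hypothetical records: (birth-weight category, mother had antenatal care).
records = [
    ("Normal", "Yes"), ("Low", "No"), ("Normal", "Yes"), ("Big", "Yes"),
    ("Normal", "No"), ("Low", "Yes"), ("Normal", "Yes"), ("Normal", "No"),
]

# Simple frequency table: counts for a single variable.
simple = Counter(category for category, _ in records)

# Cross tabulation: counts of one variable within each level of another.
cross = Counter(records)

categories = ["Low", "Normal", "Big"]
care_levels = ["Yes", "No"]

print(f"{'BWT':<8}{'Freq':>5}")
for c in categories:
    print(f"{c:<8}{simple[c]:>5}")
print(f"{'Total':<8}{sum(simple.values()):>5}")

print()
header = "".join(f"{lvl:>5}" for lvl in care_levels)
print(f"{'BWT':<8}{header}{'Total':>7}")
for c in categories:
    row = [cross[(c, lvl)] for lvl in care_levels]
    cells = "".join(f"{n:>5}" for n in row)
    print(f"{c:<8}{cells}{sum(row):>7}")
```

Each row of the cross tabulation sums to the matching entry in the simple frequency table, which is a quick consistency check when building tables by hand.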
Construction of tables
• Tables should be as simple as possible.
• Tables should be self-explanatory. For that purpose:
• The title should be clear and to the point (a good title answers:
what? when? where? how classified?) and should be placed above
the table.
• Each row and column should be labelled.
• Numerical entries of zero should be explicitly written rather
than indicated by a dash. Dashes are reserved for missing or
unobserved data.
• Totals should be shown
• If data are not original, their source should be given in a
footnote.
Frequency distributions
Simple Frequency Distributions
• Scores arranged from highest to lowest, with the frequency shown for
each score.
• The score column heading gives the name of the variable that is
being measured. The generic name for any variable is X, which is
the symbol used in formulas.
• The frequency (f) column shows how often each score occurred.
• Tally marks are used when you construct a rough draft
and are not usually included in the final form.
• N is the number of scores and is found by summing the numbers
in the f column.
• A simple frequency distribution is a useful way to present a set of
data because you pick up valuable information at a glance.
Raw data
Steps to construct simple frequency distributions
1. Find the highest and lowest scores. (Here, the highest score is 35 and the lowest is 5.)
2. In column form, write all the numbers in descending order. (35 to 5.)
3. At the top of the column, name the variable being measured. (Satisfaction With
Life Scale scores.)
4. Start with the number in the upper left-hand corner of the scores, draw a line
under it, and place a tally mark beside that number in the column of numbers.
(Underline 15 in Table 2.1, and place a tally mark beside 15 in Table 2.2.)
5. Continue underlining and tallying for all the unorganized scores.
6. Add a column labeled f (frequency).
7. Count the number of tallies by each score and enter the count in the f column.
(2, 1, 2, 4, . . . , 0, 2.)
8. Add the numbers in the f column. If the sum is equal to N, you haven’t left out
any scores. (Sum = 100.)
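The steps above can be sketched in a few lines of Python. The ten scores here are made up for illustration (they are not the Satisfaction With Life Scale data from the slides), and `Counter` stands in for the manual tallying.

```python
from collections import Counter

# Hypothetical raw scores, used only to illustrate the procedure.
scores = [15, 12, 15, 9, 12, 12, 10, 15, 9, 12]

# Step 1: find the highest and lowest scores.
highest, lowest = max(scores), min(scores)

# Steps 4, 5 and 7: tally every score and count the tallies.
tallies = Counter(scores)

# Steps 2 and 6: list every value from highest to lowest with its
# frequency f, including values that never occurred (f = 0).
distribution = [(x, tallies[x]) for x in range(highest, lowest - 1, -1)]

# Step 3: name the variable at the top of the column (generic name X).
print(f"{'X':>3} {'f':>3}")
for x, f in distribution:
    print(f"{x:>3} {f:>3}")

# Step 8: the f column must sum to N, the number of scores.
n = sum(f for _, f in distribution)
print(f"N = {n}")
assert n == len(scores)
```

Keeping the zero-frequency rows makes gaps in the data visible, which matches the instruction to write out all the numbers between the highest and lowest scores.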
Grouped frequency distribution
• Compilation of scores into equal-sized ranges (class
intervals), with the frequency shown for each
interval.
• Class interval
• A range of scores in a grouped frequency distribution.
• The midpoint of each interval represents all the
scores in that interval.
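A minimal sketch of grouping scores into equal-sized class intervals follows. The helper name `grouped_frequency`, the scores, and the choice of four intervals of width 8 are all assumptions made for the example. It treats scores as integers, so each interval has an inclusive upper limit.

```python
def grouped_frequency(scores, lowest_limit, width, n_intervals):
    """Count integer scores in equal-width class intervals.

    Returns one (lower, upper, midpoint, f) tuple per interval.
    """
    rows = []
    for i in range(n_intervals):
        lower = lowest_limit + i * width
        upper = lower + width - 1           # inclusive upper limit
        midpoint = (lower + upper) / 2      # represents all scores in the interval
        f = sum(lower <= s <= upper for s in scores)
        rows.append((lower, upper, midpoint, f))
    return rows


# Hypothetical scores between 5 and 35.
scores = [5, 7, 12, 13, 14, 18, 21, 22, 22, 27, 29, 31, 33, 35]
rows = grouped_frequency(scores, lowest_limit=5, width=8, n_intervals=4)

print(f"{'Interval':<10}{'Midpoint':>9}{'f':>4}")
for lower, upper, midpoint, f in reversed(rows):   # highest interval first
    label = f"{lower}-{upper}"
    print(f"{label:<10}{midpoint:>9}{f:>4}")
```

Grouping trades detail for readability: the individual scores are no longer visible, and each interval is summarized by its midpoint.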
Relative Frequency
Cumulative frequency
Table 2. Frequencies of serum cholesterol levels for
1067 US males of ages 25-34 (1976-1980).
-----------------------------------------------------------------------------
Cholesterol level
(mg/100 ml)         Freq   Relative freq (%)   Cum. freq   Cum. rel. freq (%)
-----------------------------------------------------------------------------
80-119                13                 1.2          13                  1.2
120-159              150                14.1         163                 15.3
160-199              442                41.4         605                 56.7
200-239              299                28.0         904                 84.7
240-279              115                10.8        1019                 95.5
280-319               34                 3.2        1053                 98.7
320-359                9                 0.8        1062                 99.5
360-399                5                 0.5        1067                  100
-----------------------------------------------------------------------------
Total               1067                 100
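The relative and cumulative columns of Table 2 can be reproduced from the frequency column alone. The following sketch assumes only the eight frequencies listed in the table; the variable names are illustrative.

```python
from itertools import accumulate

intervals = ["80-119", "120-159", "160-199", "200-239",
             "240-279", "280-319", "320-359", "360-399"]
freq = [13, 150, 442, 299, 115, 34, 9, 5]   # frequency column of Table 2

n = sum(freq)                               # total number of observations
rel = [100 * f / n for f in freq]           # relative frequency (%)
cum = list(accumulate(freq))                # cumulative frequency
cum_rel = [100 * c / n for c in cum]        # cumulative relative frequency (%)

print(f"{'Interval':<10}{'f':>6}{'Rel %':>8}{'Cum f':>8}{'Cum %':>8}")
for row in zip(intervals, freq, rel, cum, cum_rel):
    label, f, r, c, cr = row
    print(f"{label:<10}{f:>6}{r:>8.1f}{c:>8}{cr:>8.1f}")
print(f"{'Total':<10}{n:>6}")
```

The relative frequency is each count divided by N, and the cumulative columns are running totals, so the last cumulative frequency equals N and the last cumulative relative frequency equals 100%.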
Table 1. Distribution of birth weight of
newborns between 1976-1996 at TAH.
Graphs of Frequency Distributions
Charts
• Bar charts: display the frequency distribution for nominal or ordinal data.
• The vertical axis should always start from 0, but the horizontal axis can start
anywhere.
• The bars should be of equal width and should be separated from one
another so as not to imply continuity
[Figure: bar charts of birth-weight categories, shown as frequency (bar labels 8870, 793, 268, 43) and as percent (bar labels 89%, 8%, 3%, 0%)]
[Figure: paired bar charts of birth weight (BWT: Low, Normal, Big) by antenatal care (Yes/No); vertical axes: Freq. and Percent]
Fig 2. Bar chart indicating categories of birth weight of 9975 newborns grouped by
antenatal follow-up of the mothers
Pie chart
[Figure: pie charts of birth-weight categories (Very low, Low, Normal, Big), one labelled with counts (43, 793, 268, 8870) and one with percentages (0.4, 8, 2.7, 88.9)]
• Graphs
• Histograms
• Frequency polygons
• Cumulative frequency polygons
Graphs...
• They give delight to the eye, add a spark of interest and as such
catch the attention as much as the figures dispel it.
• Every graph should be self-explanatory and as simple as possible.
• Titles are usually placed below the graph and should again
answer: what? where? when? how classified?
• Legends or keys should be used to differentiate variables if more
than one is shown.
• The axis labels should be placed to read from the left side and from
the bottom.
• The units into which the scale is divided should be clearly indicated.
• The numerical scale representing frequency must start at zero or a
break in the line should be shown.
Histograms
• Are frequency distributions with continuous class intervals that
have been turned into graphs.
[Figure: histogram of birth weight; vertical axis: frequency. Mean = 3126, Std. Dev = 502.34, N = 9975]
[Figure: frequency polygons of birth weight (500 to 5000) by sex (males, females); vertical axis: %]
-----------------------------------------------------------------------------
Cholesterol level
(mg/100 ml)         Freq   Relative freq (%)   Cum. freq   Cum. rel. freq (%)
-----------------------------------------------------------------------------
80-119                13                 1.2          13                  1.2
120-159              150                14.1         163                 15.3
160-199              442                41.4         605                 56.7
200-239              299                28.0         904                 84.7
240-279              115                10.8        1019                 95.5
280-319               34                 3.2        1053                 98.7
320-359                9                 0.8        1062                 99.5
360-399                5                 0.5        1067                  100
-----------------------------------------------------------------------------
Total               1067                 100
Table 6. Frequencies of serum cholesterol levels for 1227 US
males of ages 55-64 (1976-1980).
-----------------------------------------------------------------------------
Cholesterol level
(mg/100 ml)         Freq   Relative freq (%)   Cum. freq   Cum. rel. freq (%)
-----------------------------------------------------------------------------
80-119                 5                 0.4           5                  0.4
120-159               48                 3.9          53                  4.3
160-199              265                21.6         318                 25.9
200-239              458                37.3         776                 63.2
240-279              281                22.9        1057                 86.1
280-319              128                10.4        1185                 96.5
320-359               35                 2.9        1220                 99.4
360-399                7                 0.6        1227                  100
-----------------------------------------------------------------------------
Total               1227                 100
[Figure: two panels plotted against serum cholesterol level (mg/100ml), class intervals 80-119 to 360-399, for ages 25-34 and ages 55-64; vertical axes: relative frequency (%) and cumulative relative frequency (%)]
Fig. 7. Frequency polygon (Ogive curves Vs survival curves) and Cumulative frequency
polygons of serum cholesterol levels for 2294 males aged 25-34 and 55-64 years, 1976-1980
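A frequency polygon joins the points (class midpoint, relative frequency). As a rough sketch, the code below computes those points for the two age groups from the frequency columns of the cholesterol tables; the helper name `polygon_points` is hypothetical, and plotting is left out so the example needs only the standard library.

```python
def polygon_points(lower_limits, width, freq):
    """Return (midpoint, relative frequency in %) pairs for a frequency polygon."""
    n = sum(freq)
    return [(lo + (width - 1) / 2, 100 * f / n)
            for lo, f in zip(lower_limits, freq)]


# Class intervals 80-119, 120-159, ..., 360-399 (width 40 in integer units).
lower_limits = [80, 120, 160, 200, 240, 280, 320, 360]
freq_25_34 = [13, 150, 442, 299, 115, 34, 9, 5]     # ages 25-34, N = 1067
freq_55_64 = [5, 48, 265, 458, 281, 128, 35, 7]     # ages 55-64, N = 1227

young = polygon_points(lower_limits, 40, freq_25_34)
old = polygon_points(lower_limits, 40, freq_55_64)

# The peak of each polygon is the class with the highest relative frequency.
peak_young = max(young, key=lambda p: p[1])
peak_old = max(old, key=lambda p: p[1])

print(f"{'Midpoint':>9}{'25-34 %':>10}{'55-64 %':>10}")
for (mid, r_young), (_, r_old) in zip(young, old):
    print(f"{mid:>9.1f}{r_young:>10.1f}{r_old:>10.1f}")
print(f"Peak, ages 25-34: midpoint {peak_young[0]}")
print(f"Peak, ages 55-64: midpoint {peak_old[0]}")
```

Using relative rather than absolute frequencies lets the two groups be compared on the same axes even though their sample sizes differ; the peak for the older group falls one class interval higher.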
THANK YOU!