Descriptive Statistics
• Techniques used to organize and
summarize a set of data in a concise way.
– Organization of data
– Summarization of data
– Presentation of data
• Numbers that have not been summarized
and organized are called raw data.
Descriptive statistics include:
• Tables
• Graphs
• Numerical summary measures
- Measures of central tendency
- Measures of variability
• Before summarization and organization,
we need to know the types of variables
and measurement scales of our data.
• Before displaying or analyzing data,
classify the variables into their different
types.
Variable
• Variable: A characteristic which takes
different values in different persons, places,
or things.
• Any aspect of an individual or object that is
measured (e.g., BP) or recorded (e.g., age,
sex) and takes any value.
• There may be one variable in a study or
many.
• E.g., A study of treatment outcome of TB
• Variables can be broadly classified
into:
– Categorical (or Qualitative) or
– Quantitative (or numerical variables).
• Categorical variable: A variable or
characteristic that cannot be measured in
quantitative form but can only be sorted by
name or category.
• It cannot be measured the way we measure
height or weight.
• The notion of magnitude is absent or implicit.

• Quantitative variable: A variable that can
be measured (or counted) and expressed
numerically.
• E.g., height, weight, number of children, etc.
• Has the notion of magnitude.

Quantitative variables are divided into two types:
1. Discrete variable: It can take only a limited number of
distinct values (usually whole numbers).
– E.g., the number of episodes of diarrhoea a child has
had in a year. A child cannot have 12.5 episodes of diarrhoea.
• Characterized by gaps or interruptions in the
values (integers).
• Both the order and magnitude of the values matter.
• The values aren't just labels, but are actual
measurable quantities.
2. Continuous variable: It can take an
infinite number of possible values in any
given interval.
• Both the magnitude and the order of the
values matter.
• Does not possess the gaps or interruptions
of a discrete variable.
• Weight is continuous since it can take on
any value within an interval (e.g., 34.575 kg).
SUMMARY: Types of variables
Variable
– Qualitative (or categorical)
  • Nominal (not ordered), e.g. ethnic group
  • Ordinal (ordered), e.g. response to treatment
– Quantitative (measurement)
  • Discrete (count data), e.g. # of admissions
  • Continuous (real-valued), e.g. height
Scales of measurement
• Not all measurements are the same.
• Measuring weight: e.g., 40 kg
• Measuring the status of a patient on a scale:
“improved”, “stable”, “not improved”.
• There are four types of scales of
measurement.
1. Nominal scale:
• The simplest type of data, in which the values
fall into unordered categories or classes
• Consists of “naming” observations or
classifying them into various mutually
exclusive and collectively exhaustive
categories
• Uses names, labels, or symbols to assign each
measurement.
– Examples: Blood type, sex, race, marital status, etc.
Example of nominal scale:
Race/Ethnicity:
1. Black
2. White
3. Latino
4. Other
• The numbers have NO meaning.
• They are labels only.
• If nominal data can take on only two
possible values, they are called
dichotomous or binary.
• So sex is not just nominal, it is
dichotomous (male or female).
• Yes/no questions
– E.g., cured from TB at 6 months of Rx
2. Ordinal scale:
• Assigns each measurement to one of a
limited number of categories that are
ranked in terms of order.
• Although non-numerical, can be
considered to have a natural ordering
– Examples: Patient status, cancer stages,
social class, etc.
Example of ordinal scale:
Pain level:
1. None
2. Mild
3. Moderate
4. Severe
• The numbers have LIMITED meaning:
4>3>2>1 is all we know, apart from their
utility as labels.
3. Interval scale:
- Measured on a continuum, and differences
between any two numbers on the scale are of
known size.
Example: Temperature in °F on 4 consecutive days
Day:         A    B    C    D
Temp. (°F):  50   55   60   65
For these data, day A (50 °F) is not only cooler
than day D (65 °F), it is 15 °F cooler.
- It has no true zero point. “0” is arbitrarily chosen
and doesn't reflect the absence of temperature.
4. Ratio scale:
- Measurement begins at a true zero point
and the scale has equal intervals.
- Examples: Height, age, weight, BP, etc.
• Note on the meaningfulness of “ratio”:
– Someone who weighs 80 kg is two times as
heavy as someone else who weighs 40 kg.
This is true even if weight had been measured
in other units.
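The difference between the interval and ratio scales can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the conversion factor and the temperatures of 50 °F and 100 °F are illustrative assumptions. It suggests that a ratio of weights survives a change of units, while a ratio of temperatures does not.

```python
import math

KG_TO_LB = 2.20462  # approximate pounds per kilogram (assumed constant)


def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# Ratio scale: weight has a true zero, so ratios survive a change of units.
ratio_kg = 80 / 40
ratio_lb = (80 * KG_TO_LB) / (40 * KG_TO_LB)

# Interval scale: the zero of °F is arbitrary, so ratios depend on the unit.
ratio_f = 100 / 50
ratio_c = f_to_c(100) / f_to_c(50)

print(ratio_kg, round(ratio_lb, 3))  # the same ratio in kg and in lb
print(ratio_f, round(ratio_c, 2))    # different ratios in °F and in °C
```

Differences, by contrast, remain meaningful on both scales, which is why 65 °F can be called 15 °F warmer than 50 °F.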
Degree of precision in measuring (lowest to highest):
Nominal, Ordinal, Interval, Ratio
Methods of Data Organization
and Presentation
Frequency Distributions (Tables)
• Ordered array: A simple arrangement of individual
observations in the order of magnitude.
• Very difficult with large sample size
12 19 27 36 42 59
15 22 31 39 43 61
17 23 31 41 44 65
18 26 34 41 54 67
• The actual summarization and organization
of data starts from frequency distribution.
• Frequency distribution: A table which
has a list of each of the possible values
that the data can assume along with the
number of times each value occurs.
• For nominal and ordinal data, frequency
distributions are often used as a summary.
• Example:
• The % of times that each value occurs, or
the relative frequency, is often listed.
• Tables make it easier to see how the data
are distributed.
• For both discrete and continuous data,
the values are grouped into non-overlapping
intervals, usually of equal width.
a) Qualitative variable: Count the number of
cases in each category.
- Example 1: The intensive care unit type of 25
patients entering ICU at a given hospital:
1. Medical
2. Surgical
3. Cardiac
4. Other
ICU Type    Frequency      Relative Frequency
            (How often)    (Proportionately often)
Medical     12             0.48
Surgical    6              0.24
Cardiac     5              0.20
Other       2              0.08
Total       25             1.00
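The counting step for a qualitative variable can be sketched in a few lines of Python. This is an illustrative sketch only: the list `icu_types` is a hypothetical reconstruction of the 25 raw records (the order of patients is not given in the example; only the counts are).

```python
from collections import Counter

# Hypothetical raw records: one ICU type per patient (counts match the table)
icu_types = ["Medical"] * 12 + ["Surgical"] * 6 + ["Cardiac"] * 5 + ["Other"] * 2

counts = Counter(icu_types)  # frequency of each category
n = sum(counts.values())     # total number of patients

# Relative frequency = frequency / total
rel_freq = {category: count / n for category, count in counts.items()}

for category, count in counts.items():
    print(f"{category:<10}{count:>4}{rel_freq[category]:>8.2f}")
print(f"{'Total':<10}{n:>4}{sum(rel_freq.values()):>8.2f}")
```

Multiplying each relative frequency by 100 gives the percentage form used in the next example.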
Example 2:
A study was conducted to assess the
characteristics of a group of 234 smokers by
collecting data on gender and other variables.
Gender: 1 = male, 2 = female
Gender        Frequency (n)    Relative Frequency
Male (1)      110              47.0%
Female (2)    124              53.0%
Total         234              100%
b) Quantitative variable:
- Select a set of contiguous, non-overlapping
intervals such that each value can be placed
in one, and only one, of the intervals.
- The first consideration is how many intervals
to include.
• For a continuous variable (e.g., age), the frequency
distribution of the individual ages is not so interesting.
• We “see more” in frequencies of age values in “groupings”.
Here, 10-year groupings make sense.
• The result is a grouped data frequency distribution.
To determine the number of class intervals and the
corresponding width, we may use Sturges' rule:
K = 1 + 3.322 (log n)
W = (L − S) / K
where
K = number of class intervals
n = number of observations (log is to base 10)
W = width of the class interval
L = the largest value
S = the smallest value
Example:
– Leisure time (hours) per week for 40 college
students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20
22 14 13 10 19 27 29 22 38 28 34 32 23 19
21 31 16 28 19 18 12 27 15 21 25 16
K = 1 + 3.322 (log 40) = 6.32 ≈ 6
Maximum value = 38, Minimum value = 10
Width = (38 − 10)/6 = 4.67 ≈ 5
Time       Frequency    Relative     Cumulative Relative
(Hours)                 Frequency    Frequency
10-14      5            0.125        0.125
15-19      11           0.275        0.400
20-24      12           0.300        0.700
25-29      7            0.175        0.875
30-34      3            0.075        0.950
35-39      2            0.050        1.00
Total      40           1.00
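The leisure-time example can be reproduced with a short Python sketch. It is offered as an illustration under two assumptions that the example implies but does not state: K is rounded to the nearest whole number, and the width is rounded up so that all values are covered.

```python
import math
from itertools import accumulate

hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

n = len(hours)
k = round(1 + 3.322 * math.log10(n))              # Sturges' rule, rounded
width = math.ceil((max(hours) - min(hours)) / k)  # (L - S) / K, rounded up
start = min(hours)                                # lower limit of first class

# Count how many observations fall in each class interval.
# (For these data k * width covers the whole range, so no index overflows.)
freq = [0] * k
for h in hours:
    freq[(h - start) // width] += 1

rel_freq = [f / n for f in freq]
cum_rel_freq = [c / n for c in accumulate(freq)]

for i, f in enumerate(freq):
    lower = start + i * width
    print(f"{lower}-{lower + width - 1}: {f:>3} {rel_freq[i]:.3f} {cum_rel_freq[i]:.3f}")
```

Starting the first class at the smallest value is a choice made here to match the table; a rounder starting point would also be acceptable.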
• Cumulative frequency: The sum of the frequencies
of a class and of all the classes below it.
• Cumulative relative frequency: The
proportion (or percentage) of the total number of
observations that have a value either in that
interval or below it.
• Mid-point: The value of the interval which lies
midway between the lower and the upper
limits of a class.
• True limits: Those limits that make an
interval of a continuous variable continuous
in both directions.
• Used for smoothing the class intervals.
• Subtract 0.5 from the lower limit and add 0.5 to the
upper limit.
Time       True limit      Mid-point    Frequency
(Hours)
10-14      9.5 – 14.5      12           5
15-19      14.5 – 19.5     17           11
20-24      19.5 – 24.5     22           12
25-29      24.5 – 29.5     27           7
30-34      29.5 – 34.5     32           3
35-39      34.5 – 39.5     37           2
Total                                   40
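The true limits and mid-points in this table follow mechanically from the class limits. A minimal Python sketch is given below; the helper names `true_limits` and `midpoint` are illustrative, and the half-unit adjustment of 0.5 assumes data recorded to the nearest whole hour.

```python
# Class limits as written in the table (whole hours)
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]


def true_limits(lower, upper, unit=1):
    """Extend each class limit by half the unit of measurement."""
    half = unit / 2
    return lower - half, upper + half


def midpoint(lower, upper):
    """Value midway between the lower and upper class limits."""
    return (lower + upper) / 2


for lower, upper in classes:
    lo, hi = true_limits(lower, upper)
    print(f"{lower}-{upper}: true limits {lo} to {hi}, mid-point {midpoint(lower, upper)}")
```

Because 0.5 is subtracted from one limit and added to the other, the mid-point is the same whether it is computed from the class limits or from the true limits.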
Simple Frequency Distribution
• Primary and secondary cases of syphilis
morbidity by age, 1989
Age group    Cases
(years)      Number    Percent
0-14         230       0.5
15-19        4378      10.0
20-24        10405     23.6
25-29        9610      21.8
30-34        8648      19.6
35-44        6901      15.7
45-54        2631      6.0
>54          1278      2.9
Total        44081     100
Two Variable Table
• Primary and secondary cases of syphilis
morbidity by age and sex, 1989
Age group    Number of cases
(years)      Male      Female    Total
0-14         40        190       230
15-19        1710      2668      4378
20-24        5120      5285      10405
25-29        5304      4306      9610
30-34        5537      3111      8648
35-44        5004      1897      6901
45-54        2144      487       2631
>54          1147      131       1278
Total        26006     18075     44081
Tables can also be used to present
three or more variables.
Variable        Frequency (n)    Percent
Sex
  Male
  Female
Age (yrs)
  15-19
  20-24
  25-29
Religion
  Christian
  Muslim
Occupation
  Student
  Farmer
  Merchant
Guidelines for constructing tables
• Keep them simple,
• Limit the number of variables to three or less,
• All tables should be self-explanatory,
• Include clear title telling what, when and where,
• Clearly label the rows and columns,
• State clearly the unit of measurement used,
• Explain codes and abbreviations in the foot-note,
• Show totals,
• If data is not original, indicate the source in foot-note.
Diagrammatic Representation
• Pictorial representations of numerical data

Importance of diagrammatic representation:
1. Diagrams have greater attraction than
mere figures.
2. They give a quick overall impression of the
data.
3. They have greater memorizing value than
mere figures.
4. They facilitate comparison.
5. They are used to understand patterns and trends.
• Well-designed graphs can be a powerful
means of communicating a great deal of
information.
• When graphs are poorly designed, they not
only convey the message ineffectively, but
are often misleading.
Specific types of graphs include:
For nominal and ordinal data:
• Bar graph
• Pie chart
For quantitative data:
• Histogram
• Stem-and-leaf plot
• Box plot
• Scatter plot
• Line graph
• Others
1. Bar charts (or graphs)
• Categories are listed on the horizontal axis
(X-axis)
• Frequencies or relative frequencies are
represented on the Y-axis (ordinate)
• The height of each bar is proportional to
the frequency or relative frequency of
observations in that category
Bar chart for the type of ICU for 25 patients
Method of constructing bar chart
• All the bars must have equal width
• The bars are not joined together (leave
space between bars)
• The different bars should be separated
by equal distances
• All the bars should rest on the same line
called the base
• Label both axes clearly
Example: Construct a bar chart for the following data.
Distribution of patients in hospital by source of referral
Source of referral        No. of patients    Relative freq. (%)
Other hospital            97                 5.1
General practitioner      769                40.3
Out-patient department    623                32.7
Casualty                  256                13.4
Other                     161                8.5
Total                     1 906              100.0
[Figure: Distribution of patients in hospital X by source of referral, 1999. Bar chart; X-axis: source of referral; Y-axis: no. of patients (other hospital 97, GP 769, OPD 623, casualty 256, other 161).]
2. Sub-divided bar chart
• If there are different quantities forming
the sub-divisions of the totals, simple
bars may be sub-divided in the ratio of
the various sub-divisions to exhibit the
relationship of the parts to the whole.
• The order in which the components are
shown in a “bar” is followed in all bars
used in the diagram.
– Example: Stacked and 100% Component
bar charts
Example: Plasmodium species distribution for
confirmed malaria cases, Zeway, 2003
[Figure: 100% component bar chart; X-axis: August, October, December 2003; Y-axis: percent; components: P. falciparum, P. vivax, mixed.]
3. Multiple bar graph
• Bar charts can be used to represent the
relationships among more than two
variables.
• The following figure shows the
relationship between children's reports
of breathlessness and cigarette
smoking by themselves and their
parents.
[Figure: Prevalence of self-reported breathlessness among school children, 1998. Multiple bar chart; X-axis: parents smoking (neither, one, both); Y-axis: breathlessness, per cent; bars: child never smoked, child smoked occasionally, child smoked one/week or more.]
We can see from the graph quickly that the prevalence of the symptoms
increases both with the child’s smoking and with that of their parents.
There’s no reason why the bar chart can’t be
plotted horizontally instead of vertically.
[Figure: horizontal bar chart; Y-axis: type of source (CHA, HC, reading, training, campaign, anti-FGMC, CAT); X-axis: percent (0 to 50); bars: female, male.]
Figure 1. Source of information on the complications of FGM and participation in RH
programs, Jijiga, 2004*.
* FGMC = female genital mutilation committee; CAT = community
action team; HC = health centre; CHA = community health agent
4. Pie chart
• Shows the relative frequency for each
category by dividing a circle into sectors, the
angles of which are proportional to the
relative frequency.
• Used for a single categorical variable
• Use percentage distributions
Steps to construct a pie-chart
• Construct a frequency table
• Change the frequency into percentage (P)
• Change the percentages into degrees,
where: degrees = (P / 100) × 360°
• Draw a circle and divide it accordingly

Example: Distribution of deaths for females, in England
and Wales, 1989.
Cause of death          No. of deaths
Circulatory system      100 000
Neoplasm                70 000
Respiratory system      30 000
Injury and poisoning    6 000
Digestive system        10 000
Others                  20 000
Total                   236 000
[Figure: Distribution of cause of death for females, in England and Wales, 1989. Pie chart: circulatory system 42%, neoplasms 30%, respiratory system 13%, others 8%, digestive system 4%, injury and poisoning 3%.]
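The pie-chart steps (frequency, then percentage, then degrees) can be sketched in Python for the cause-of-death example. This is a minimal illustration; the dictionary name `deaths` and the output layout are assumptions, not part of the original material.

```python
# Number of deaths by cause (females, England and Wales, 1989)
deaths = {
    "Circulatory system": 100_000,
    "Neoplasm": 70_000,
    "Respiratory system": 30_000,
    "Injury and poisoning": 6_000,
    "Digestive system": 10_000,
    "Others": 20_000,
}

total = sum(deaths.values())

sectors = {}
for cause, count in deaths.items():
    percent = 100 * count / total   # step 2: frequency to percentage
    degrees = percent / 100 * 360   # step 3: percentage to sector angle
    sectors[cause] = (percent, degrees)
    print(f"{cause:<22}{percent:6.1f}%{degrees:8.1f} degrees")
```

The sector angles add up to 360 degrees, which is a quick check that no category has been left out.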
5. Histogram
• Histograms are frequency distributions with
continuous class intervals that have been turned
into graphs.
• To construct a histogram, we draw the interval
boundaries on a horizontal line and the
frequencies on a vertical line.
• Non-overlapping intervals that cover all of the
data values must be used.
• Bars are drawn over the intervals in such a
way that the area of each bar is proportional
to the frequency of observations in the interval.
Example: Distribution of the age of women at the time of marriage
Age group    15-19  20-24  25-29  30-34  35-39  40-44  45-49
Number       11     36     28     13     7      3      2
[Figure: Age of women at the time of marriage. Histogram; X-axis: age group (true limits 14.5-19.5 to 44.5-49.5); Y-axis: no. of women.]
[Figure: Histogram for the ages of 2087 mothers with <5 children, Adami Tulu, 2003. X-axis: age of mother (N1AGEMOTH), 15.0 to 55.0; Mean = 27.6, Std. Dev = 6.13, N = 2087.]
Two problems with histograms
1. They are somewhat difficult to construct
2. The actual values within the respective
groups are lost and difficult to reconstruct
⇒ The other graphic display (stem-and-leaf
plot) overcomes these problems
6. Stem-and-Leaf Plot
• A quick way to organize data to give visual
impression similar to a histogram while retaining
much more detail on the data.
• Similar to histogram and serves the same purpose
and reveals the presence or absence of symmetry
• Are most effective with relatively small data sets
• Are not suitable for reports and other
communications, but help researchers to
understand the nature of their data
Example
• 43, 28, 34, 61, 77, 82, 22, 47, 49, 51,
29, 36, 66, 72, 41
2 | 2 8 9
3 | 4 6
4 | 1 3 7 9
5 | 1
6 | 1 6
7 | 2 7
8 | 2
Steps to construct Stem-and-Leaf Plots
1. Separate each data point into stem and
leaf components
• Stem = consists of one or more of the initial
digits of the measurement
• Leaf = consists of the rightmost digit
The stem of the number 483, for example, is 48
and the leaf is 3.
2. Write the smallest stem in the data set in
the upper left-hand corner of the plot
3. Write the second stem (first stem + 1) below the
first stem
4. Continue with the remaining stems until you
reach the largest stem in the data set
5. Draw a vertical bar to the right of the column of
stems
6. For each number in the data set, find the
appropriate stem and write the leaf to the right
of the vertical bar
Example: 3031, 3101, 3265, 3260, 3245, 3200,
3248, 3323, 3314, 3484, 3541, 3649 (BWT in g)
Stem | Leaf            | Number
30   | 31              | 1
31   | 01              | 1
32   | 65 60 45 00 48  | 5
33   | 23 14           | 2
34   | 84              | 1
35   | 41              | 1
36   | 49              | 1
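The construction steps listed earlier can be sketched in Python. This is an illustrative sketch: the function name `stem_and_leaf` and its `leaf_unit` parameter are assumptions introduced here, and the leaves are sorted within each stem, which the steps above do not require.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=10):
    """Return {stem: leaves}, splitting each value with divmod(value, leaf_unit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, leaf_unit)
        plot[stem].append(leaf)
    # List every stem from smallest to largest so gaps in the data stay visible
    return {stem: plot.get(stem, []) for stem in range(min(plot), max(plot) + 1)}


data = [43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36, 66, 72, 41]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

With `leaf_unit=10` the leaf is the rightmost digit, as in the two-digit example; a larger `leaf_unit` such as 100 gives two-digit leaves, as in the birth-weight example.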
7. Frequency polygon
• A frequency distribution can be portrayed
graphically in yet another way by means of a
frequency polygon.
• To draw a frequency polygon we connect the
mid-points of the tops of the cells of the
histogram by straight lines.
• The total area under the frequency polygon is
equal to the area under the histogram
• Useful when comparing two or more
frequency distributions by drawing them on
the same diagram
[Figure: Frequency polygon for the ages of 2087 mothers with <5 children, Adami Tulu, 2003. X-axis: age of mother (N1AGEMOTH), 15.0 to 55.0; Mean = 27.6, Std. Dev = 6.13, N = 2087.]
It can be also drawn without erecting rectangles by joining
the top midpoints of the intervals representing the frequency
of the classes as follows:
[Figure: Age of women at the time of marriage. Frequency polygon; X-axis: age (class mid-points 12 to 47); Y-axis: no. of women.]
8. Ogive Curve (The Cumulative
Frequency Polygon)
• Sometimes it may be necessary to know the
number of items whose values are more or less
than a certain amount.
• We may, for example, be interested to know the
no. of patients whose weight is <50 kg or >60 kg.
• To get this information it is necessary to change
the form of the frequency distribution from a
“simple” to a “cumulative” distribution.
• An ogive curve turns a cumulative frequency
distribution into a graph.
• Ogives are much more common than frequency polygons.
Cumulative Frequency and Cum. Rel. Freq. of Age
of 25 ICU Patients
Age Interval    Frequency    Relative         Cumulative    Cumulative
                             Frequency (%)    Frequency     Rel. Freq. (%)
10-19           3            12               3             12
20-29           1            4                4             16
30-39           3            12               7             28
40-49           0            0                7             28
50-59           6            24               13            52
60-69           1            4                14            56
70-79           9            36               23            92
80-89           2            8                25            100
Total           25           100
[Figure: Cumulative frequency of 25 ICU patients.]
Example: Heart rate of patients admitted to hospital Y, 1998
Heart rate     No. of patients    Cumulative frequency      Cumulative frequency
                                  Less than Method (LM)     More than Method (MM)
54.5-59.5      1                  1                         54
59.5-64.5      5                  6                         53
64.5-69.5      3                  9                         48
69.5-74.5      5                  14                        45
74.5-79.5      11                 25                        40
79.5-84.5      16                 41                        29
84.5-89.5      5                  46                        13
89.5-94.5      5                  51                        8
94.5-99.5      2                  53                        3
99.5-104.5     1                  54                        1
[Figure: Heart rate of patients admitted to hospital Y, 1998. Ogives for the less than method (LM) and the more than method (MM); X-axis: heart rate (54.5 to 104.5); Y-axis: cumulative frequency.]
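The two cumulative columns of the heart-rate example can be derived from the class frequencies with running totals. The following Python sketch is illustrative; the variable names are assumptions.

```python
from itertools import accumulate

# Class frequencies for heart rate, from 54.5-59.5 up to 99.5-104.5
freq = [1, 5, 3, 5, 11, 16, 5, 5, 2, 1]

# Less than method: running total from the lowest class upwards
less_than = list(accumulate(freq))

# More than method: running total from the highest class downwards
more_than = list(accumulate(reversed(freq)))[::-1]

for f, lm, mm in zip(freq, less_than, more_than):
    print(f"{f:>3}{lm:>5}{mm:>5}")
```

Plotting `less_than` against the upper true limits and `more_than` against the lower true limits would give the two ogives described in the figure.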
Percentiles (Quartiles)
• Suppose that 50% of a cohort survived at
least 4 years.
• This also means that 50% survived at most 4
years.
• We say 4 years is the median.
• The median is also called the 50th percentile
• We write: P50 = 4 years.
• Similarly we could speak of other percentiles:
– P0: The minimum
– P25 (the 25th percentile): 25% of the sample values
are less than or equal to this value. 1st Quartile
– P50: 50% of the sample values are less than or equal to
this value. 2nd Quartile
– P75: 75% of the sample values are less than or
equal to this value. 3rd Quartile
– P100: The maximum
It is possible to estimate the values of percentiles from
a cumulative frequency polygon.
9. Box and Whisker Plot
• It is another way to display information
when the objective is to illustrate certain
locations (skewness) in the distribution.
• Can be used to display a set of discrete or
continuous observations using a single
vertical axis – only certain summaries of
the data are shown
• First the percentiles (or quartiles) of the
data set must be defined
• A box is drawn with the top of the box at
the third quartile (75%) and the bottom at
the first quartile (25%).
• The location of the mid-point (50%) of the
distribution is indicated with a horizontal
line in the box.
• Finally, straight lines, or whiskers, are
drawn from the centre of the top of the box
to the largest observation and from the
centre of the bottom of the box to the
smallest observation.
• Position of the pth percentile = p(n+1), where
p = the required percentile, written as a proportion
• Arrange the numbers in ascending order
A. 1st quartile = 0.25(n+1)th observation
B. 2nd quartile = 0.5(n+1)th observation
C. 3rd quartile = 0.75(n+1)th observation
D. 20th percentile = 0.2(n+1)th observation
E. 15th percentile = 0.15(n+1)th observation
• The pth percentile is a value that is ≥ p% of
the observations and ≤ the remaining (100 − p)%.
• The pth percentile is:
– The p(n+1)th observation, if
p(n+1) is an integer
– The average of the kth and (k+1)th observations, if
p(n+1) is not an integer, where k is the largest
integer less than p(n+1).
• E.g., if p(n+1) = 3.6, take the average of the 3rd and 4th
observations
• Example: Given a sample of size n = 60, find the 10th
percentile of the data set.
p(n+1) = 0.10(60+1) = 6.1
10th percentile = average of the 6th and 7th observations
– 10% of the observations are less than or
equal to this value and 90% of them are
greater than or equal to the value
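The p(n+1) rule stated above can be written as a small Python function. This is a sketch under stated assumptions: the function name `percentile` is illustrative, p is passed as a proportion between 0 and 1, and positions that fall before the first or after the last observation are clamped to the smallest or largest value, a case the rule itself does not cover.

```python
import math


def percentile(values, p):
    """pth percentile (p as a proportion) using the p(n+1) position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = p * (n + 1)

    nearest = round(position)
    if math.isclose(position, nearest):
        # p(n+1) is an integer: take that observation (clamped to 1..n)
        k = min(max(nearest, 1), n)
        return ordered[k - 1]

    # Otherwise average the kth and (k+1)th observations,
    # where k is the largest integer less than p(n+1)
    k = math.floor(position)
    if k < 1:
        return ordered[0]
    if k >= n:
        return ordered[-1]
    return (ordered[k - 1] + ordered[k]) / 2


sample = list(range(1, 61))      # hypothetical sample of size n = 60
print(percentile(sample, 0.10))  # averages the 6th and 7th observations
```

Statistical packages often interpolate between the two neighbouring observations instead of averaging them, so their percentiles can differ slightly from this rule.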
How can the lower quartile, median and upper quartile
be used to judge the symmetry of a distribution?
1. If the distribution is symmetric, then the upper and
lower quartiles should be approximately equally
spaced from the median.
2. If the upper quartile is farther from the median than
the lower quartile, then the distribution is
positively skewed.
3. If the lower quartile is farther from the median than
the upper quartile, then the distribution is
negatively skewed.
Box plots are useful for comparing two or
more groups of observations
Outlying values
• The lines coming out of the box are called the
“whiskers”.
• The ends of the “whiskers” are called “adjacent
values” (the largest and smallest non-outlying
values).
• Upper “adjacent value” = the largest value that is
less than or equal to P75 + 1.5*(P75 – P25).
• Lower “adjacent value” = the smallest value that is
greater than or equal to P25 – 1.5*(P75 – P25).
• The box plot is then completed:
– Draw a vertical bar from the upper quartile to the
largest non-outlying value in the sample
– Draw a vertical bar from the lower quartile to the
smallest non-outlying value in the sample
– Outliers are displayed as dots (or small circles) and
are defined by:
Values greater than 75th percentile + 1.5*IQR
Values smaller than 25th percentile − 1.5*IQR
– Any values that are outside the box but are not
outliers are covered by the whiskers on the plot.
– IQR = P75 – P25
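The outlier rule and the adjacent values can be sketched in Python. The data below are hypothetical (they are not the smoking data referred to next), and the quartiles come from the standard library's `statistics.quantiles`, whose default "exclusive" method uses the p(n+1) position with linear interpolation rather than the averaging rule given earlier.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, IQR, adjacent values and outliers for a box plot."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)  # default: "exclusive"
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Non-outlying values lie between the two fences (inclusive)
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_adjacent": inside[0],    # smallest non-outlying value
        "upper_adjacent": inside[-1],   # largest non-outlying value
        "outliers": [v for v in ordered if v < lower_fence or v > upper_fence],
    }


# Hypothetical number of cigarettes smoked per day by 11 subjects
cigarettes = [5, 10, 12, 15, 18, 20, 20, 22, 25, 30, 60]
summary = box_plot_summary(cigarettes)
print(summary)
```

In this hypothetical sample the whiskers would end at the adjacent values, and any value beyond the fences would be drawn as a separate dot.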
• Number of cigarettes smoked per day was
measured just before each subject
attempted to quit smoking
10. Scatter plot
• Most studies in medicine involve measuring
more than one characteristic, and graphs
displaying the relationship between two
characteristics are common in literature.
• When both the variables are qualitative then
we can use a multiple bar graph.
• When one of the characteristics is qualitative
and the other is quantitative, the data can be
displayed in box and whisker plots.
• For two quantitative variables we use
bivariate plots (also called scatter plots
or scatter diagrams).
• In the study on percentage saturation of
bile, information was collected on the
age of each patient to see whether a
relationship existed between the two
measures.
• A scatter diagram is constructed by drawing X- and Y-axes.
• Each point or dot (•) represents a pair of
values measured for a single study subject.
[Figure: Age and percentage saturation of bile for women patients in hospital Z, 1998. Scatter plot; X-axis: age (0 to 80); Y-axis: saturation of bile (0 to 160).]
• The graph suggests the possibility of a
positive relationship between age and
percentage saturation of bile in women.
11. Line graph
• Useful for assessing the trend of a particular situation over time.
• Helps in monitoring the trend of epidemics.
• The time, in weeks, months or years, is marked along the
horizontal axis, and
• Values of the quantity being studied are marked on the vertical
axis.
• Values for each category are connected by a continuous line.
• Sometimes two or more lines are drawn on the same graph
using the same scale so that the plotted lines are
comparable.
[Figure: No. of microscopically confirmed malaria cases by species and month at Zeway malaria control unit, 2003. Line graph; X-axis: months (Jan to Dec); Y-axis: no. of confirmed malaria cases; lines: positive, P. falciparum, P. vivax.]
A line graph can also be used to depict the relationship
between two continuous variables, like a scatter
diagram.
• The following graph shows the level of
zidovudine (AZT) in the blood of AIDS
patients at several times after
administration of the drug, for patients with normal
fat absorption and with fat malabsorption.
[Figure: Response to administration of zidovudine in two groups of AIDS patients in hospital X, 1999. Line graph; X-axis: time since administration (min.); Y-axis: blood zidovudine concentration; lines: fat malabsorption, normal fat absorption.]

Exercise
• Evaluate the following graphs whether
they are good or bad and discuss the
points which make them good or bad
[Figure: MMRatio per 100,000 live births by age of woman; Giza, Egypt 1984. X-axis: age (15-19 to 45-49); Y-axis: MMR per 100,000 LB; legend: MMR per 100,000 LB.]

1. The title of the graph tells the reader the
content of the graph. For example:
• the statistic presented (MMRatio);
• the second dimension of the graph (age
of woman on the x axis);
• the metric (per 100,000 live births);
• the source of the data (Giza, Egypt);
• The date (1984);
2. The Y axis is labeled (MMR per
100,000 LB);
3. The X axis is labeled (age of woman);
4. The legend is given (_______= MMR);
5. The source of the information is
provided (Kane et al)
[Figure: Maternal Mortality: Countries X, Y and Z since 1850. Line graph; X-axis: 1850 to 1990 in 10-year steps (unlabeled); Y-axis: 0 to 900 (unlabeled); legend: Sweden, UK, USA.]
• The Y axis is not labeled;
• The title does not give you the statistic
presented in the graph (Maternal Mortality is
not a statistic). This is particularly problematic
when the Y axis is also not labeled;
• Neither the title nor the Y axis identifies the
metric (per 100,000 live births);
• The X axis is not labeled – but this is not so
serious when the categories are so obvious
and when the second dimension (year) has
been identified in the graph title.
[Figure: untitled bar chart; X-axis: antepartum, intrapartum, postpartum; Y-axis: 0 to 14 (unlabeled); legend: pre-eclampsia, eclampsia.]
Remember:
A graph is a tool.
It is not an artwork to
hang above your sofa!
It is more important that it is
easy to interpret correctly
than that it is pretty!
