Data Presentation

Dr Helmy Hazmi
Various types of data presentation
Textual
Tabular
Graphical display
Textual presentation
Data are presented in the form of text, phrases and paragraphs
Only salient and important findings are reported
E.g. newspapers
Tabular form
Numerical: Mean (SD); Median (IQR); Mode; Variance, Range
Categorical: Frequency; Relative frequency; Cumulative relative frequency

Graphical form
Numerical: Histogram; Frequency polygon; Frequency curve; Stem and leaf plots; Box and whisker plot; Line chart (cumulative); Scatter diagram
Categorical: Bar; Pie; Pictogram; Map
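The categorical summaries named under the tabular form (frequency, relative frequency, cumulative relative frequency) can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` and the blood group sample are assumptions made for the example, not part of the lecture.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (category, frequency, relative %, cumulative relative %)."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    # most_common() lists categories from the most to the least frequent
    for category, freq in counts.most_common():
        cumulative += freq
        rows.append((category,
                     freq,
                     round(100 * freq / total, 1),
                     round(100 * cumulative / total, 1)))
    return rows

# Hypothetical sample of 10 blood groups (illustrative data only)
sample = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "B"]
for row in frequency_table(sample):
    print(row)
```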
Tabular display
Here we are concerned with the presentation table.
Effectively minimize the number of data values in the text.
Eliminate discussion of less significant variables in the story line.
Help your reader to absorb information better
Data in the table should be
Concise
Support accompanying analysis or text
Able to stand alone
Features of the example table:
Clear title
Number of samples mentioned
Title for table is at the top
Table 1. Characteristics of study subjects (n=44)

Variables Mean (SD) Median (IQR) Frequency (%)

Age (years) 46.0 (14.49)

BMI (kg/m2) 24.7 (7.39)a

Sex
Male 28 (63.6)
Female 16 (36.4)
SD = standard deviation, IQR = interquartile range, a = skewed to the right

Unit of numerical variables mentioned
Footnote to provide extra information
For SD, use 2-3 decimals, usually a decimal more than its mean
For percentage, use one decimal
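The decimal conventions above (SD with one decimal more than its mean, percentages with one decimal) could be applied with small formatting helpers such as the ones below. This is a minimal sketch; the helper names `mean_sd` and `count_percent` are hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def mean_sd(values, mean_decimals=1):
    """Format 'mean (SD)' with the SD shown to one decimal more than the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    sd_decimals = mean_decimals + 1
    return f"{mean:.{mean_decimals}f} ({sd:.{sd_decimals}f})"

def count_percent(count, total):
    """Format 'frequency (%)' with the percentage to one decimal."""
    return f"{count} ({100 * count / total:.1f})"

# Sex distribution from Table 1 (n = 44)
print("Male  ", count_percent(28, 44))
print("Female", count_percent(16, 44))
```

The two printed rows reproduce the Sex entries of Table 1, which is a quick way to check that counts and percentages in a table agree.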
Other considerations
Portrait or landscape? Depends on your intention
Subtle lines or shadings encourage readers to read horizontally or vertically
Columns should be evenly spaced and not too far apart
Avoid unnecessary text
Display data in chronological order or by using some standard classification.
Do not leave data cells empty. Missing values should be identified as “not available” or “not applicable”. If “NA” is used, it needs to be defined.
Good example Bad example

1 320 000 1320000

1 670 000 1670000

93.2 93.2

1045.0 1045.0

385.6 385.6
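The first two rows of the good example group large numbers in thousands with a space. One way to produce that grouping for whole numbers is sketched below; the function name `group_thousands` is an assumption for illustration.

```python
def group_thousands(n):
    """Format a whole number with a space between groups of three digits."""
    # Python's "," format option groups digits by thousands; swap commas for spaces
    return f"{n:,}".replace(",", " ")

for value in (1320000, 1670000):
    print(group_thousands(value))
```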
Infant mortality rate – per 1000 live births
1980 1985 1990 2000 1995 2002 2003
State A 23.81 43.56 23.45 30.67 67.43 12.1 11.3
State B 56.77 55.42 n/a 23.81 43.56 23.45 10.2
State C 34.2 22.45 32.4 45.56 12.43 43.21 24.12
State D 45.56 12.43 43.56 n/a 12.1 32.4 45.56

What is the problem with the table above?


How can you improve it to make it better?
Graphical display
Use statistical graph
Effective
Attention grabber
Reveal trends and relationship
over time
comparison
frequency distribution
correlation
relative share of the whole
What is the problem with this graph?
When not to use Graphical display
Data are very dispersed
Have too few values
Have too many values
Show very little or no variation
Histogram
To show frequency distribution
Data are grouped into defined bins or class intervals
No space between columns
Label can be a single value or a range of values
Height indicates frequency and size of group under each column label
Columns cannot be rearranged
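Grouping data into class intervals, as described above, can be sketched as follows. The function name `bin_frequencies`, the equal-width intervals and the age values are all assumptions chosen for the example.

```python
def bin_frequencies(values, start, width, n_bins):
    """Count values falling in n_bins consecutive class intervals [lower, upper)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # values outside the defined intervals are not counted
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_bins)]

# Hypothetical ages in years (illustrative data only)
ages = [21, 25, 29, 33, 34, 38, 41, 42, 47, 55]
for lower, upper, freq in bin_frequencies(ages, start=20, width=10, n_bins=4):
    # One '#' per observation gives a rough text histogram
    print(f"{lower} to <{upper}: {'#' * freq}")
```

Because the intervals are ordered along a numerical scale, the rows are printed in that fixed order, which is why histogram columns cannot be rearranged.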
Frequency Polygon
Another method to show frequency distribution
The frequency corresponds to the middle of each class interval
Include one class interval lower than the lowest data value or higher than the highest data value – to allow the polygon to touch the x-axis.
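The points of a frequency polygon can be derived from class frequencies as in the sketch below. It assumes equal-width class intervals and pads an empty class at both ends so the polygon meets the x-axis on each side; the function name `polygon_points` and the frequencies are illustrative.

```python
def polygon_points(start, width, frequencies):
    """Return (class midpoint, frequency) pairs, with an empty class added at each end."""
    padded = [0] + list(frequencies) + [0]
    first_mid = start - width / 2  # midpoint of the extra class below the lowest interval
    return [(first_mid + i * width, f) for i, f in enumerate(padded)]

# Hypothetical frequencies for the intervals 20-30, 30-40, 40-50, 50-60
for midpoint, freq in polygon_points(start=20, width=10, frequencies=[3, 3, 3, 1]):
    print(midpoint, freq)
```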
Frequency Curve
A smoothed frequency polygon.
An effect seen when the number of observations becomes infinitely large and the class widths infinitely small.
Corresponds to the limiting shape of the frequency polygon.
Area under the curve bounded by the class interval corresponds proportionally to the area of each class interval column.
Box and Whisker Plot
Makes use of the quartiles of the data set
Reveals:
Amount of spread
Location of concentration
Symmetry of the data
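The values a box and whisker plot is drawn from (minimum, quartiles, maximum) can be computed with Python's standard library, as sketched below. Several quartile conventions exist; the `"inclusive"` method of `statistics.quantiles` is used here as an assumption, and statistical packages may give slightly different quartiles. The function name and the data are illustrative.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # n=4 splits the data into quartiles and returns the three cut points
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical measurements (illustrative data only)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
minimum, q1, median, q3, maximum = five_number_summary(data)
print("Box spans", q1, "to", q3, "with the median at", median)
print("Whiskers reach", minimum, "and", maximum)
print("IQR (amount of spread) =", q3 - q1)
```

The box (Q1 to Q3) holds the middle 50% of the observations, and a median that sits off-centre within the box suggests an asymmetric distribution.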
What can we say about this box plot diagram?
The approximate values of the 1st and 3rd quartiles are 250 and 300 respectively.
The maximum and minimum values are around 335 and 237 respectively.
The median value is 275.
50% of the measurements are between 250 and 300.
More observations concentrated at the left end of the scale; fewer observations at the right end of the distribution
More observations concentrated at the right end of the scale; fewer observations at the left end of the distribution
Comment on the difference between
1. (1) and (2)
2. (2) and (3)
3. (1) and (4)
Stem and leaf plot
Resembles a histogram
Shows distribution of data
Reveals:
Distribution of data set (symmetry?)
Location of concentration
Range of data set
Preserves information of individual measurements compared to histogram
Effective in small data sets
Not suitable for annual reports or communicating data with the public
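[Figure: stem and leaf plot with the stem and the leaf labelled. One annotated row reads as 68 and another as 136; a further column gives the number of observations on the line and above and/or below.]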
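A stem and leaf display can be built by splitting each value into a stem (all digits except the last) and a leaf (the last digit), so that 68 becomes stem 6, leaf 8 and 136 becomes stem 13, leaf 6. The sketch below assumes non-negative whole numbers; the function name and the data are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative whole numbers into {stem: [leaves]}, both in ascending order."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 136 -> stem 13, leaf 6
        stems[stem].append(leaf)
    return dict(sorted(stems.items()))

# Hypothetical measurements (illustrative data only)
readings = [68, 62, 136, 71, 75, 68]
for stem, leaves in stem_and_leaf(readings).items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
```

Every individual measurement can be read back from the display, which is the information a histogram loses.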
Scatter diagram
First step in studying the relationship between 2 variables
Relationship between 2 continuous variables
Independent variable: X axis
Dependent variable: Y axis
What can we conclude from this diagram?
But, what is the potential problem with this plot?
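Once a scatter diagram suggests a roughly linear pattern, the strength of the relationship between two continuous variables is often summarised with a correlation coefficient. The sketch below computes Pearson's r by hand; the choice of Pearson's r, the function name `pearson_r` and the data are assumptions for illustration, not taken from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements: x is the independent variable, y the dependent one
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(pearson_r(x, y))
```

A single coefficient can hide outliers or a curved pattern, so the scatter diagram should be inspected first.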
Categorical data presentation
Compare frequencies or values between different categories or groups.
Can be used for discrete data.
Other types: stacked bar, grouped bar, deviation bar chart
[Figures: stacked bar chart, deviation bar chart, grouped bar chart]
Bar chart
Can be vertical or horizontal
Bars much wider than the gaps between them
Bars must be of the same width
No more than 5 bars within a group if possible
Arrange bars in ascending or descending order to create decreasing or increasing bar lengths.
In a grouped bar chart, leave a space between groups of bars but not between bars in a group.
Pie chart
For categorical data
To show proportion or relative frequency of a category from a WHOLE
To show the relative importance of one category in the total
TIPS
Not more than six categories
Arrange segments from smallest to largest
Begin segment at 12 o’clock
Keep it as simple as possible – use 3D sparingly
Figures and labels can be placed in the segments – legends not necessary
Difficulty:
Comparing data across different pie charts
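Two of the tips above (segments ordered from smallest to largest, first segment starting at 12 o'clock) can be turned into segment shares and starting angles as sketched below. Angles are measured in degrees clockwise from 12 o'clock; the function name `pie_segments` is an assumption for illustration.

```python
def pie_segments(counts):
    """Return (category, share %, start angle) with segments ordered smallest to largest."""
    total = sum(counts.values())
    angle = 0.0  # 0 degrees = 12 o'clock, increasing clockwise
    segments = []
    for category, count in sorted(counts.items(), key=lambda item: item[1]):
        share = count / total
        segments.append((category, round(100 * share, 1), round(angle, 1)))
        angle += 360 * share  # the next segment starts where this one ends
    return segments

# Sex distribution from Table 1 (n = 44)
for segment in pie_segments({"Male": 28, "Female": 16}):
    print(segment)
```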
Which one is clearer?
GOOD examples
Pictogram
Can be used for academic purposes, but very rare
Useful in conveying a message to the public
Line chart
Most appropriate chart for time series
Proper parameters to avoid distorting data
Adjusting the parameters to avoid a false impression
Find the appropriate graph for the types of data