• What is data?
What is data
Class Discussion
• Timothy, a 25-year-old man (Fijian of other descent), presents at your clinic with a cough and
respiratory tract infection. The nurse had performed preliminary vital sign checks, and the
slip he brought in indicated he was 1.5 m tall and weighed 95 kg. His blood pressure was
175/110 mmHg.
• Further clinical history:
• The patient lives at the family home with his parents, his brother and sister, and their
families. His siblings have a total of 8 children (5 boys and 3 girls), all living within the
extended family setting.
• Timothy reported that he had developed a cough after one of the children caught a flu-like
illness from school two months ago. Several members of his family were ill and recovered,
but he continued to suffer from a recurrent cough and sore throat, which had recently
progressed to shortness of breath.
• He works from home for a call centre and enjoys spending his free time on his social media
pages as a local ‘foodie’ influencer. He has been smoking since he was 18 and consumes up
to 2 packets of cigarettes daily.
• His mother is diabetic and his father was recently diagnosed with hypertension.
• E.g., What do we know about our patient? Can you pick out
the different types of data?
• Qualitative • Quantitative
Why is it important to know your data?
• Allows you to think of the best methods of data collection to capture them,
i.e. is it best to use quantitative or qualitative methodologies, or perhaps a
mixed-methods approach?
• Both forms can be used to provide a basic descriptive account of an idea, study
or.
Explanatory and response variables
• A variable in a study may be defined as:
• 1. An outcome (or response or dependent) variable. This is the variable that is the main
interest in our study. In the example of HIV-infected patients classified by CD4 cell count
and HIV type, CD4 count is considered the outcome of interest.
• 2. An explanatory (or exposure or independent) variable. This is a variable that may
influence the value of our outcome. In the same example, HIV type is considered the
explanatory variable.
• The distinction between explanatory and outcome variables depends on the context
and the objectives of the question being answered. It is important to be able to
identify the outcome of interest in a study, and to design the study so that it can
answer the question about that outcome. In randomised controlled trials, it is often a
drug or treatment that is the main explanatory variable. In most studies there is more
than one explanatory variable that can influence the response or outcome of interest.
When we have more than one explanatory variable we call the other variables
covariates.
What are its Forms?
Data
• Qualitative (Categorical)
▪ Nominal (naming). Examples: Name, Gender, Ethnicity
▪ Ordinal (order). Examples: Satisfied, Neither satisfied nor Dissatisfied, Dissatisfied
▪ Binary. Examples: Yes/No, True/False
• Quantitative (Numerical)
▪ Discrete (whole numbers). Examples: 0, 1, 2, 3…
▪ Continuous (any value, including fractions). Examples: 2 ½, 3.714
How do we make sense of data?
DESCRIPTIVE STATISTICS
Descriptive Statistics
• Mean
• Median
• Mode
• Range
• Interquartile range
• Variance
• Standard Deviation
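As a rough illustration of the measures listed above, the sketch below computes each one with Python's standard `statistics` module. The `ages` list is made-up data, not taken from the slides, and the quartiles use the module's default ("exclusive") method; other software may give slightly different quartile values.

```python
import statistics

# Hypothetical ages (years) for a small class; illustrative only.
ages = [21, 22, 22, 23, 24, 25, 29]

mean_age = statistics.mean(ages)        # arithmetic average
median_age = statistics.median(ages)    # middle value of the sorted data
mode_age = statistics.mode(ages)        # most frequent value

value_range = max(ages) - min(ages)     # highest minus lowest value

# Quartiles with the module's default "exclusive" method.
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1                           # interquartile range

variance = statistics.variance(ages)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(ages)        # square root of the sample variance

print(f"mean={mean_age:.2f} median={median_age} mode={mode_age}")
print(f"range={value_range} IQR={iqr} variance={variance:.2f} SD={std_dev:.2f}")
```

The mean, median and mode describe the centre of the data; the range, interquartile range, variance and standard deviation describe its spread.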
Data
• RAW DATA: information in its unorganised form, e.g., a collection of MBBS 3
class data for age, gender, and favourite colour
How do we make sense of data?
DISTRIBUTION
• Distribution refers to the frequencies of different responses.
Describing your data - displaying
categorical variables
• Before we begin to answer any question of interest from
our study we need to summarise and display our data to
get some idea of what it is telling us. For example, we can
look at how the values of variables change from subject to
subject i.e. what is the distribution of values taken by a
single variable or the association between two variables.
We can summarise data through tables or graphs. These
presentations are purely descriptive with each having
advantages and disadvantages. Careful consideration
must be given to what you would like to show.
Presentation of categorical data
using tables
• Tables (and diagrams) should be well labelled and self-explanatory; you
should be able to obtain all the information you require from the table
without any text to describe it. To ensure this, the title should be
informative; the outcome and explanatory variables should be clear;
it should be clear how the percentages were derived; and footnotes should
be provided for missing values and abbreviations. However, tables must not
be cluttered with too much information.
• Summarising categorical variables is straightforward. For each
category of a variable the number of subjects is counted. These counts
are known as frequencies. One-way tables show the frequency of
categories (or values) of each variable.
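A one-way frequency table can be sketched in a few lines of Python. The responses below are hypothetical (favourite colours invented for illustration), and `collections.Counter` is just one convenient way to do the counting.

```python
from collections import Counter

# Hypothetical favourite-colour responses; illustrative only.
colours = ["blue", "red", "blue", "green", "blue", "red"]

frequencies = Counter(colours)          # count of subjects per category
total = sum(frequencies.values())       # total number of subjects

# Print a labelled one-way table: category, frequency, percentage.
print(f"{'Colour':<8}{'n':>4}{'%':>8}")
for category, count in frequencies.most_common():
    print(f"{category:<8}{count:>4}{100 * count / total:>8.1f}")
print(f"{'Total':<8}{total:>4}{100.0:>8.1f}")
```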
Example 2 continued
• In this example, the exposure is BCG status and is presented as the row
variable and therefore row percentages are appropriate.
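To show what "row percentages" means in practice, here is a minimal sketch. The counts and the yes/no outcome labels are hypothetical and are not the figures from the example; the function name `row_percentages` is also just an illustrative choice.

```python
# Hypothetical counts for illustration only; not the figures from the example.
# Rows are the exposure (BCG status); columns are an unnamed yes/no outcome.
two_way = {
    "BCG: yes": {"outcome: yes": 20, "outcome: no": 80},
    "BCG: no": {"outcome: yes": 30, "outcome: no": 20},
}


def row_percentages(table):
    """Express each cell as a percentage of its own row total."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())
        result[row_label] = {col: 100 * n / row_total for col, n in cells.items()}
    return result


percentages = row_percentages(two_way)
for row_label, cells in percentages.items():
    print(row_label, {col: f"{pct:.1f}%" for col, pct in cells.items()})
```

Each row sums to 100%, so the outcome can be compared directly between the exposure groups even when the groups differ in size.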
Presentation of categorical data
using graphs
• Graphical representation can often be used to show the same information as
a table but in a more vivid manner. Graphs are particularly useful for
presentations and talks. Frequencies are often illustrated in two forms:
▪ Pie charts - In a pie chart the frequencies or the percentages are represented
by the angles in different sectors (slices) of a circle; the total (360 degrees) is
equal to 100%, as shown in Figure 1
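The relationship between frequencies and pie-chart angles can be sketched as follows. The category names and counts are hypothetical; the only rule used is that each sector's angle is its share of the total multiplied by 360 degrees.

```python
# Hypothetical category frequencies; illustrative only.
frequencies = {"A": 10, "B": 30, "C": 20}

total = sum(frequencies.values())

# Each sector's angle is proportional to its share of the total.
angles = {category: 360 * count / total for category, count in frequencies.items()}
percentages = {category: 100 * count / total for category, count in frequencies.items()}

for category in frequencies:
    print(f"{category}: {percentages[category]:.1f}% -> {angles[category]:.0f} degrees")
```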
Describing your data - displaying
quantitative variables
• The frequencies with which different possible values of a
quantitative variable occur may be summarised as a frequency
distribution. The frequency distribution of individual values is
seldom helpful, unless the overall number of observations is quite
small. It is more useful to group the values taken by the variable and
to report the numbers and the frequencies (or percentage
frequencies) of subjects in each group.
• The first step when forming a frequency distribution is to identify
the lowest and highest values. Then the number and size of the
groups are determined. The number of groups will depend on the
observations; if there are too few groups (the groups are wide),
too much information will be lost, while too many groups
(the groups are narrow) may be impractical. Where possible,
each group should be the same width, the starting points should
be whole numbers, and there should be no gaps between groups.
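The grouping steps above can be sketched as a small function. The values, the starting point of 0 and the group width of 100 are assumptions chosen for illustration; `grouped_frequencies` is a hypothetical helper, not a library function.

```python
from collections import Counter


def grouped_frequencies(values, start, width):
    """Group values into equal-width intervals [lower, upper) with no gaps.

    Returns one (lower, upper, frequency, percentage) row per group.
    """
    # Index of the group each value falls into.
    counts = Counter(int((v - start) // width) for v in values)
    n_groups = int((max(values) - start) // width) + 1
    rows = []
    for i in range(n_groups):
        lower = start + i * width
        upper = lower + width
        freq = counts.get(i, 0)
        rows.append((lower, upper, freq, 100 * freq / len(values)))
    return rows


# Hypothetical measurements; illustrative only.
values = [12, 47, 85, 110, 150, 199, 230, 260, 310, 480]
rows = grouped_frequencies(values, start=0, width=100)

for lower, upper, freq, pct in rows:
    print(f"{lower:>4} to <{upper:<4} {freq:>3} {pct:>6.1f}%")
```

Trying a few different widths on the same data is a quick way to see the trade-off between losing detail (few wide groups) and an unwieldy table (many narrow groups).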
• Example 2: A study of HIV-infected
patients presenting at a hospital in
The Gambia. A total of 1084 patients were
classified by CD4 cell count (cells/µl) and
HIV type at their first presentation to
hospital.
Presentation of quantitative data
using graphs
• Graphical presentation of quantitative variables can take three forms:
▪ Histograms - A histogram is similar to a bar chart, with (usually) the values of the variable
grouped into several categories and the bars can then represent the frequencies. The bars touch one
another to indicate the continuous nature of the variable. If the widths of the groups are different
this should be reflected in the histogram through the thickness of the bars; the area of the bar
should be proportional to the frequency. A histogram can be used to illustrate the distribution of a
single quantitative variable or the distribution of a quantitative variable across the levels of a
categorical variable.
▪ Cumulative frequency curves - The cumulative frequency is the number of observations less than
(or equal to) a particular value.
▪ Scatter plots – A simple graph used to examine the relationship between two quantitative
variables. Each pair of values is represented by a symbol where the horizontal position is
determined by the value of the first variable (exposure) and the vertical position by the value of the
second variable (outcome).
These initial displays of the data are particularly useful for identifying outliers or unusual values,
and revealing possible errors.
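As a small worked illustration of cumulative frequency, the sketch below counts how many observations are less than or equal to a given value. The haemoglobin-style readings are hypothetical, and `cumulative_frequency` is an illustrative helper name.

```python
from bisect import bisect_right


def cumulative_frequency(values, x):
    """Number of observations less than or equal to x."""
    # bisect_right gives the position after the last value <= x in sorted data.
    return bisect_right(sorted(values), x)


# Hypothetical readings; illustrative only.
readings = [8.4, 9.1, 9.1, 10.3, 11.0, 12.5]

# Points of a cumulative frequency curve: one point per distinct value.
curve = [(x, cumulative_frequency(readings, x)) for x in sorted(set(readings))]
for x, cum in curve:
    print(f"<= {x}: {cum}")
```

Plotting these points with the value on the horizontal axis and the cumulative count on the vertical axis gives the cumulative frequency curve.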
A histogram of the haemoglobin (Hb) values in the 70 women is given in Figure 3. In a
histogram, it is the area of each rectangle that represents the frequency (or percentage):
the vertical scale is measured in frequency per unit of value and the horizontal scale is
measured in units of the variable. Note that the rectangles are drawn from 8 up to 9, 9 up
to 10, etc., not from 8 up to 8.9, 9 up to 9.9, etc., which would correspond to the actual
range of recorded values.
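The "area represents frequency" rule can be checked with a short calculation. The groups and counts below are hypothetical (they are not the data for the 70 women); the point is only that bar height is frequency divided by group width, so a wider group does not look artificially large.

```python
# Hypothetical groups as (lower, upper, frequency); the last group is twice as wide.
groups = [(8, 9, 4), (9, 10, 6), (10, 12, 10)]

bars = []
for lower, upper, freq in groups:
    width = upper - lower
    height = freq / width          # frequency per unit of value
    bars.append((lower, upper, height))
    print(f"{lower} up to {upper}: width={width}, height={height:.1f}, area={height * width:.0f}")

# The area of each bar (height x width) recovers the original frequency.
areas = [height * (upper - lower) for lower, upper, height in bars]
```

Here the 10-to-12 group has the largest frequency (10) but is not the tallest bar, because its frequency is spread over two units of value.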