
Day 10: Statistical Analysis

SIOAPE KUPU
HEALTH RESEARCH OFFICER
Day 10 Objectives
Understand Basic Descriptive Statistics
  Frequencies
  Frequency Distributions
  How to construct a frequency distribution table
  Proportions & Rates
Central Tendency
  Mean
  Median
  Mode
Variability
  Range
  Standard Deviation
  Variance
Descriptive Statistics
Descriptive statistics are methods of organizing, summarizing, and presenting data
in a convenient and informative way. These methods include:
 Numerical Techniques
 Graphical Techniques
The actual method used depends on what information we
would like to extract.

Descriptive Statistics helps to answer these questions…


Descriptive Statistics
1) Produce numbers that describe a set of
data
2) Common types of descriptive stats –
these are forms of Summarizing Data
Frequencies
Proportions and Rates
Central tendency
Variability
Frequencies
Frequencies are obtained by simply tallying and counting the number of times an event occurs.
Frequencies can be ungrouped or grouped.
Grouped frequencies show how many scores, observations, items or people fall within an
interval.
E.g. How many children are in each age-group in a school?
1. 5 – 7 years of age = 15
2. 8 – 10 years of age = 16
3. 11 – 13 years of age = 13
The Frequency Distribution Table
Data in the original form in which they are collected are
called raw data. They need to be organized so that they
are easily comprehensible.
A good way to organize them is to construct a
frequency distribution. The following examples
illustrate frequency distributions.
The Frequency Distribution Table
Frequency Distribution for Ungrouped Data
Data that list individual values or categories with
their frequencies are called ungrouped data.
The following example shows how the raw data
can be presented in an ungrouped frequency
table.
The Frequency Distribution Table
(Ungrouped)
Example 1: A tutor gave a test to her Evidence-Based
Class. Their marks were
3 8 6 5 6 4 7 6
5 3 5 6 3 5 4 4
3 6 7 8 1 10 7 6
4 5 0 7 6 5 6 7
1 7 5 4 5 8 5 7
The Frequency Distribution Table
You can see that it is easier to see the following types of information from
the frequency table than the raw data.
(a) The highest mark is 10 and the lowest mark is 0.
(b) 4 pupils scored more than 7.
(c) 12 pupils scored less than 5.
(d) 8 pupils scored 6.
(e) Nobody scored 9.
(f) 40 pupils did the test.
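The ungrouped frequency table for Example 1 can be rebuilt directly from the raw marks. The following is a minimal Python sketch (Python is an assumption here; the slides themselves use Microsoft Excel) that tallies each mark and re-derives facts (b) and (c) above. The variable names are illustrative only.

```python
from collections import Counter

# Marks from Example 1 (40 pupils).
marks = [
    3, 8, 6, 5, 6, 4, 7, 6,
    5, 3, 5, 6, 3, 5, 4, 4,
    3, 6, 7, 8, 1, 10, 7, 6,
    4, 5, 0, 7, 6, 5, 6, 7,
    1, 7, 5, 4, 5, 8, 5, 7,
]

# Tally how many times each mark occurs.
frequency = Counter(marks)

# One row per possible mark, including marks that nobody scored.
print("Mark  Frequency")
for mark in range(min(marks), max(marks) + 1):
    print(f"{mark:>4}  {frequency[mark]:>9}")
print(f"Total {sum(frequency.values()):>9}")

# Facts that can be read off the table.
more_than_7 = sum(count for mark, count in frequency.items() if mark > 7)
less_than_5 = sum(count for mark, count in frequency.items() if mark < 5)
print("Scored more than 7:", more_than_7)
print("Scored less than 5:", less_than_5)
```

A Counter returns 0 for a mark that never occurs, which is why the rows for marks 2 and 9 can be printed without special handling.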
The Frequency Distribution Table

A table listing relative frequencies separately is called a
relative frequency distribution. If each relative frequency is
multiplied by 100%, we have a percentage distribution. The
relative frequency distribution of Example 1 in this
section is given below:
The Frequency Distribution Table
We can see from this table
that relative frequency of
mark 4 is 0.125. It means
that 12.5% of pupils got 4
marks.
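As a rough illustration, the relative frequency and percentage distributions for Example 1 can be computed by dividing each frequency by the total number of pupils. This Python sketch (Python being an assumption, not the tool used in the slides) reproduces the 0.125 figure for mark 4.

```python
from collections import Counter

# Marks from Example 1 (40 pupils).
marks = [
    3, 8, 6, 5, 6, 4, 7, 6,
    5, 3, 5, 6, 3, 5, 4, 4,
    3, 6, 7, 8, 1, 10, 7, 6,
    4, 5, 0, 7, 6, 5, 6, 7,
    1, 7, 5, 4, 5, 8, 5, 7,
]

frequency = Counter(marks)
total = len(marks)

# Relative frequency = frequency / total number of observations.
relative_frequency = {mark: frequency[mark] / total for mark in sorted(frequency)}

print("Mark  Frequency  Relative  Percent")
for mark, rel in relative_frequency.items():
    print(f"{mark:>4}  {frequency[mark]:>9}  {rel:>8.3f}  {rel * 100:>6.1f}%")
```

The relative frequencies always add up to 1 (100%), which is a quick check that no observation was missed.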
The Frequency Distribution Table
(Grouped)
The data presented in a frequency distribution table with class intervals are
called grouped data.
It is useful for large data sets because it makes the form or shape of the
distribution more obvious. A disadvantage, though, is that the scores lose
their individual identity.
Two basic guidelines that should be followed in constructing a grouped
frequency distribution are:
Each class should be of the same width.
The total range of scores is divided into mutually exclusive (non-overlapping) classes.
The Frequency Distribution Table
45 and 52 etc. are called class limits (45 is the lower class limit & 52 is the upper
class limit)
A completed frequency table provides a frequency distribution, i.e. it shows the
proportion of a population or sample having certain characteristics.
Relative frequency represents the percentage relative to total. It is used to make
comparison. Example: In the weight range of 85 - 92 kg, 3% of the non-smokers
and 0% of the smokers were represented.
Cumulative relative frequency (or cumulative percentage) gives the percentage
of individuals having a measurement less than or equal to the upper boundary of
the class interval. Example: from Tables 3 and 4, 7% of the non-smokers and 11% of the
smokers weigh 52.5 kg or less, so 93% of the non-smokers, compared to 89% of the
smokers, are above 52.5 kg in weight.
Steps for Constructing a Grouped Frequency
Distribution
1. Calculate the range of the scores; Range = Highest – Lowest.
2. Determine the class interval width by using the following formula:
Class width = Range ÷ Number of class intervals
3. If this formula yields a decimal value, round the value up to the next whole number.
4. List all of the class intervals, placing the interval containing the smallest value at the
top.
5. Tally the raw scores into the appropriate class intervals.
6. Add the tallies for each interval to obtain the interval frequency. The tallies themselves
need not be shown in the final table.
Steps for Constructing a Grouped Frequency
Distribution (Example)
1) The overall weight interval is 47 kg to 91 kg.
2) Range = Maximum weight – Minimum weight = 91kg – 47kg = 44 kg
3) Suppose we want 6 intervals; the class width is then 44/6 = 7.33, rounded up to 8.
4) Select a starting point for the lowest class limit. This can be the smallest data value or any
convenient number less than the smallest data value. In this case, we will use 45. Add the
width to the starting point to get the lower limit of the next class. Keep adding until
there are 6 classes. You should get 45, 53, 61, etc.
5) Subtract one unit from the lower limit of the second class to get the upper limit of the
first class. Then add the width to each upper limit to get all the upper limits. 53 – 1 = 52.
That is, the first class is 45 – 52. The second class is 53 – 60, etc.
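The worked example above can be sketched in Python (an assumption; the slides use Excel). The function names `class_intervals` and `tally` are hypothetical helpers, the sketch assumes whole-number data, and the `sample_weights` list is made up purely to show the tallying step, since the original weight data are not reproduced in these slides.

```python
import math

def class_intervals(lowest, highest, num_classes, start):
    """Return the class width and (lower, upper) limits for whole-number data."""
    data_range = highest - lowest
    # Round a decimal width up so that the classes cover the whole range.
    width = math.ceil(data_range / num_classes)
    intervals = [
        (start + i * width, start + (i + 1) * width - 1)
        for i in range(num_classes)
    ]
    return width, intervals

def tally(values, intervals):
    """Count how many values fall inside each class interval."""
    counts = {interval: 0 for interval in intervals}
    for value in values:
        for lower, upper in intervals:
            if lower <= value <= upper:
                counts[(lower, upper)] += 1
                break
    return counts

# Figures from the example: weights from 47 kg to 91 kg, 6 classes, start at 45.
width, intervals = class_intervals(lowest=47, highest=91, num_classes=6, start=45)
print("Class width:", width)

# HYPOTHETICAL weights, used only to demonstrate the tally step.
sample_weights = [47, 52, 58, 63, 63, 70, 91]
counts = tally(sample_weights, intervals)
for (lower, upper), count in counts.items():
    print(f"{lower} - {upper} kg: {count}")
```

Because the classes do not overlap, each weight is counted in exactly one interval.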
Grouped Frequency Distribution
Frequency Distribution: “Normal Distribution”
Many types of information, and particularly characteristics in the ‘natural’ world,
have a bell-shaped (‘Normal’) curve, particularly as we get larger samples.
Proportions and Rates
A proportion is just the frequency of something in the data
(numerator) divided by a denominator (usually the total sample
size or in some cases the population). Proportions can be considered
relative frequencies.
Population rates are often age standardised, or adjusted in
some other way to make different populations comparable
Proportions are often reported as percentages but in cases
where the proportion is relatively small they may be reported in
rates per 10,000 or 100,000 (or more).
Rates are proportions over a specific timeframe.
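A short Python sketch of these ideas follows (Python is an assumption). The proportions use the school age-group counts from the Frequencies slide; the case count and population used for the rate are HYPOTHETICAL figures chosen only to show the arithmetic, and `proportion` and `rate_per` are illustrative helper names.

```python
def proportion(count, total):
    """Frequency divided by the denominator (a relative frequency)."""
    return count / total

def rate_per(count, population, per=100_000):
    """Express a small proportion as a number per `per` people."""
    return count / population * per

# Age-group counts from the school example in the Frequencies slide.
age_groups = {"5-7": 15, "8-10": 16, "11-13": 13}
total_children = sum(age_groups.values())
for group, count in age_groups.items():
    p = proportion(count, total_children)
    print(f"{group:>5} years: {count:>2} children, proportion {p:.3f} ({p:.1%})")

# HYPOTHETICAL figures: 12 cases in one year in a population of 48,000.
cases_in_year = 12
population = 48_000
rate = rate_per(cases_in_year, population, per=10_000)
print(f"{rate:.1f} cases per 10,000 people per year")
```

Reporting 2.5 per 10,000 is easier to read than the equivalent 0.025%, which is why small proportions are usually scaled up in this way.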
Central Tendency
We often want to know what the typical subject
got for a particular variable. The mean, the
median and the mode are known as measures
of central tendency - the tendency of a set of
data to centre or cluster around certain
numerical values.
Central Tendency
Mode - the number or range of numbers in a set that occurs
the most frequently
Median - the middle value in a distribution, above and below
which lie an equal number of values
Mean (arithmetic) - The value obtained by dividing the sum of
a set of quantities by the number of quantities in the set
If the distribution is perfectly ‘normal’ then the mode, median
and mean will be the same.
Central Tendency (The Mean)
The arithmetic mean (or, simply, mean) is computed by summing all
the observations in the sample or the population, and dividing by the
number of observations. Basically, the mean is the average.
Symbolically, the sample mean is represented by:
x̄ = Σx / n
where Σx is the sum of all the observations and n is the number of observations.
Properties of The Mean
The arithmetic mean possesses certain properties, some desirable and
some not so desirable. These properties include:
Uniqueness. For any given set of data there is one and only one
arithmetic mean.
Simplicity. The arithmetic mean is easily understood and easy to
compute.
Since each and every value in a given set enters into the computation of
the mean, it is affected by each value. Extreme values (outliers), therefore,
have an influence on the mean and, in some cases, can so distort it that it
becomes undesirable as a measure of central tendency.
Calculating the Mean
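As an illustration, the mean of the Example 1 test marks can be worked out by summing the marks and dividing by the number of pupils. This is a small Python sketch (Python is an assumption; the same result can be obtained in Excel).

```python
import statistics

# Marks from Example 1 (40 pupils).
marks = [
    3, 8, 6, 5, 6, 4, 7, 6,
    5, 3, 5, 6, 3, 5, 4, 4,
    3, 6, 7, 8, 1, 10, 7, 6,
    4, 5, 0, 7, 6, 5, 6, 7,
    1, 7, 5, 4, 5, 8, 5, 7,
]

total = sum(marks)   # sum of all observations
n = len(marks)       # number of observations
mean = total / n     # mean = sum / n

print(f"Sum = {total}, n = {n}, mean = {mean}")

# The standard library gives the same answer.
print("statistics.mean:", statistics.mean(marks))
```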
Central Tendency (The Median)
The median is the halfway point in the data set.
Before one can find this point, the data must be
arranged in order (i.e. ascending or descending).
The median either will be a specific value in the
data set or will fall between two values. The
median is affected less than the mean by
extremely high or low values.
Properties of The Median
Properties of the median include the following:
Uniqueness. As is true with the mean, there is only
one median for a given data set.
Simplicity. The median is easy to calculate.
It is not as drastically affected by extreme values
as is the mean.
Calculating the Median
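The median can be found by sorting the data and taking the middle value, or the average of the two middle values when the number of observations is even. The Python sketch below (Python is an assumption) applies this to the Example 1 marks. The short `with_outlier` list is HYPOTHETICAL and is included only to show that the median is affected less than the mean by an extreme value.

```python
import statistics

def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

# Marks from Example 1 (40 pupils).
marks = [
    3, 8, 6, 5, 6, 4, 7, 6,
    5, 3, 5, 6, 3, 5, 4, 4,
    3, 6, 7, 8, 1, 10, 7, 6,
    4, 5, 0, 7, 6, 5, 6, 7,
    1, 7, 5, 4, 5, 8, 5, 7,
]
print("Median mark:", median(marks))

# HYPOTHETICAL data with one extreme value.
with_outlier = [1, 2, 3, 4, 100]
print("Mean:", statistics.mean(with_outlier), "Median:", median(with_outlier))
```

With 40 marks, the median is the average of the 20th and 21st sorted values, both of which are 5.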
Central Tendency (The Mode)
The mode is the observation that occurs
most frequently. The mode can be used
when the data are nominal. The mode is not
always unique. A data set can have more
than one mode, or the mode may not exist
for a data set.
Properties of The Mode
The mode is used when the most typical case is
desired.
The mode is the easiest average to compute.
The mode can be used when the data are nominal.
The mode is not always unique. A data set can
have more than one mode or none at all.
Calculating the Mode
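A minimal Python sketch (Python is an assumption) for finding the mode follows. The helper `modes` is hypothetical and returns a list, because a data set can have one mode, several modes, or none. Treating a data set in which every value occurs exactly once as having no mode is a convention assumed here. The second data set is the female average age at first marriage table from later in these slides.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often; an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: if every value occurs only once, there is no mode.
    if highest == 1 and len(counts) > 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)

# Marks from Example 1 (40 pupils).
marks = [
    3, 8, 6, 5, 6, 4, 7, 6,
    5, 3, 5, 6, 3, 5, 4, 4,
    3, 6, 7, 8, 1, 10, 7, 6,
    4, 5, 0, 7, 6, 5, 6, 7,
    1, 7, 5, 4, 5, 8, 5, 7,
]
print("Mode of marks:", modes(marks))

# Female average age at first marriage (from the table later in these slides).
female_ages = [29.70, 25.60, 22.90, 20.90, 22.60, 20.90, 24.00, 25.60]
print("Modes of female ages:", modes(female_ages))

# Every value occurs once, so no mode is reported.
print("Modes of [1, 2, 3]:", modes([1, 2, 3]))
```

Because the function only counts occurrences, it also works for nominal data such as category labels.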
Variability
From a measure of central tendency alone, it is not possible
to know whether the observations tend to be quite similar
(homogeneous) and therefore lie close to the centre, or
whether they are considerably spread out (heterogeneous)
across a broad range of values. Measures of variability describe this spread.
The most common of these are the range, variance and
the standard deviation.
Variability (Range)
The range is defined as the difference in value
between the highest (maximum) and lowest
(minimum) observation:
Range = Maximum – Minimum
The range, though it can be calculated quickly, is not
very useful, because it only considers the extremes
and not the bulk of the observations.
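For the Example 1 test marks, the range can be computed in one line. This is a small Python sketch (Python is an assumption).

```python
# Marks from Example 1 (40 pupils).
marks = [
    3, 8, 6, 5, 6, 4, 7, 6,
    5, 3, 5, 6, 3, 5, 4, 4,
    3, 6, 7, 8, 1, 10, 7, 6,
    4, 5, 0, 7, 6, 5, 6, 7,
    1, 7, 5, 4, 5, 8, 5, 7,
]

# Range = highest observation - lowest observation.
data_range = max(marks) - min(marks)
print(f"Range = {max(marks)} - {min(marks)} = {data_range}")
```

Only two of the 40 marks (the 0 and the 10) determine this value, which illustrates why the range says little about the bulk of the observations.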
Variability (Variance and SD)
Variances and standard deviations can be used to
determine the spread of the data.
Variance is a measure of the dispersion shown by
a set of observations.
Standard deviation is a statistical summary of how
dispersed the values of a variable are around its
mean.
Variability
If the variance or standard deviation is large, the
data are more dispersed.
This information is useful in comparing two (or
more) data sets to determine which is more (most)
variable. It is also used to determine the consistency
of a variable (reliability).
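The slides do not give formulas for the variance and standard deviation, so the sketch below uses the standard definitions as an assumption: the variance is the average squared deviation from the mean (dividing by n for a population, or by n - 1 for a sample), and the standard deviation is its square root. The example is in Python (also an assumption) and uses the Example 1 marks.

```python
import math
import statistics

# Marks from Example 1 (40 pupils).
marks = [
    3, 8, 6, 5, 6, 4, 7, 6,
    5, 3, 5, 6, 3, 5, 4, 4,
    3, 6, 7, 8, 1, 10, 7, 6,
    4, 5, 0, 7, 6, 5, 6, 7,
    1, 7, 5, 4, 5, 8, 5, 7,
]

n = len(marks)
mean = sum(marks) / n

# Squared distance of each mark from the mean.
squared_deviations = [(x - mean) ** 2 for x in marks]

# Population versions divide by n; sample versions divide by n - 1.
population_variance = sum(squared_deviations) / n
sample_variance = sum(squared_deviations) / (n - 1)
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(f"Population variance = {population_variance:.4f}, SD = {population_sd:.4f}")
print(f"Sample variance     = {sample_variance:.4f}, SD = {sample_sd:.4f}")

# Cross-check against the standard library.
print(statistics.pvariance(marks), statistics.variance(marks))
```

Which version to report depends on whether the 40 pupils are treated as the whole population of interest or as a sample from a larger group.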
Microsoft Excel
Your new best friend in analysing data.
Input your quantitative datasets into Microsoft
Excel.
Responses to open-ended questions can be categorized and entered into
Excel if needed.
Use Data Analysis on the Data tab in Microsoft
Excel (available once the Analysis ToolPak add-in is enabled) to produce the basic descriptive statistics.
The Average Age at First Marriage - Females
Country        Level   Units           As of
Australia      29.70   Years Average   2006
New Zealand    25.60   Years Average   2006
Fiji           22.90   Years Average   1996
Solomon Is.    20.90   Years Average   2007
Vanuatu        22.60   Years Average   2006
Kiribati       20.90   Years Average   2009
Samoa          24.00   Years Average   2009
Tonga          25.60   Years Average   2006

https://www.quandl.com/collections/society/age-at-first-marriage-female-by-country
The Average Age at First Marriage - Males
Country        Level   Units           As of
Australia      31.60   Years Average   2006
New Zealand    27.00   Years Average   2006
Fiji           26.12   Years Average   1996
Solomon Is.    -       -               -
Vanuatu        25.30   Years Average   2002
Kiribati       -       -               -
Samoa          28.70   Years Average   2009
Tonga          28.00   Years Average   2006

https://www.quandl.com/collections/society/age-at-first-marriage-female-by-country
The Average Age at First Marriage – Females vs. Males
Country        Females (Level)   Males (Level)
Australia      29.70             31.60
New Zealand    25.60             27.00
Fiji           22.90             26.12
Solomon Is.    20.90             -
Vanuatu        22.60             25.30
Kiribati       20.90             -
Samoa          24.00             28.70
Tonga          25.60             28.00
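As an alternative to Excel's Data Analysis tool, the basic descriptive statistics for this table can be sketched in Python (an assumption; the slides use Excel). Missing male values are stored as None and left out of the male summary. The helper `summarize` is hypothetical, and the sample standard deviation is used here as an assumed choice.

```python
import statistics

# (female, male) average age at first marriage; None marks a missing value.
ages = {
    "Australia": (29.70, 31.60),
    "New Zealand": (25.60, 27.00),
    "Fiji": (22.90, 26.12),
    "Solomon Is.": (20.90, None),
    "Vanuatu": (22.60, 25.30),
    "Kiribati": (20.90, None),
    "Samoa": (24.00, 28.70),
    "Tonga": (25.60, 28.00),
}

def summarize(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample standard deviation (assumed choice)
    }

females = [f for f, _ in ages.values() if f is not None]
males = [m for _, m in ages.values() if m is not None]

for label, values in (("Females", females), ("Males", males)):
    s = summarize(values)
    print(f"{label}: n={s['n']}, mean={s['mean']:.2f}, median={s['median']:.2f}, "
          f"range={s['range']:.2f}, sd={s['sd']:.2f}")

# Male minus female age, only for countries where both values are available.
gaps = {c: m - f for c, (f, m) in ages.items() if f is not None and m is not None}
for country, gap in gaps.items():
    print(f"{country}: males marry {gap:.2f} years later on average")
```

The figures come from different years (see the "As of" column in the tables above), so comparisons between countries should be read with care.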
Thank You! 
