
Kwame Nkrumah University of Science and Technology

NUT 561
Statistics Methods and Biostatistics

Dr. Emmanuel de-Graft Johnson Owusu-Ansah


Edjoa.knust@gmail.com
0244378150

Course Description
The course focuses on providing practical knowledge of
statistical approaches in qualitative and quantitative
research.

It focuses on scientific methods of research in health
systems, including the interpretation of research data using
statistical software such as SPSS, STATA and GraphPad Prism.
Course Content
Introduction to Statistics for Food Science; Descriptive
statistics; Assessment of food, nutrient and energy intake;
Multiple Comparison procedure (ANOVA, MANOVA);
Linear regression analysis; Logistic regression analysis for
cohort and case-control studies; Analysis of Intervention
studies, Epidemiological studies, and clinical trials.
Learning Objectives
The course will equip doctoral students with the necessary
knowledge and skills to apply existing statistical
methods to nutrition and public health research.
Mode of Delivery and Assessment
1. Mode of Delivery
a) Lecture/Lab
b) Group discussions/ videos
c) Individual/Group Activities

2. Assessment
a) End of semester exam 60%
b) Mid-semester exams 20%
c) Class Exercises/Assignment 20%
Reading Materials
1. Mahajan, B.K. (2015). Methods in Biostatistics for Medical and
Research Workers, 7th Ed. India.

2. Hugo, P.A. (2013). Statistics in Food Science and Nutrition.
Springer, USA.

3. Wasserman, L.A. (2013). All of Statistics: A Concise Course in
Statistical Inference. Springer Science & Business Media, Germany.

4. Rao, K.V. (2009). Biostatistics: A Manual of Statistical Methods
for Use in Health, Nutrition and Anthropology, 2nd Ed. Jaypee
Brothers Medical Publishers, India.

5. Newman, S.C. (2001). Biostatistical Methods in Epidemiology.
John Wiley and Sons, Inc., USA.

Concept of Statistics

What is Statistics?

Statistics is the science of collecting, organizing,
presenting, analyzing, and interpreting numerical data
to assist in making more effective decisions.
Branches of Statistics
Statistics can be categorized into two branches:
• Descriptive Statistics
• Inferential Statistics

Descriptive Statistics: consists of the collection,
organization, summarisation, and presentation of data
in an informative way.

Inferential Statistics: consists of generalizing the
results from a sample to the population. This includes
estimation, hypothesis testing, prediction and
determination of relationships among variables.
Definitions

A Population consists of all possible individuals, objects, or
measurements of interest.

A Sample is a portion, or part, of the population of interest.

A Variable is a characteristic or attribute that can assume different
values.

Data are the values (measurements or observations) that the
variable can assume.

A Data set is a collection of data values. Each value in a data set is
called a data value or datum.
Variable
A variable is a characteristic or attribute measured on the elements
of a sample or population; it can assume different values.

There are two types of variables:
• Quantitative variable
• Qualitative variable

Quantitative variables: variables that assume numeric values.
Example: weight of a newborn baby,
 height of a patient

Qualitative variables: variables that assume non-numeric values.
Example: gender of a newborn baby,
 baby feeding type (breast feed or bottle feed)

Variable
Quantitative variables can be further classified into two groups:
• Discrete variable
• Continuous variable

Discrete variables: variables that assume values that can be
counted. They are obtained by counting. Example:
• number of newborn babies in an incubator
• number of patients who lost weight after a diet programme

Continuous variables: variables that can assume any value
between any two specific values. They are obtained by measuring.
Example:
• weight of a newborn baby
• height of a patient

Scale of Measurement
Data can also be classified according to how they are categorised,
counted or measured.

There are four scales of measurement:
• Nominal scale
• Ordinal scale
• Interval scale
• Ratio scale

[Figure: Different Levels of Measurement (slide dated 11 March 2024)]
Scale of Measurement

Nominal Scale:
In this scale, data are classified into mutually exclusive categories
that cannot be arranged in any meaningful order. These data are
non-numeric.

Example:
Gender: male or female

Marital status: single, married, divorced

Employment status: employed, unemployed

Religious affiliation
Scale of Measurement

Ordinal Scale:
In this scale, data are classified into categories that can be ranked or
ordered; however, the precise differences between the ranks are not
defined.

Example:
Severity of injury: fatal, serious, minor, no injury

Taste of food: Good, moderate, bad

Socio-economic status: High, middle, low


Scale of Measurement

Interval Scale:
In this scale, data can be ranked and the precise differences between
values are meaningful; however, there is no meaningful (true) zero.

Example:
Temperature: the difference between temperatures of 10° and 50°
indicates that one is warmer than the other. However, a
temperature of zero does not mean that there is no temperature.

IQ of a person: the difference between IQs of 50 and 110
indicates that one is more intelligent than the other. However, an IQ
of zero does not mean that the person has no intelligence.
Scale of Measurement

Ratio Scale:
The ratio scale of measurement possesses all the characteristics of
the interval scale and, in addition, has a meaningful zero, so ratios
such as "twice as heavy" are meaningful.

Example:
weight of a newborn baby

height of a patient

Distance between two locations

Descriptive Analysis of Data



The analysis of data using descriptive methods involves:
• Tabular presentation of data
• Numerical summary
• Graphical presentation of data
Raw Data

When data are collected in their original form, they are
called raw data. Example: number of patients
admitted for malnutrition in different hospitals

1 2 6 7 12 13 2 6 9
5 18 7 3 15 15 4 17 1
14 5 4 16 4 5 8 6 5
. . .
Tabular Presentation of Data
A Frequency Distribution Table is an organization
of data into mutually exclusive categories showing the
number of observations in each class.
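A frequency table can be built by tallying how often each value occurs in the raw data. The Python sketch below uses only the 27 values printed in the raw-data example above (the rest of that data set is elided), so the counts illustrate the procedure rather than summarize the full data; `Counter` is from the standard library.

```python
from collections import Counter

# Only the 27 raw values shown in the notes; the remaining values are elided.
# Each value is the number of patients admitted for malnutrition at one hospital.
raw = [1, 2, 6, 7, 12, 13, 2, 6, 9,
       5, 18, 7, 3, 15, 15, 4, 17, 1,
       14, 5, 4, 16, 4, 5, 8, 6, 5]

# Counter tallies how many times each distinct value occurs.
freq = Counter(raw)

print(f"{'Patients':>8}  {'Frequency':>9}")
for value in sorted(freq):
    print(f"{value:>8}  {freq[value]:>9}")
print(f"{'Total':>8}  {sum(freq.values()):>9}")
```

The frequencies always add up to the number of observations, which is a quick check that no value was lost when building the table.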
Tabular Presentation of Data

Frequency distribution showing the level of education of patients
on a diet programme in a given hospital

Education    Frequency
No school    13
Primary      15
Secondary    12
Tertiary     14
Tabular Presentation of Data

Cross table showing the level of education of patients on a diet
programme, classified by gender

                  Gender
Education    Male    Female
No school    5       8
Primary      7       8
Secondary    8       4
Tertiary     11      3
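Adding row and column totals to a cross table shows how it relates to the one-way frequency table: the row totals of the education-by-gender table reproduce the education frequencies. A minimal Python sketch using the counts from the cross table above:

```python
# Cross table from the notes: education level -> (male, female)
cross = {
    "No school": (5, 8),
    "Primary": (7, 8),
    "Secondary": (8, 4),
    "Tertiary": (11, 3),
}

# Row totals (by education) and column totals (by gender).
row_totals = {level: m + f for level, (m, f) in cross.items()}
male_total = sum(m for m, _ in cross.values())
female_total = sum(f for _, f in cross.values())
grand_total = male_total + female_total

print(f"{'Education':<10}{'Male':>6}{'Female':>8}{'Total':>7}")
for level, (m, f) in cross.items():
    print(f"{level:<10}{m:>6}{f:>8}{row_totals[level]:>7}")
print(f"{'Total':<10}{male_total:>6}{female_total:>8}{grand_total:>7}")
```

The row totals are 13, 15, 12 and 14, matching the frequency distribution of education levels.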
Graphical Presentation of Data

Graphs for Qualitative Data


Bar Chart
Pie Chart
Multiple Bar Chart

Graphs for Continuous Data


Histogram
Boxplot
Graphical Presentation of Qualitative Data
Bar Chart: consists of vertical bars or rectangles placed
along the category axis. The height of each bar represents
the frequency of the corresponding category.

Example: Distribution of customer satisfaction after a dieting programme
Pie Chart
Pie Chart: consists of a circle divided into sectors such that
the size (angle) of each sector is proportional to the frequency
of the category it represents.
Example: Distribution of customer satisfaction after a dieting programme
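To draw a pie chart by hand, each category's sector angle is its share of the total multiplied by 360 degrees. The customer-satisfaction counts behind the example are not given in these notes, so the Python sketch below reuses the education frequencies from the frequency table purely to illustrate the calculation.

```python
# Education frequencies from the frequency table (used only as an
# illustration; the satisfaction data are not listed in the notes).
freq = {"No school": 13, "Primary": 15, "Secondary": 12, "Tertiary": 14}
total = sum(freq.values())

# Sector angle = (frequency / total) * 360 degrees.
angles = {level: count / total * 360 for level, count in freq.items()}

for level, angle in angles.items():
    print(f"{level:<10} {angle:6.1f} degrees")
```

The angles always sum to 360 degrees, whatever the data.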
Multiple Bar Chart
Multiple Bar Chart: This type of chart is used to present
information on two categorical variables at the same time. It is
similar to the bar chart except that each category is represented
by a group of adjacent bars, one for each component.
Example: Distribution of customer satisfaction after a diet programme, classified by
gender
Graphical Presentation of Quantitative Data

Histogram: a graph that displays quantitative data using adjacent
vertical bars of various heights to represent the frequencies of the classes.
Example: Distribution of calcium content in a sample of meat bread produced
by a factory for school children
Boxplot
Boxplot: a graph that displays the distribution of a data set using its
five-number summary (minimum, first quartile, median, third quartile and maximum).
Example: Distribution of number of natural or restored teeth present aged classified
by nutrition status

Numerical Summary
Raw data
The data represent the weight (kg) of 50 obese adults in
Ghana.

112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
110, 118, 117, 116, 118, 112, 114, 114, 105, 109,
107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
120, 113, 120, 117, 105, 110, 118, 112, 114, 114.

What can you say about this data?
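A first look at these weights is to group them into classes and count each class, which is exactly what a histogram displays. The Python sketch below assumes classes of width 5 kg starting at 100 kg (an illustrative choice, not one made in the notes) and prints one row of asterisks per class.

```python
# Weights (kg) of 50 obese adults in Ghana, copied from the raw data above.
weights = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 112, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

START = 100  # lower limit of the first class (assumed)
WIDTH = 5    # class width in kg (assumed)

# Assign each weight to the class whose lower limit is START + k * WIDTH.
counts = {}
for w in weights:
    lower = START + (w - START) // WIDTH * WIDTH
    counts[lower] = counts.get(lower, 0) + 1

# Text "histogram": one row per class, one '*' per adult.
for lower in sorted(counts):
    upper = lower + WIDTH - 1
    print(f"{lower}-{upper} kg  {counts[lower]:>2}  {'*' * counts[lower]}")
```

With these classes the most frequent class is 110-114 kg (19 adults), and the two heaviest adults (127 kg and 134 kg) sit alone in the upper classes.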


Measures of Central Tendency

Measures of Central Tendency: These measures, often
referred to as averages, describe the centre of a given
data set. They are useful measures for summarizing a
data set.

Measures of Central Tendency:
1. The Arithmetic Mean
2. The Median
3. The Mode
The Arithmetic Mean

The sample mean is the sum of all the sample values
divided by the number of sample values:

$\bar{X} = \dfrac{\sum X}{n}$

where n is the total number of values in the sample.
The Arithmetic Mean: Example

The number of malnourished children admitted in five different
hospitals: 14, 15, 17, 16, 15

$\bar{X} = \dfrac{\sum X}{n} = \dfrac{14 + 15 + 17 + 16 + 15}{5} = \dfrac{77}{5} = 15.4$
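The arithmetic in the example above can be checked with a few lines of Python: the mean is computed directly from its definition, with the standard-library `statistics.mean` as a cross-check.

```python
import statistics

# Malnourished children admitted in five hospitals (example above).
admissions = [14, 15, 17, 16, 15]

# Sum of the values divided by the number of values.
mean_by_hand = sum(admissions) / len(admissions)

# The standard-library function gives the same result.
mean_stdlib = statistics.mean(admissions)

print(mean_by_hand, mean_stdlib)
```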
The Arithmetic Mean

Characteristics of the Arithmetic Mean


The mean is unique for any set of numerical data.
All the values are included in computing the mean.
It is applicable to quantitative data only
The mean is affected by extremely large or small
data values.
The Median

The Median is the midpoint of the values after they have been
ordered from the smallest to the largest. There are as many values
above the median as below it in the data array.

For an even number of values, the median is the arithmetic
average of the two middle numbers.
The Median: Example

The number of malnourished newborn babies in
five intensive care units:
21, 25, 19, 20, 22.
Calculate the median.

Arranging the data in ascending order gives:

19, 20, 21, 22, 25.

Thus the median is 21.
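The two rules for the median (middle value for an odd number of values, average of the two middle values for an even number) translate directly into a short Python function. The even-sized list in the sketch is hypothetical and only exercises the second rule.

```python
def median(values):
    """Middle value of the ordered data; the mean of the two middle
    values when the number of values is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

icu_babies = [21, 25, 19, 20, 22]   # example above
print(median(icu_babies))           # 21
print(median([19, 20, 21, 22]))     # hypothetical even-sized data: 20.5
```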
The Median

Characteristics of the Median

• There is a unique median for each data set.
• It is not affected by extremely large or small
values and is therefore a valuable measure of
location when such values occur.
• It can be computed for ratio-level, interval-level,
and ordinal-level data.
• 50% of the observations lie above the median
and 50% fall below it.
The Mode

The Mode is another measure of location and
represents the value of the observation that appears
most frequently.

Example: The numbers of days spent treating malnourished
children in a hospital until they gained weight are: 81,
93, 84, 75, 68, 87, 81, 75, 81, 87.
Find the mode.

Because 81 occurs most often, it is the mode.
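Python's standard library (3.8 and later) provides `statistics.multimode`, which returns every value tied for the highest frequency, so it also handles bimodal data. The sketch applies it to the example above and to a small hypothetical bimodal list.

```python
import statistics

# Days taken to treat malnourished children until they gained weight.
days = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]

# multimode returns all values with the highest frequency.
modes = statistics.multimode(days)
print(modes)                                  # [81]

# Hypothetical bimodal data: two values tie for the highest frequency.
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```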
The Mode

Characteristics of the Mode

• Data can have more than one mode. If there are two
modes, the data are referred to as bimodal.
• The mode is not influenced by extreme values.
• The mode can be found for both quantitative and
qualitative data.
The Measures of Dispersion
Measures of Dispersion: These measures describe the
spread or variability in a data set.

Measures of dispersion include the following: range,
variance, standard deviation and coefficient
of variation.
The Range

Range = Largest value – Smallest value

The following are the temperatures (degrees Celsius)
recorded for 10 newborn babies:

34.5, 35.2, 36.5, 36.1, 35.5, 36.0, 34.7, 32.1, 33.9, 36.5

Highest value: 36.5  Lowest value: 32.1

Range = Highest value – Lowest value
= 36.5 – 32.1
= 4.4
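The range needs only the largest and smallest values. In the Python sketch below the result is rounded to one decimal place, because subtracting decimal values in floating point can leave a tiny rounding error.

```python
# Temperatures (degrees Celsius) of 10 newborn babies (example above).
temps = [34.5, 35.2, 36.5, 36.1, 35.5, 36.0, 34.7, 32.1, 33.9, 36.5]

data_range = max(temps) - min(temps)

# Round to the precision of the data to hide floating-point noise.
print(round(data_range, 1))  # 4.4
```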
The Variance and Standard Deviation

Variance: the arithmetic mean of the squared deviations from
the mean. (For a sample, the sum of the squared deviations is
divided by n - 1 instead of n.)

Standard deviation: the square root of the variance.
The Variance and Standard Deviation

Sample variance (s²):

$s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$

Sample standard deviation (s):

$s = \sqrt{s^2}$
The Variance and Standard Deviation: Example

The numbers of malnourished pregnant mothers admitted into
intensive care units at five different hospitals are:
7, 5, 11, 8, 6.
Find the sample variance and standard deviation.

$\bar{X} = \dfrac{\sum X}{n} = \dfrac{37}{5} = 7.40$

$s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1} = \dfrac{(7 - 7.4)^2 + \cdots + (6 - 7.4)^2}{5 - 1} = \dfrac{21.2}{5 - 1} = 5.30$

$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$
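The worked example can be reproduced step by step in Python: compute the mean, sum the squared deviations, divide by n - 1, then take the square root. The standard-library `statistics.variance` and `statistics.stdev` also use the n - 1 divisor, so they serve as a cross-check.

```python
import math
import statistics

# Malnourished pregnant mothers admitted at five hospitals (example above).
admissions = [7, 5, 11, 8, 6]
n = len(admissions)

mean = sum(admissions) / n                      # 7.4
ss = sum((x - mean) ** 2 for x in admissions)   # sum of squared deviations, about 21.2
variance = ss / (n - 1)                         # sample variance, about 5.30
sd = math.sqrt(variance)                        # sample standard deviation, about 2.30

print(f"mean={mean:.2f}  s2={variance:.2f}  s={sd:.2f}")

# Standard-library equivalents (both divide by n - 1).
print(statistics.variance(admissions), statistics.stdev(admissions))
```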
The coefficient of variation
The coefficient of variation (CV) is defined as the ratio of
the standard deviation to the arithmetic mean. It is
usually expressed as a percentage:

$CV = \dfrac{s}{\bar{X}} \times 100\%$

• Measures relative variation
• Shows variation relative to the mean
• Can be used to compare two or more sets
of data measured in different units
The coefficient of variation
• Male:
– Average weight of newborn baby last year = 5 kg
– Standard deviation = 2 kg

$CV_A = \dfrac{s}{\bar{X}} \times 100\% = \dfrac{2}{5} \times 100\% = 40\%$

• Female:
– Average weight of newborn baby last year = 6 kg
– Standard deviation = 2 kg

$CV_B = \dfrac{s}{\bar{X}} \times 100\% = \dfrac{2}{6} \times 100\% = 33.3\%$

Both male and female babies have the same standard deviation, but the
females' weights are less variable relative to their mean.
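The comparison above takes a one-line function in Python. The helper name `coefficient_of_variation` is chosen here for illustration; multiplying by 100 before dividing keeps the male result an exact 40.0.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation relative to the mean, as a percentage."""
    return 100 * sd / mean

cv_male = coefficient_of_variation(2, 5)    # 40.0
cv_female = coefficient_of_variation(2, 6)  # about 33.3

print(f"Male CV = {cv_male:.1f}%, Female CV = {cv_female:.1f}%")
print("Females less variable" if cv_female < cv_male else "Males less variable")
```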
The Measures of Position
Measures of Position: These measures describe the location
(position) of a particular value in a given distribution of data.
The position is described by quartiles.

• Quartiles split the ranked data into 4 segments with an equal
number of values per segment:

25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

• The first quartile, Q1, is the value for which 25% of the observations
are smaller and 75% are larger
• Q2 is the same as the median (50% are smaller, 50% are larger)
• The third quartile, Q3, is the value for which 75% of the observations
are smaller and only 25% are larger
The Quartiles

The quartiles are found by determining the value in the appropriate


position in the ranked data, where

First quartile position: Q1 = (n+1)/4

Second quartile position: Q2 = (n+1)/2 (the median position)

Third quartile position: Q3 = 3(n+1)/4

where n is the number of observed values


The Quartiles: Example
Find the first quartile

Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22

(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so use the value halfway between the 2nd and 3rd values:
Q1 = (12 + 13)/2 = 12.5

Q1 and Q3 are measures of noncentral location;
Q2 = median, a measure of central tendency
The Quartiles: Example
Find the other quartiles

Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22

(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so Q1 = 12.5

Q2 is in the (9+1)/2 = 5th position of the ranked data,


so Q2 = median = 16

Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data,
halfway between the 7th and 8th values (18 and 21),
so Q3 = 19.5
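The (n+1) position rule can be written as a small Python function that interpolates linearly when the position falls between two ranked values. The helper `quartile` is illustrative; for comparison, Python's `statistics.quantiles` (3.8 and later) with `method="exclusive"` is also based on (n+1) positions and gives the same quartiles for these data.

```python
import statistics

def quartile(ordered, p):
    """Value at 1-based position p * (n + 1) in the ordered data,
    interpolating linearly when the position is fractional."""
    pos = p * (len(ordered) + 1)
    lower = int(pos)              # whole part of the position
    frac = pos - lower            # fractional part of the position
    if lower < 1:                 # position before the first value
        return ordered[0]
    if lower >= len(ordered):     # position at or beyond the last value
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]   # already in ascending order
q1, q2, q3 = (quartile(data, p) for p in (0.25, 0.5, 0.75))
print(q1, q2, q3)                                           # 12.5 16.0 19.5
print(statistics.quantiles(data, n=4, method="exclusive"))  # [12.5, 16.0, 19.5]
```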
The Measure of Skewness

Measures of Skewness: These describe the shape of the
distribution of a given data set.

Symmetrical distribution
Positively Skewed distribution
Negatively Skewed distribution

Coefficient of skewness:

$S_k = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}, \qquad -3 \le S_k \le 3$
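The coefficient of skewness can be computed from the mean, median and standard deviation. The Python sketch below applies it to the data from the box-and-whisker example in these notes (0 2 2 2 3 3 4 5 5 10 27) and assumes the sample standard deviation (divisor n - 1) from `statistics.stdev`; the function name `pearson_skewness` is illustrative.

```python
import statistics

def pearson_skewness(values):
    """3 * (mean - median) / sample standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1)
    return 3 * (mean - median) / sd

# Data from the box-and-whisker example in these notes.
data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]
sk = pearson_skewness(data)

# The mean (63/11) exceeds the median (3), so the coefficient is positive.
print(f"Sk = {sk:.2f}")
```

A positive value agrees with the right skew seen in the box-and-whisker plot of the same data.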
The Measure of Skewness

Symmetric distribution: A distribution having the
same shape on either side of the center.

Skewed distribution: One whose shapes on either
side of the center differ; a nonsymmetrical distribution.
It can be negatively or positively skewed.
The Measure of Skewness

Zero skewness: Mean = Median = Mode

[Figure: The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution]
The Measure of Skewness

Negatively Skewed: Mean and Median are to the left of the Mode.

Mean < Median < Mode

[Figure: The Relative Positions of the Mean, Median, and Mode: Left-Skewed Distribution]
The Measure of Skewness

• Positively skewed: Mean and Median are to the right of the Mode.

Mean > Median > Mode

[Figure: The Relative Positions of the Mean, Median, and Mode: Right-Skewed Distribution]
Exploratory Data Analysis (EDA)

Box-and-Whisker Plot:
A graphical display of data using the 5-number
summary:
Minimum -- Q1 -- Median -- Q3 -- Maximum

Example:

Minimum | 25% | 1st Quartile | 25% | Median | 25% | 3rd Quartile | 25% | Maximum
Box-and-whisker plot

• The box and central line are centered between the
endpoints if data are symmetric around the median

[Figure: box-and-whisker plot marking Min, Q1, Median, Q3 and Max]

• A Box-and-Whisker plot can be shown in either vertical
or horizontal format
Box-and-whisker plot: Example
• Below is a Box-and-Whisker plot for the following
data:

0 2 2 2 3 3 4 5 5 10 27

[Figure: box-and-whisker plot of these data]

Min  Q1  Q2  Q3  Max
0    2   3   5   27

• The data are right skewed, as the plot depicts
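The five-number summary behind this plot can be computed with Python's `statistics.quantiles` (3.8 and later), whose default "exclusive" method uses the same (n+1) positions as the quartile formulas earlier in these notes. Comparing the whisker lengths gives a quick numerical check of the skew.

```python
import statistics

data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]

# Quartiles by the (n + 1) position rule.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
five_numbers = (min(data), q1, q2, q3, max(data))
print(five_numbers)  # (0, 2.0, 3.0, 5.0, 27)

# A much longer upper whisker than lower whisker indicates right skew.
lower_whisker = q1 - min(data)   # 2.0
upper_whisker = max(data) - q3   # 22.0
print("right-skewed" if upper_whisker > lower_whisker else "not right-skewed")
```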
Distribution Shape and Box-and-whisker plot

[Figure: Box-and-whisker plots for left-skewed, symmetric and right-skewed distributions, each marking Q1, Q2 and Q3]

Introduction to SPSS
