
Statistical Analysis Using SPSS

Biostatistics and Data Handling 2020


By
Abdelghafar M. Abu-Elsaoud
Associate professor of biostatistics and Physiology

Data Handling; A.prof. Abdelghafar Abu-Elsaoud 1


Certified trainer at the Human Development Center, Suez Canal University
Director of the International Publishing Center, Suez Canal University
EasyStat
YouTube Channel
https://www.youtube.com/channel/UC5L-iT8tjbDMz3ObOoAWoeQ
Data handling and statistical analyses: course content

Biostatistics / Medical Statistics
• Important definitions in biostatistics
• Collection of data
• Types of statistical analyses (descriptive/inferential)
• Types of data
• Population, samples, and sampling
• Element, data, variable
• Sources of data
• Variables / types of variables
• Normal distribution / testing normality of data
  o Shapiro-Wilk, Kolmogorov-Smirnov, Q-Q plots, histograms
• Descriptive statistics
  o Central tendency measures (mean, median, mode)
  o Variability measures (variance, SD, SE, C.V., etc.)
• Presentation of data (line, bar, frequency, histogram, boxplot)
• Validity and reliability
• Outlier detection
• Inferential statistics
• Hypothesis testing / null and alternate hypotheses / significance (p-value)
• Trends (correlation and regression)
• Compare / differences
  o Parametric: t-test (1-sample, 2 independent samples, 2 dependent samples), ANOVA
  o Non-parametric: Z-test, Mann-Whitney, Chi-square, Kruskal-Wallis

Applications
• Microsoft Excel, OpenOffice spreadsheet
• SPSS, PSPP, and Minitab
• Introduction to the installed apps
• How to import and handle data


Data handling and statistical analyses

Statistics
• Definitions
• Data / types of data
• Population, samples, and sampling
• Element, data, variable
• Sources of data
• Variables / types
• Normal distribution / testing normality
  o Shapiro-Wilk, Kolmogorov-Smirnov, Q-Q plots, histograms
• Descriptive statistics
  o Central tendency measures (mean, median, mode)
  o Variability measures (variance, SD, SE, C.V.)
• Presentation of data (line, bar, frequency, histogram, boxplot)
• Validity and reliability
• Outlier detection
• Inferential statistics
• Hypothesis testing / null and alternate hypotheses / significance (p-value)
• Trends (correlation and regression)
• Compare / differences
  o Parametric: t-test (1-sample, 2 independent samples, 2 dependent samples), ANOVA
  o Non-parametric: Z-test, Mann-Whitney, Chi-square, Kruskal-Wallis

Microsoft Excel
• Introduction
• Main menus
• Alternatives: Apache OpenOffice
• Data entry
• How to import, export, and handle data
• Add-ins
• Formatting tables
• Formatting figures
• Copy / paste / paste special
• Data handling
• Special export and saving of figures and tables for journal requirements
• Statistical analyses in Excel
• Descriptive statistics in Microsoft Excel
• Inferential statistics
• ANOVA, post hoc handling
• Formulas in Excel
• Special formulas
• Hands-on training

SPSS
• Introduction
• Main menus
• Alternatives: SPSS, PSPP, and Minitab
• SPSS interfaces (data view, variable view)
• Data entry
• Name / type / label / values
• Missing values
• Measure (type: nominal, ordinal, …)
• How to import and handle data
• Data exploring
• Descriptive statistics (overall, specific groups; missing values handling)
• Normality testing
• Parametric vs. non-parametric
• Outlier detection
• Figures and graphing
• Data analysis: ANOVA, t-test, Mann-Whitney, Wilcoxon, Chi-square, Kruskal-Wallis
• Hands-on training
Statistics
Definitions
General: A subject with applications in a vast number of different fields.
Statistics is the methodology for collecting, analyzing, interpreting, and drawing conclusions from information in order to make intelligent decisions.
Biostatistics (a portmanteau of biology and statistics): the application of statistics to a wide range of topics in biology.


Biostatistics
It is the science that deals with the development and application of the most appropriate methods for:
Ø Collection of data.
Ø Presentation of the collected data.
Ø Analysis and interpretation of the results.
Ø Making decisions on the basis of such analysis.

Other definitions of statistics
Ø Frequently used to refer to recorded data
Ø Denotes characteristics calculated for a set of data, e.g. the sample mean
Role of statisticians
• To guide the design of an experiment or survey prior to data collection
• To analyze data using proper statistical procedures and techniques
• To present and interpret the results to researchers and other decision makers
Statistics provides methods for:
1. Design: planning and carrying out research studies
2. Description: summarizing and exploring data
3. Inference: making predictions and generalizing about phenomena represented by the data
Types of statistics
• Descriptive statistics:
Statistics devoted to the summarization and description of data. It consists of methods for organizing, displaying, describing, and summarizing information or data.
• Inferential statistics:
Statistics concerned with using sample data to make an inference about a population of data.
STATISTICS
Transforms data into useful information for the researcher

• Descriptive statistics: collecting, summarizing, and describing data
  o Mathematical presentations
    – Measures of central tendency
    – Measures of dispersion
  o Graphical presentations
• Inferential statistics: drawing conclusions concerning populations based on statistical tests
Descriptive statistics
1- Numerical presentation
Tabular presentation (simple or complex)
Simple frequency distribution table (S.F.D.T.)

Table (I): Distribution of 50 patients at the surgical department of Alexandria hospital in May 2008 according to their ABO blood groups

Blood group    Frequency    %
A              12           24
B              18           36
AB             5            10
O              15           30
Total          50           100
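The simple frequency distribution in Table (I) can be reproduced with a few lines of Python. This is only a sketch: the list `blood_groups` is rebuilt from the frequencies in the table (the individual patient records are not given), and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# Rebuilt from the frequencies in Table (I); the raw patient list is an assumption.
blood_groups = ["A"] * 12 + ["B"] * 18 + ["AB"] * 5 + ["O"] * 15


def frequency_table(values):
    """Return {category: (frequency, percent)} for a list of categories."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


table = frequency_table(blood_groups)
for group, (freq, pct) in table.items():
    print(f"{group:<4}{freq:>4}{pct:>6.0f}%")
print(f"{'Total':<4}{len(blood_groups):>3}")
```

Each percentage is simply the frequency divided by the total number of patients, multiplied by 100.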


Descriptive statistics
2- Graphical presentation
Descriptive statistics include the construction of graphs, charts, and tables and the calculation of various descriptive measures such as averages, measures of variation, and percentiles. In fact, most of this course deals with descriptive statistics.
[Figure: pie chart, bar chart, and line graph]
Inferential statistics
Drawing conclusions concerning populations based on statistical tests.
Inferential statistics consists of methods that use samples of the population to make decisions or predictions about the population itself.
Sample and population
Population and sample are two basic concepts in statistics.
1) Population: can be characterized as the set of individual persons or objects in which an investigator is primarily interested during his or her research problem.
Sample and population
Population is the collection of all individuals or items under consideration in a statistical study.

A statistical population is the set of measurements corresponding to the entire collection of units for which inferences are to be made.
Sample and population
A sample is that part of the population from which information is collected.
OR
A portion of the population selected for study is referred to as a sample.
OR
A sample from a statistical population is the set of measurements that are actually collected in the course of an investigation.
Common important statistical terms
• Element: a specific subject or object (e.g. a person, company, or country) about which the information is collected.

• Data: measurements or observations of a variable
Common important statistical terms

Sources of data
• Records
• Surveys (comprehensive or sample)
• Experiments
Common important statistical terms
• Variable:
– A characteristic that is observed or manipulated
– Can take on different values
– It is the main aim of any study and generally takes different values for different elements

• Observation: the value of a variable for a given element


Statistical terms (cont.)
• Independent variables (Factor)
– Precede dependent variables in time
– Are often manipulated by the researcher
– The treatment or intervention that is used in a study
– Factor: in an experiment, the factor (also called an independent variable) is an explanatory variable manipulated by the experimenter. Each factor has two or more levels (i.e., different values of the factor). Combinations of factor levels are called treatments.
• Dependent variables
– What is measured as an outcome in a study
– Values depend on the independent variable
The title of a research study usually contains both the independent and the dependent variable, for example:

Title:
Study of the effect of [independent factor] on [dependent variable] at __________
Activity: Type of data “variable”
Socrative student
1. Room Name: AGHM
2. Your Name: your name



Common important statistical terms
Examples of variables include:

1- Quantitative variables: such as height, weight, blood pressure (B.P.) readings, or age

2- Qualitative variables: such as skin color, sex, and eye color
Types of variables

• Quantitative variables (numerical = continuous): contain the kinds of numbers we are accustomed to dealing with in counting and mathematics
  o Interval: interval variables range from −∞ to +∞, like numbers on a number line
  o Ratio: cannot have negative values, i.e. ratio variables cannot be less than zero
• Qualitative variables (categorical = discrete)
  o Nominal: used to represent categories that defy ordering, e.g. 1 = male, 2 = female
  o Ordinal: similar to nominal variables in that numbers are assigned to represent items within a category, but whereas nominal variables have no real rank order, ordinal variables do, e.g. 1 = high, 2 = moderate, 3 = low
Normality testing
Role of Normality
• Many statistical methods require that the numeric variables we are working with have an approximately normal distribution.
• For example, t-tests, F-tests, and regression analyses all require in some sense that the numeric variables are approximately normally distributed.
[Figure: standardized normal distribution with empirical rule percentages]
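The empirical rule percentages mentioned for the standardized normal distribution can be checked numerically. The sketch below uses the standard identity P(|Z| < k) = erf(k/√2) from Python's `math` module; the function name `prob_within` is hypothetical.

```python
import math


def prob_within(k):
    """Probability that a standard normal value lies within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {100 * prob_within(k):.2f}%")
```

The three values are the familiar approximate 68%, 95%, and 99.7% of the empirical rule.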
Tools for Assessing Normality
• Histogram and boxplot
• Normal quantile plot (also called normal probability plot)
• Goodness-of-fit tests
  o Kolmogorov-Smirnov test (SPSS)
  o Shapiro-Wilk test (SPSS, JMP)
  o Anderson-Darling test (MINITAB)
Histograms and Boxplots

The cholesterol levels of the patients appear to be approximately normal, although there is some evidence of right skewness, as the mean is larger than the median.
The red curve represents a normal distribution fitted to these data and the blue curve the density estimate for these data. The two curves should agree if the data are normally distributed.
[Figure: positive skewness (tail to the right) and negative skewness (tail to the left)]
Testing normality of a distribution
§ Normality should not be judged by eye alone (i.e., "it looks normal to me").
§ We mathematically examine whether a given distribution as a whole deviates from a comparable normal distribution (one with the same mean and the same standard deviation).
§ We use the Kolmogorov-Smirnov and Shapiro-Wilk tests to ask: "Is the given distribution different from normal?"
§ A non-significant test outcome (p > 0.05) indicates no detectable difference from the normal distribution, so normality is assumed.
§ A significant outcome (p < 0.05) indicates non-normality.
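To make the idea of "deviation from a comparable normal distribution" concrete, the sketch below computes the Kolmogorov-Smirnov distance D: the largest gap between the sample's empirical cumulative distribution and a normal distribution with the same mean and standard deviation. It is an illustration only. It does not compute a p-value (that needs tables or statistical software such as SPSS), the function name `ks_statistic` is hypothetical, and the sample is the set of hemoglobin values from the worked problem in this course.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(sample):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))  # same mean and SD as the sample
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n up to (i + 1)/n at x; check both sides.
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return d


hemoglobin = [9, 8, 10, 11, 9, 7, 6, 8, 9, 7]
d_value = ks_statistic(hemoglobin)
print(f"D = {d_value:.3f}")
```

A small D means the sample follows the fitted normal curve closely; the significance test then judges whether D is larger than chance alone would explain.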
Testing for Normality using SPSS

If the Sig. value of the Shapiro-Wilk test is greater than 0.05, the data can be treated as normally distributed. If it is below 0.05, the data deviate significantly from a normal distribution.
Parametric vs. non-parametric
Sample versus population

2- Graphical presentation of data

Graphs drawn using coordinates:
o Line graph
o Frequency polygon
o Frequency curve
o Histogram
o Bar graph
o Scatter plot
o Pie chart
[Figure panels: line graph, bar chart, frequency polygon, pie graph, phylogenetic tree]
BOXPLOT

[Figure: stacked column chart and tabular presentation for Treatment-A, Treatment-B, and Treatment-A+B]
Problem: The table gives the hemoglobin levels of 10 patients at a dental clinic. Find the mode, minimum, maximum, mean, SEM, SD, and variance.

Hemoglobin level
9
8
10
11
9
7
6
8
9
7


Worked table for the hemoglobin problem (x̄ = mean = 8.4):

N      X      X²     x − x̄    (x − x̄)²
1      9      81     0.6      0.36
2      8      64     -0.4     0.16
3      10     100    1.6      2.56
4      11     121    2.6      6.76
5      9      81     0.6      0.36
6      7      49     -1.4     1.96
7      6      36     -2.4     5.76
8      8      64     -0.4     0.16
9      9      81     0.6      0.36
10     7      49     -1.4     1.96
Sum    Σx = 84    Σx² = 726    Σ(x − x̄)² = 20.4

N = 10
(Σx)² = (84)² = 7056
Average: x̄ = Σx/N = 84/10 = 8.4
Variance: S² = Σ(x − x̄)²/(N − 1) = 20.4/9 = 2.267
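The same hemoglobin problem can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are hypothetical, and SEM is taken as SD/√N.

```python
import math
import statistics

hemoglobin = [9, 8, 10, 11, 9, 7, 6, 8, 9, 7]

n = len(hemoglobin)
mean = statistics.mean(hemoglobin)          # Σx / N
variance = statistics.variance(hemoglobin)  # Σ(x − mean)² / (N − 1)
sd = math.sqrt(variance)                    # standard deviation
sem = sd / math.sqrt(n)                     # standard error of the mean

summary = {
    "mode": statistics.mode(hemoglobin),
    "min": min(hemoglobin),
    "max": max(hemoglobin),
    "mean": mean,
    "variance": variance,
    "SD": sd,
    "SEM": sem,
}
for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```

The variance uses N − 1 in the denominator, matching the 20.4/9 step in the worked table.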
Descriptive statistics: Measures of
variations: Coefficient of variations
• The coefficient of variation is used to compare the biological variability between two sets of observations independently of the units of measurement.
• The coefficient of variation is calculated according to the following formula:

C.V. = (SD / x̄) × 100%

• Where x̄ = arithmetic mean; SD = standard deviation

Sometimes we need to compare two data sets that have different units. In such cases a measure of relative variability is preferable: the coefficient of variation, which expresses the standard deviation as a percentage of the mean (and thus has no units).
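A short sketch of the coefficient of variation as the standard deviation expressed as a percentage of the mean. The height and weight values are hypothetical and only illustrate that the C.V. does not depend on the unit of measurement.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (unit-free)."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical measurements in different units.
heights_cm = [160, 165, 170, 175, 180]
heights_m = [h / 100 for h in heights_cm]
weights_kg = [55, 60, 70, 80, 95]

print(f"C.V. of height (cm): {coefficient_of_variation(heights_cm):.2f}%")
print(f"C.V. of height (m):  {coefficient_of_variation(heights_m):.2f}%")
print(f"C.V. of weight (kg): {coefficient_of_variation(weights_kg):.2f}%")
```

Changing centimetres to metres leaves the C.V. unchanged, which is why it can be used to compare variability across data sets with different units.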
Remember: Measures of variability

Coefficient of variation
Problem
• In a biological assessment of thyroid hormones in blood samples, two different methods were used: radioimmunoassay and electrophoresis. The following results were obtained:

[Table: results of the two methods]

Compare the variability of the results of the radioimmunoassay and electrophoresis methods.
Solution
1- Radioimmunoassay method

Coefficient of variation (C.V.) of radioimmunoassay is 22.45 % of its mean (0.98)

2- Electrophoresis method

Coefficient of variation (C.V.) of electrophoresis is 52.31 % of its mean (0.65)

Which has more spread or variability?

Test of validity
• A valid mean is a reliable mean. The results obtained in statistical analysis should be valid; otherwise further calculations and interpretations are not correct.
• Validity is assessed in relation to the S.D.
• The mean value is considered valid if it is at least 2.5 times the S.D.

If the mean value is not valid, the number of observations (sample size) should be increased so as to reduce the S.D., since this is related to the number of experiments.
Problem
• The effect of a drug on the body weight of adult volunteers was measured by injecting the drug into a group of 8 volunteers. The mean value for the group was 70 kg and the standard deviation was 12 kg.
• Test the validity of these observations.
Solution
• The mean value is considered valid if it is at least 2.5 times the S.D.

• Mean = 70 kg
• SD = 12 kg
• 2.5 × SD = 2.5 × 12 = 30 kg < mean (70 kg)
• As the mean is more than 2.5 times the SD, the data are considered valid (reliable).
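The validity rule used in this solution (mean at least 2.5 times the SD) is easy to express as a small check. The function name `is_valid_mean` is hypothetical; the 2.5 factor is the one stated in the course.

```python
def is_valid_mean(mean, sd, factor=2.5):
    """Course rule of thumb: the mean is valid if it is at least factor × SD."""
    return mean >= factor * sd


# Body-weight problem: mean = 70 kg, SD = 12 kg.
print(is_valid_mean(70, 12))
```

For the body-weight problem the threshold is 2.5 × 12 = 30 kg, well below the mean of 70 kg.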
Problem: An experiment was carried out to evaluate the effect of a new formula of insulin on blood glucose levels in 10 patients, and the following results were obtained:

[Table: blood glucose results]

Test the validity of the data and determine the coefficient of variation.
Solution

Test of validity
As the mean is more than 2.5 times the SD (mean > 2.5 × SD), the data are considered valid (reliable).
Testing reliability and validity using SPSS
Cronbach's alpha is the most common measure of internal consistency ("reliability"). It is most commonly used when you have multiple Likert questions in a survey/questionnaire that form a scale and you wish to determine whether the scale is reliable. If you are concerned with inter-rater reliability, Cohen's kappa (κ) is the relevant measure instead.
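As a rough illustration of what SPSS computes, the sketch below applies the standard formula for Cronbach's alpha, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The Likert responses are hypothetical, and `cronbach_alpha` is a hypothetical helper name.

```python
import statistics


def cronbach_alpha(items):
    """items: one list of scores per questionnaire item (same respondents, same order)."""
    k = len(items)
    sum_item_variances = sum(statistics.variance(scores) for scores in items)
    totals = [sum(answers) for answers in zip(*items)]  # total score per respondent
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical answers of 5 respondents to 3 Likert items (1 to 5).
q1 = [4, 3, 5, 2, 4]
q2 = [5, 3, 4, 2, 4]
q3 = [4, 2, 5, 3, 5]

alpha = cronbach_alpha([q1, q2, q3])
print(f"Cronbach's alpha = {alpha:.3f}")
```

When respondents answer the items consistently, the variance of the total score is large relative to the item variances and alpha approaches 1.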
Cronbach's alpha is 0.805, which
indicates a high level of internal
consistency for our scale with this
specific sample.

Outliers and outlier detection
Outliers
Analyze > Descriptive Statistics > Explore has a Percentiles option (Statistics button) that displays the table shown, giving the statistics required to define the fences numerically. The interquartile range (IQR) can be computed as shown (the difference between Q3 and Q1). Use Tukey's hinges, as boxplots are based on this definition of a quartile.
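The fence calculation can be sketched in Python. Two assumptions are made here that the slide does not state: Tukey's hinges are taken as the medians of the lower and upper halves of the sorted data (sharing the overall median when n is odd), and the fences are placed 1.5 × IQR beyond the hinges, which is the usual boxplot convention. The function names and the extra value 15 are hypothetical.

```python
import statistics


def tukey_hinges(values):
    """Lower and upper hinge: medians of the lower and upper halves of the data."""
    xs = sorted(values)
    half = (len(xs) + 1) // 2  # the median belongs to both halves when n is odd
    return statistics.median(xs[:half]), statistics.median(xs[-half:])


def find_outliers(values, k=1.5):
    """Return (outliers, (lower fence, upper fence)) using k × IQR fences."""
    q1, q3 = tukey_hinges(values)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper], (lower, upper)


hemoglobin = [9, 8, 10, 11, 9, 7, 6, 8, 9, 7]
outliers, fences = find_outliers(hemoglobin + [15])  # 15 is an added hypothetical reading
print(outliers, fences)
```

Values outside the fences are the ones a boxplot would flag as outliers.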
Probability
• In the study of probability and statistics, we
deal with the presentation and interpretation of
chance outcomes that occur in planned
experiments or scientific investigations
Statistical Experiment
• Statisticians use the term statistical experiment
(or simply experiment) to describe any process
that generates an outcome that cannot be
predicted in advance with certainty

Definitions
* Sample space Ω
The set of all possible outcomes of a statistical experiment is known as the sample space of the experiment and is denoted by the symbol Ω.

* Sample point
Each outcome in a sample space is called a sample space element, a sample space member, or simply a sample point.

* Random experiment
An experiment or observation that can be repeated numerous times under the same conditions.
• Examples
• Roll a die
• Flip a coin
• Diagnose a disease in a person
• Measure the body weight of a student
Probability
P(A) = relative frequency of a measurable event A in Ω

* The total probability of all outcomes in the sample space is always 1.


• Coin tossing: P(H) + P(T) = 1,
• Die rolling: P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 1.

Relative frequency, probability
• When an experiment is performed "independently" several times, we talk about a sequence of experiments. Independent execution intuitively means that the outcome of any execution of the experiment is not influenced by the outcome of any other execution. For example, when flipping a coin or rolling a die, the thrown object should rotate sufficiently many times so that its earlier position does not noticeably influence the outcome of the given toss.
• Let us perform the experiment n times independently and observe how many times a given event E occurs. Let us denote this number by k (or kE) and call it the frequency of event E.
• The ratio k/n shows in what proportion event E has occurred during the n executions of the experiment. This ratio is called the relative frequency of event E.
Relative frequency, probability
• Example: When flipping a coin, let us write H if the face-up side after the flip is "head" and T if it is "tail". We flip a coin 20 times one after the other "independently", and let the finite sequence of outcomes be
TTHTHTTHTTHTHHHTHHTT
• The frequency of the side "head" is 9, and its relative frequency is 9/20 = 0.45. Denote the frequency of the side "head" in the first n flips by kn; then its relative frequency is kn/n, where n = 1, 2, ..., 20. The 20-element finite sequence of relative frequencies of the side "head" in the above sequence of experiments is
0/1, 0/2, 1/3, 1/4, 2/5, 2/6, 2/7, 3/8, 3/9, 3/10, 4/11, 4/12, 5/13, 6/14, 7/15, 7/16, 8/17, 9/18, 9/19, 9/20
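The running relative frequency kn/n for the 20 flips in the example can be computed as follows. This is a minimal sketch; `running_relative_frequency` is a hypothetical helper name.

```python
outcomes = "TTHTHTTHTTHTHHHTHHTT"  # the 20 flips from the example


def running_relative_frequency(sequence, event="H"):
    """Relative frequency k_n / n of `event` after each of the first n trials."""
    freqs = []
    k = 0
    for n, outcome in enumerate(sequence, start=1):
        if outcome == event:
            k += 1
        freqs.append(k / n)
    return freqs


rel_freq = running_relative_frequency(outcomes)
for n, value in enumerate(rel_freq, start=1):
    print(f"n = {n:2d}: {value:.3f}")
```

The last entry is the overall relative frequency of "head" over all 20 flips.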
Hypothesis Testing

Two tailed vs. One tailed

A one-tailed hypothesis specifies the direction of a difference or correlation, while a two-tailed hypothesis does not.
Significance
• A significant difference is one that persists when the experiment is repeated several times, i.e. it is unlikely to be due to chance alone.
• A non-significant difference is one that can occur by chance rather than being due to an external factor such as the effect of a drug or a disease.
• Tests of significance are used to determine whether the difference between two or more values is significant or not.
• Tests of significance rely on test statistics whose critical values mark the limits of the normal variation that occurs by chance.
• These critical values are given in tables and depend on the number of observations in the series. The test statistic is calculated from the results of the experiment, and the calculated value is compared with the tabulated value.
• If the calculated value is MORE than the tabulated value, the difference is SIGNIFICANT. On the other hand, if the calculated value is LESS than the tabulated value, the difference is NOT SIGNIFICANT.
Level of significance
• When applying tests of significance, we should be sure of our decision to a suitable percentage. The percentage by which we are sure of our decision is called the level of significance.
• When we choose a level of significance of 95%, we accept the difference as significant with 95% confidence, leaving a 5% probability that the difference is not significant, i.e. due to chance. The total probability is 100%.
Level of significance
• The total probability is usually taken as 1, and we usually state the probability due to chance. Saying that a difference is significant at a probability of less than 0.05 (P < 0.05) means that we are 95% sure the difference is significant, with less than a 5% probability that it is due to chance.

• The level of significance is usually chosen according to the seriousness of the decision, and it is usual to test the significance of the difference at either P < 0.01 or P < 0.05.
BIOSTATISTICS: transforms data into useful information

Descriptive statistics: collecting, summarizing, and describing data
• Mathematical
  o Central tendency: mean, median, mode
  o Measures of dispersion: variance, SD, SE, range, min., max., C.V.
• Graphical

Inferential statistics: drawing conclusions concerning populations based on statistical tests
• Parametric data (i.e. Shapiro-Wilk or Kolmogorov-Smirnov non-significant, sig. > 0.05)
  o Trend / relationship: correlation (Pearson), regression
  o Difference / compare
    – 1 group: 1-sample t-test
    – 2 groups: paired (related samples) t-test; independent samples (groups) t-test
    – More than 2 groups: 1-way ANOVA, 2-way ANOVA, MANOVA, repeated measures (ANOVA); post hoc: LSD, Duncan's, Tukey's
• Non-parametric data (i.e. Shapiro-Wilk or Kolmogorov-Smirnov significant, sig. < 0.05)
  o Trend / relationship: correlations (Spearman, Kendall's tau), regression (ordinal, ..)
  o Difference / compare
    – 1 group: Z test, sign test
    – 2 groups, paired: McNemar, Wilcoxon signed rank
    – 2 groups, independent samples: Mann-Whitney, Chi-squared
    – More than 2 groups: Kruskal-Wallis; post hoc: pairwise comparisons
Correlation

How to interpret correlation results
A Pearson/Spearman correlation was carried out to assess the relationship between laser treatment and cellular total antioxidant capacity (TAC).

Example 1: r = 0.879, n = 5, p = 0.05
There was a direct (1), strong (2), significant (3) correlation between laser treatment and cellular total antioxidant capacity (TAC) (r = 0.879, n = 5, p = 0.05*).

Example 2: r = −0.368, n = 5, p > 0.05
In contrast, there was an inverse, weak, non-significant correlation between laser treatment and cellular total antioxidant capacity (TAC) (r = −0.368, n = 5, p > 0.05).
How to interpret correlation results
The three parts of the statement "There was a direct (1), strong (2), significant (3) correlation":

1. Direction
   +ve: positive (direct)
   −ve: negative (inverse)
2. Strength: given by the size of r
3. Significance
   p > 0.05: non-significant (ns)
   P < 0.05: significant
   P < 0.01: highly significant
   P < 0.001: highly significant
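For readers who want to see where r comes from, here is a minimal sketch of the Pearson correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² × Σ(y − ȳ)²). The dose and TAC values are hypothetical (the slide's raw data are not given), and the helper names are illustrative only. The p-value is not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def direction(r):
    """Describe the sign of r the way the interpretation legend does."""
    if r > 0:
        return "direct (positive)"
    if r < 0:
        return "inverse (negative)"
    return "none"


# Hypothetical laser doses and TAC readings for n = 5 samples.
laser_dose = [1, 2, 3, 4, 5]
tac = [0.8, 1.1, 1.0, 1.6, 1.9]

r = pearson_r(laser_dose, tac)
print(f"r = {r:.3f}, direction: {direction(r)}")
```

The sign of r gives the direction, its size gives the strength, and the significance still has to be read from the p-value reported by the software.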
Regression
1. Linear regression
The dependent variable is continuous; the independent variable(s) may be continuous or discrete.
2. Logistic regression
Used to find the probability of event = success versus event = failure. The dependent variable is binary (0/1, true/false, yes/no).
3. Polynomial regression
A regression equation is a polynomial regression equation if the power of the independent variable is greater than 1.
4. Stepwise regression
Used when we deal with multiple independent variables.
5. Ridge regression
Used when the data suffer from multicollinearity (the independent variables are highly correlated).
Simple linear regression trendline
[Figure: scatter plot of Y against X with a fitted trendline]
Y = aX + b, where a is the slope and b is the intercept (the value of Y at X = 0). The fit is reported together with R² and its p-value.
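The trendline Y = aX + b can be fitted by ordinary least squares. The sketch below is an illustration under that assumption (the slide does not name the fitting method); `fit_line` is a hypothetical helper and the data are made up. It returns the slope a, the intercept b, and R².

```python
def fit_line(x, y):
    """Least-squares fit of y = a*x + b; returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx                # slope
    b = mean_y - a * mean_x      # intercept
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical data lying close to a straight line.
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.1]

a, b, r_squared = fit_line(x, y)
print(f"Y = {a:.3f} X + {b:.3f}, R² = {r_squared:.3f}")
```

R² close to 1 means the line explains most of the variation in Y; the p-value for the slope would come from the statistical software.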


T-test

Numerical data: a single group
1-sample t-test

Interpretation of the confidence interval

One-Sample T-Test
Numerical data: 2 related samples
Paired (dependent) samples t-test
• Sometimes it is possible to measure a parameter in the same
animal before and after drug administration. In this case the
animal serves as its own control. Examples of this type of
experiment are the determination of blood glucose before and
after an antidiabetic drug, blood pressure before and after an
antihypertensive drug, and normal reaction time before and
after an analgesic drug. The effect of the drug is thus
determined in one group, which serves as both the control
and the treated group.

Descriptive statistics example
Example: Haemoglobin level before and after treatment with new product

Before  After
6       10
7       11
7       9
7       10
8       11
9       11
10      12
6       13
6       11
7       12
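As a hedged sketch of how a paired samples t-test could be worked through outside SPSS, the snippet below uses the ten before/after haemoglobin pairs from this example. It computes the per-subject differences, their mean and standard error, and the t value with N − 1 degrees of freedom. The tabulated value 2.262 (two-tailed, 0.05, df = 9) is taken from a standard t-table and is not stated in the slides.

```python
import math
from statistics import mean, stdev

# Haemoglobin level before and after treatment (ten paired observations).
before = [6, 7, 7, 7, 8, 9, 10, 6, 6, 7]
after = [10, 11, 9, 10, 11, 11, 12, 13, 11, 12]

# The paired test works on the within-subject differences.
differences = [a - b for a, b in zip(after, before)]

mean_diff = mean(differences)
se_diff = stdev(differences) / math.sqrt(len(differences))

t_paired = mean_diff / se_diff
df_paired = len(differences) - 1

# Two-tailed critical value at 0.05 for df = 9, from a standard t-table.
T_TABULATED_DF9 = 2.262

print(f"mean difference = {mean_diff:.2f}, t = {t_paired:.2f}, df = {df_paired}")
print("significant" if abs(t_paired) > T_TABULATED_DF9 else "not significant")
```

For these data the mean difference is 3.7 and t is about 7.15, well above the tabulated value, so the rise in haemoglobin would be judged significant at the 0.05 level.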

Paired sample t-test

Numerical data: 2 Independent samples
Un-paired; 2 unrelated groups t-test

Independent samples t-test
(Un-paired data; 2 unrelated groups)

• Student's "t" test is used to determine the significance of the
difference between two mean values. Each mean value has its
S.D. and S.E. The test depends on calculating the "t" value
from the two mean values and their S.D. and S.E., as well as
the number of observations in each series.

The "t" value is calculated using the following formula

• Where,
• The mean of the first group
• The mean of the second group
• SE1 The standard error of the first group
• SE2 The standard error of the second group
• N1 The number of observations in the first group
• N2 The number of observations in the second group
• If t calculated > t tabulated, the result is significant, and
vice versa.

Example
• The effect of amphetamine and chlorpromazine
on the body weight of adult rats was measured
by injecting each drug into a group of 8 rats. The
results were as follows:
• Amphetamine: 50, 45, 40, 44, 35, 36, 33, 37.
• Chlorpromazine: 35, 30, 25, 34, 32, 30, 22, 32.
• Test the significance of the difference between
chlorpromazine and amphetamine at P less
than 0.05.
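The example can be worked through with a short Python sketch. The formula image did not survive in these slides, so the code assumes one common form that is consistent with the symbols listed above: t = (mean1 − mean2) / sqrt(SE1² + SE2²), with N1 + N2 − 2 degrees of freedom. The tabulated value 2.145 (two-tailed, 0.05, df = 14) comes from a standard t-table.

```python
import math
from statistics import mean, stdev

amphetamine = [50, 45, 40, 44, 35, 36, 33, 37]
chlorpromazine = [35, 30, 25, 34, 32, 30, 22, 32]

def standard_error(sample):
    """Standard error of the mean: S.D. divided by sqrt(N)."""
    return stdev(sample) / math.sqrt(len(sample))

mean1, mean2 = mean(amphetamine), mean(chlorpromazine)
se1, se2 = standard_error(amphetamine), standard_error(chlorpromazine)

# Assumed formula (not recoverable from the slide image):
# difference of the means over the combined standard error.
t_calculated = (mean1 - mean2) / math.sqrt(se1 ** 2 + se2 ** 2)
df = len(amphetamine) + len(chlorpromazine) - 2

# Two-tailed critical value at 0.05 for df = 14, from a standard t-table.
T_TABULATED = 2.145

print(f"means: {mean1} vs {mean2}, t = {t_calculated:.2f}, df = {df}")
print("significant" if abs(t_calculated) > T_TABULATED else "not significant")
```

Under this assumed formula t comes out at about 3.85, close to the 3.80 reported in the slides; the small gap may reflect rounding or a slightly different formula. Either value exceeds the tabulated t, so the conclusion is the same.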

Calculations of Student's t-test

t calculated = 3.80; t tabulated = 2.14

Since t calculated > t tabulated, there is a significant difference between group 1 and group 2.


How to perform on SPSS: Independent samples t-test

• When the assumption of equal population variances is violated
and the sample sizes for each group differ, the p value is not
trustworthy. However, the Independent Samples t Test output
also includes an approximate t statistic that is not based on
assuming equal population variances; this alternative statistic,
called the Welch t Test statistic, may be used when equal
variances among populations cannot be assumed. The Welch
t Test is also known as the Unequal Variance t Test or
Separate Variances t Test.
• The output in the Independent Samples Test table includes two
rows: Equal variances assumed and Equal variances not
assumed. If Levene's test indicates that the variances are equal
across the two groups (i.e., a large p-value, conventionally
p > 0.05), you will rely on the first row of output, Equal variances
assumed, when you look at the results for the actual Independent
Samples t Test (under t-test for Equality of Means). If Levene's
test indicates that the variances are not equal across the two
groups (i.e., a small p-value, conventionally p < 0.05), you will
need to rely on the second row of output, Equal variances not
assumed, when you look at the results of the Independent
Samples t Test (under the heading t-test for Equality of Means).
• The difference between these two rows of output lies in the way
the independent samples t test statistic is calculated. When equal
variances are assumed, the calculation uses pooled variances;
when equal variances cannot be assumed, the calculation utilizes
un-pooled variances and a correction to the degrees of freedom.
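The difference between the two rows can be illustrated with a small Python sketch, reusing the amphetamine and chlorpromazine body-weight data from the earlier example. The function names are hypothetical; the pooled version uses N1 + N2 − 2 degrees of freedom, while the un-pooled version uses the Welch-Satterthwaite approximation for the corrected degrees of freedom. This is a sketch of the standard textbook formulas, not a reproduction of SPSS internals.

```python
import math
from statistics import mean, variance

amphetamine = [50, 45, 40, 44, 35, 36, 33, 37]
chlorpromazine = [35, 30, 25, 34, 32, 30, 22, 32]

def pooled_t(g1, g2):
    """t statistic and df when equal variances are assumed."""
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(g1) - mean(g2)) / se, n1 + n2 - 2

def welch_t(g1, g2):
    """t statistic and corrected df when equal variances are not assumed."""
    n1, n2 = len(g1), len(g2)
    v1, v2 = variance(g1) / n1, variance(g2) / n2
    t = (mean(g1) - mean(g2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t_pooled, df_pooled = pooled_t(amphetamine, chlorpromazine)
t_welch, df_welch = welch_t(amphetamine, chlorpromazine)

print(f"Equal variances assumed:     t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Equal variances not assumed: t = {t_welch:.3f}, df = {df_welch:.2f}")
```

Because both groups have 8 rats, the two t statistics coincide here and only the degrees of freedom differ (14 versus roughly 13.05). With unequal group sizes and unequal variances, the two rows can diverge noticeably.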

Equal variance assumption
• Levene's test (Levene 1960) is used to test if k samples have
equal variances. Equal variances across samples is called
homogeneity of variance. Some statistical tests, for example
the analysis of variance, assume that variances are equal
across groups or samples. The Levene test can be used to
verify that assumption.

• Levene's test is an alternative to the Bartlett test. The Levene
test is less sensitive than the Bartlett test to departures from
normality. If you have strong evidence that your data do in
fact come from a normal, or nearly normal, distribution, then
Bartlett's test has better performance.

The Levene test is defined as:
$H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$
$H_a: \sigma_i^2 \neq \sigma_j^2$ for at least one pair $(i, j)$.
Where:
$\sigma^2$ population variance
$S^2$ sample variance
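To make the hypotheses concrete, here is a minimal sketch of the Levene statistic in its mean-based form: each observation is replaced by its absolute deviation from its group mean, and a one-way ANOVA F ratio is computed on those deviations. The function name is hypothetical, the data are the two drug groups from the t-test example, and the critical value 4.60 for F(1, 14) at 0.05 is taken from a standard F-table. Other variants of the test use the group median instead of the mean.

```python
from statistics import mean

def levene_w(*groups):
    """Mean-based Levene statistic W for k groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of each observation from its own group mean.
    abs_dev = [[abs(y - mean(g)) for y in g] for g in groups]
    group_means = [mean(z) for z in abs_dev]
    grand_mean = sum(sum(z) for z in abs_dev) / n_total

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(abs_dev, group_means))
    within = sum((v - m) ** 2
                 for z, m in zip(abs_dev, group_means) for v in z)

    return (n_total - k) / (k - 1) * between / within

amphetamine = [50, 45, 40, 44, 35, 36, 33, 37]
chlorpromazine = [35, 30, 25, 34, 32, 30, 22, 32]

w = levene_w(amphetamine, chlorpromazine)

# Critical F(1, 14) at the 0.05 level, from a standard F-table.
F_CRITICAL = 4.60

print(f"W = {w:.3f}")
print("variances differ" if w > F_CRITICAL else "no evidence of unequal variances")
```

For these two groups W is about 1.12, below the critical value, so H0 of equal variances would not be rejected and the "Equal variances assumed" row would be the one to read.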

Numerical data: more than two groups

ANOVA
The one-way analysis of variance (ANOVA) is used to determine whether there are any significant
differences between the means of three or more independent (unrelated) groups.
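A minimal sketch of the one-way ANOVA calculation follows. The three groups are hypothetical illustration data, not taken from the slides, and the function name is an assumption. The F ratio is the between-group mean square divided by the within-group mean square; the critical value 5.14 for F(2, 6) at 0.05 comes from a standard F-table.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)

    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical illustration data: three independent groups of three.
group_a = [4, 5, 6]
group_b = [6, 7, 8]
group_c = [9, 10, 11]

f_value, df_b, df_w = one_way_anova_f([group_a, group_b, group_c])

# Critical F(2, 6) at the 0.05 level, from a standard F-table.
F_CRITICAL = 5.14

print(f"F({df_b}, {df_w}) = {f_value:.2f}")
print("significant" if f_value > F_CRITICAL else "not significant")
```

A significant F only says that at least one group mean differs; identifying which pairs differ is the job of the post hoc comparisons.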

Post hoc comparisons

In my experience, differences between the various post-hoc tests are very "academic"
and typically of little practical relevance. In practice, there are only two tests I
consider:
Tukey's HSD for all-pairwise comparisons and
Dunnett's procedure for multiple-to-one comparisons.

• LSD. Uses t tests to perform all pairwise comparisons between group means. No
adjustment is made to the error rate for multiple comparisons.
• Bonferroni. Uses t tests to perform pairwise comparisons between group means,
but controls overall error rate by setting the error rate for each test to the
experimentwise error rate divided by the total number of tests. Hence, the observed
significance level is adjusted for the fact that multiple comparisons are being made.
• Sidak. Pairwise multiple comparison test based on a t statistic. Sidak adjusts the
significance level for multiple comparisons and provides tighter bounds than
Bonferroni.
• Scheffe. Performs simultaneous joint pairwise comparisons for all possible pairwise
combinations of means. Uses the F sampling distribution. Can be used to examine all
possible linear combinations of group means, not just pairwise comparisons.
• R-E-G-W F. Ryan-Einot-Gabriel-Welsch multiple stepdown procedure based on an F
test.
• R-E-G-W Q. Ryan-Einot-Gabriel-Welsch multiple stepdown procedure based on the
Studentized range.
• S-N-K. Makes all pairwise comparisons between means using the Studentized range
distribution. With equal sample sizes, it also compares pairs of means within
homogeneous subsets, using a stepwise procedure. Means are ordered from highest
to lowest, and extreme differences are tested first.
• Tukey. Uses the Studentized range statistic to make all of the pairwise comparisons
between groups. Sets the experimentwise error rate at the error rate for the
collection for all pairwise comparisons.

• Tukey's b. Uses the Studentized range distribution to make pairwise comparisons
between groups. The critical value is the average of the corresponding value for the
Tukey's honestly significant difference test and the Student-Newman-Keuls.
• Duncan. Makes pairwise comparisons using a stepwise order of comparisons
identical to the order used by the Student-Newman-Keuls test, but sets a protection
level for the error rate for the collection of tests, rather than an error rate for
individual tests. Uses the Studentized range statistic.
• Hochberg's GT2. Multiple comparison and range test that uses the Studentized
maximum modulus. Similar to Tukey's honestly significant difference test.
• Gabriel. Pairwise comparison test that uses the Studentized maximum modulus and
is generally more powerful than Hochberg's GT2 when the cell sizes are unequal.
Gabriel's test may become liberal when the cell sizes vary greatly.
• Waller-Duncan. Multiple comparison test based on a t statistic; uses a Bayesian
approach.
• Dunnett. Pairwise multiple comparison t test that compares a set of treatments
against a single control mean. The last category is the default control category.
Alternatively, you can choose the first category. 2-sided tests that the mean at any
level (except the control category) of the factor is not equal to that of the control
category. < Control tests if the mean at any level of the factor is smaller than that of
the control category. > Control tests if the mean at any level of the factor is greater
than that of the control category.

https://www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/base/idh_onew_post.html
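The Bonferroni and Sidak adjustments described in the list above reduce to simple arithmetic, sketched below. The function names and the choice of three groups are illustrative assumptions. Bonferroni divides the experimentwise error rate by the number of tests; the Sidak formula used here is the standard 1 − (1 − α)^(1/m).

```python
def pairwise_count(k):
    """Number of pairwise comparisons among k group means."""
    return k * (k - 1) // 2

def bonferroni_alpha(alpha, m):
    """Per-comparison error rate: experimentwise rate divided by m tests."""
    return alpha / m

def sidak_alpha(alpha, m):
    """Per-comparison error rate under the Sidak adjustment."""
    return 1 - (1 - alpha) ** (1 / m)

k_groups = 3   # hypothetical number of groups
alpha = 0.05   # experimentwise error rate

m = pairwise_count(k_groups)
alpha_bonferroni = bonferroni_alpha(alpha, m)
alpha_sidak = sidak_alpha(alpha, m)

print(f"{m} comparisons")
print(f"Bonferroni per-test alpha: {alpha_bonferroni:.5f}")
print(f"Sidak per-test alpha:      {alpha_sidak:.5f}")
```

With three groups there are three pairwise comparisons, and the Sidak threshold is slightly larger than the Bonferroni one, which is the sense in which Sidak "provides tighter bounds". LSD, by contrast, would test each pair at the unadjusted 0.05.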

Assignment
1. In a clinical trial, a new formula was tested and haemoglobin
levels were measured before and after treatment. Test the
significance of the following data at the 0.05 level.

Haemoglobin level
Before  After
6       10
7       11
7       9
8       10
5       9
5       9
7       9

Types of ANOVA
• One-way ANOVA: used to compare means of
groups/populations using one factor.
• Two-way ANOVA: used to compare means of
groups/populations using two factors.
• Two-way ANOVA (Repeated): used to
compare means of groups/populations using
two factors with interactions among the
factors.

Calculations of ANOVA

Nonparametric data analysis
“Compare/ difference”

BIOSTATISTICS: Transform data into useful information

Descriptive statistics: collecting, summarizing and describing data
• Mathematical
– Central tendency: Mean, Median, Mode
– Measures of dispersion: Variance, SD, SE, Range, Min., Max., C.V.
• Graphical

Inferential statistics: drawing conclusions concerning populations based on statistical tests
• Parametric data (i.e. Shapiro-Wilk or Kolmogorov-Smirnov not significant "ns", sig. > 0.05)
– Trend/relationship: Correlation (Pearson), Regression
– Difference/compare, 1 group: 1-sample t-test
– Difference/compare, 2 groups: Paired/related samples t-test; Independent samples/groups t-test
– Difference/compare, more than 2 groups: 1-way ANOVA, 2-way ANOVA, MANOVA, Repeated measures (ANOVA); Post-hoc: LSD, Duncan's, Tukey's
• Non-parametric data (i.e. Shapiro-Wilk or Kolmogorov-Smirnov significant, sig. < 0.05)
– Trend/relationship: Correlations (Spearman, Kendall's tau), Regression (Ordinal, ..)
– Difference/compare, 1 group: Z test, Sign test, Wilcoxon signed rank
– Difference/compare, 2 groups: Paired: McNemar; 2 independent samples: Mann-Whitney, Chi-squared
– Difference/compare, more than 2 groups: Kruskal-Wallis, Friedman; Post-hoc: Pairwise comparisons

Nominal variables
• Nominal variables classify observations into
discrete categories.
• e.g.
– Variables include sex (male, female)
– Genotype (AA, Aa, BB, Bb, aa)
– Expressed as a word, not a number

Nominal variables are also called categorical, discrete, qualitative, or attribute
variables. "Categorical" is a more common name than "nominal," but some
authors use "categorical" to include both what I'm calling "nominal" and what I'm
calling "ranked," while other authors use "categorical" just for what I'm calling
nominal variables. I'll stick with "nominal" to avoid this ambiguity.

Chi-Square test
