Dr. Qin Xu 27/09/2013
Descriptive Statistics
• Levels of Data
• Measures of Central Tendency
• Measures of Dispersion
• Distribution
A. Levels of Data
C. Measures of Dispersion
• Range = (highest – lowest), used with mean
D. Distribution
• Frequency Distribution is often used to describe the distribution of nominal and ordinal data, in the form of a histogram or a bar chart
• Skewed Distribution is a non-symmetrical
distribution of data in which the mean is distorted by
the existence of extreme scores that lie to one side
of the median.
• Normal Distribution is a theoretical distribution (followed by many random factors) that is symmetrical and bell-shaped, where mean = median = mode:
- 68% of data lie within ±1 stdev
- 95% of data lie within ±2 stdev
- 99.7% of data lie within ±3 stdev
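These percentages can be checked numerically; a minimal sketch using NumPy (the sample size, mean and SD are arbitrary):

```python
import numpy as np

# Draw a large sample from a normal distribution and check the
# 68-95-99.7 rule empirically.
rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=100_000)

mean, sd = x.mean(), x.std()
within = {k: float(np.mean(np.abs(x - mean) <= k * sd)) for k in (1, 2, 3)}
print(within)  # roughly {1: 0.683, 2: 0.954, 3: 0.997}
```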
Inferential Statistics
• Introduction
• Samples and Populations
• Sampling Error
• Hypothesis testing
• Type I and Type II Errors
A. Introduction
SE = √(variance ⁄ n) = s ⁄ √n
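Applying this formula directly (a sketch; the sample values are made up):

```python
import math

sample = [12.1, 11.4, 13.0, 12.6, 11.9, 12.4]  # hypothetical measurements
n = len(sample)
mean = sum(sample) / n
variance = sum((v - mean) ** 2 for v in sample) / (n - 1)  # sample variance s²
s = math.sqrt(variance)                                    # standard deviation
se = s / math.sqrt(n)                                      # SE = s / √n
```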
C. Hypothesis Testing
Hypothesis Testing is the statistical inference of rejecting or accepting the “by chance” explanation of an observed difference between samples.
• Two-tailed or One-tailed
When α = 0.05, there is a 5% probability of wrongly rejecting H0 (and a 95% probability of not doing so).
[Figure: rejection regions at α = 0.05. Two-tailed test: accept H0 over the central 95%, reject H0 in the two 2.5% tails. One-tailed test: accept H0 over 95%, reject H0 in a single 5% tail.]
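The corresponding critical values of the standard normal distribution illustrate the difference (a sketch using SciPy):

```python
from scipy.stats import norm

alpha = 0.05
# Two-tailed: α is split between the two tails.
two_tailed_crit = norm.ppf(1 - alpha / 2)  # ≈ 1.96
# One-tailed: all of α sits in a single tail.
one_tailed_crit = norm.ppf(1 - alpha)      # ≈ 1.64
```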
[Diagram: scale of significance levels showing the probability that H0 is correct, running from 100% through 5% down to 0%; moving down the scale means a decreasing probability (α) of committing a Type I error]
Consider: the impact of drinks on physical performance
Two drinks: water, Lucozade
Resources: 40 day-passes for the Sports Centre, £35
The t-test assumes that
• Samples come from normally distributed populations
• Samples to be compared have the same variance
• The dependent variable is measured on an interval scale
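A sketch of a two-sample t-test under these assumptions, using SciPy with made-up group scores:

```python
from scipy import stats

# Hypothetical interval-scale scores from two independent groups.
group_a = [23.1, 25.4, 22.8, 26.0, 24.5, 23.9]
group_b = [21.0, 22.2, 20.5, 23.1, 21.8, 22.6]

# Levene's test offers a check of the equal-variance assumption.
_, p_equal_var = stats.levene(group_a, group_b)

# equal_var=True matches the same-variance assumption above.
t, p = stats.ttest_ind(group_a, group_b, equal_var=True)
```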
3. Data Levels
There is no statistical test to assess this; the level depends on how the data were measured:
- Nominal data: in name only
- Ordinal data: ranked
- Interval data: continuous scale with equal intervals
Non-parametric Tests
Disadvantages
Less powerful than parametric tests: a significant difference is detected only when the sample deviates sufficiently from the population mean/median. A larger sample is required to be able to reject H0 when it should be rejected (i.e., to avoid a Type II error).
Not as informative as parametric tests: significance is calculated on either ranks or frequencies of the data rather than their values.
B. Choice of Tests
• Wilcoxon Test
Comparing the medians of two matched samples in repeated-measure, matched-pair and crossover designs. It assumes (strictly speaking) that the samples come from a population that is symmetrically, but not necessarily normally, distributed.
(The differences of the paired data are ranked, and the sums of the +ve and –ve ranks are used as the test statistics.)
• Mann-Whitney Test
Comparing the medians of two independent samples in RCT and cross-sectional designs. It assumes (strictly speaking) that the two samples have similar distributions, i.e., it is not for comparing a +ve skewed sample distribution with a –ve skewed one.
(The data from the two samples are ranked together, and the sums of each sample’s ranks are used as the test statistics.)
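Both tests are available in SciPy; a sketch with made-up data:

```python
from scipy import stats

# Matched-pair (repeated-measure) data: Wilcoxon signed-rank test.
before = [10.0, 11.0, 9.0, 14.0, 12.0, 13.0, 10.5, 12.5, 11.5, 13.5]
after_ = [10.6, 11.8, 10.0, 15.2, 13.4, 14.6, 12.3, 14.5, 13.7, 15.9]
w_stat, w_p = stats.wilcoxon(before, after_)

# Two independent samples: Mann-Whitney U test.
sample_1 = [3, 5, 4, 6, 7, 5]
sample_2 = [8, 9, 7, 10, 9, 8]
u_stat, u_p = stats.mannwhitneyu(sample_1, sample_2)
```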
C. Frequency Analysis
The chi-square (χ²) test is one of the best-known frequency analysis tests; there are several variations of it, applied to different types of data. These tests are used when the main data come from counting occurrences of events/choices.
Homogeneity test: testing whether the observed frequencies depart from random
         red  yellow  blue  green
UK       334    256   328    282
(Random  300    300   300    300)
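The colour-preference counts above can be tested directly (a sketch using SciPy):

```python
from scipy.stats import chisquare

observed = [334, 256, 328, 282]   # UK colour choices
expected = [300, 300, 300, 300]   # equal frequencies if choices were random
chi2, p = chisquare(observed, f_exp=expected)
# χ² = Σ(O − E)²/E = (34² + 44² + 28² + 18²) / 300 = 14.0
```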
Correlation and Regression
• Introduction
• Correlation Coefficient
• Partial Correlation
• Regression
• Multiple regression
A. Introduction
Correlation expresses the extent to which two variables vary together (i.e., co-vary). It may be
- a measurement of a direct cause-effect relationship
- a measurement of the co-change of two variables due to the presence of a 3rd variable
B. Correlation Coefficient
Correlation coefficient (r) is a statistical index of the degree to
which two variables are related.
r = 1: perfect +ve correlation
r = 0: no correlation
r = –1: perfect –ve correlation
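Computing r for made-up paired data (a sketch using NumPy):

```python
import numpy as np

# Hypothetical paired measurements; y rises roughly as 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
```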
C. Overlapping Variance
Partial correlation is when the relationship of two variables is examined with the exclusion of another predictor.
[Diagram: overlapping variance among age, central processing, cochlear function and hearing loss]
D. Regression Analysis
Correlation analysis indicates the relationship of two variables.
Regression analysis quantifies the magnitude and direction of the relationship between two variables.
The regression equation is the mathematical description of the relationship, which may be used to predict/estimate the value of one variable from a measurement of the other, i.e., y = a + bx.
The line of least squares is the method adopted for fitting the regression line, where the sum of the squared vertical distances of all points from the line is minimised.
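Fitting y = a + bx by least squares (a sketch using NumPy, with made-up data roughly following y = 1 + 2x):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# A degree-1 polynomial fit minimises the sum of squared vertical distances.
b, a = np.polyfit(x, y, 1)  # returns [slope, intercept]
y_pred = a + b * x
```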
[Figure: scatter plot of X against Y with the fitted regression line Y = b0 + b1X; the intercept a (= b0) is where the line meets the Y axis, and the slope b1 = y1 ⁄ x1 (rise over run)]
E. Multiple Regression
Simple regression examines the relationship between two variables.
Multiple regression examines the influence of many independent variables (predictors) on the measured dependent variable.
[Diagram: overlapping variance of the predictors (education, beginning salary, previous experiences) on the final salary]
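A sketch of a multiple regression with hypothetical predictors of final salary, solved by least squares with NumPy:

```python
import numpy as np

# Made-up predictor and outcome values.
education = np.array([12.0, 16, 14, 18, 16, 12, 20, 14])   # years of education
experience = np.array([1.0, 3, 2, 5, 4, 2, 6, 3])          # years of experience
salary = np.array([20.0, 28, 24, 34, 30, 21, 39, 25])      # final salary (£k)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(education), education, experience])
coef, _, _, _ = np.linalg.lstsq(X, salary, rcond=None)
intercept, b_education, b_experience = coef
predicted = X @ coef
```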
Multiple-Sample Analysis
• Introduction
A. Introduction
Why not do repeated two-sample tests?
- increased probability of committing a Type I error
- increased probability of committing a Type II error if a reduced significance level is used
Basis of multiple-sample analysis:
Difference = (inter-individual variability + intra-individual variability + random fluctuations) + treatment variability
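The first point can be seen with a quick calculation: across k independent tests at α = 0.05, the chance of at least one Type I error grows as 1 − (1 − α)^k (a sketch):

```python
alpha = 0.05
p_any_type1 = {k: 1 - (1 - alpha) ** k for k in (1, 3, 10)}
# e.g. the 3 pairwise tests among three groups already inflate the
# family-wise error rate from 0.05 to about 0.14.
```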
C. ANOVA
ANOVA is a parametric technique for ≥ 3 samples, and it has the same assumptions as the t-test:
Simple adjustment
for n comparisons, apply αadj = α ⁄ n
so for A, B & C: αadj = 0.05 ⁄ 3 ≈ 0.0167
Bonferroni adjustment
for n comparisons, αadj = 1 – (1 – α)^(1/n)
so for A, B & C: αadj = 1 – (1 – 0.05)^(1/3) = 1 – 0.9831 = 0.0169
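Both adjustments in code (a sketch for three comparisons):

```python
alpha, n = 0.05, 3

simple_adj = alpha / n                 # 0.05 / 3 ≈ 0.0167
root_adj = 1 - (1 - alpha) ** (1 / n)  # 1 − 0.95^(1/3) ≈ 0.0170
```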
F = MSM ⁄ MSR (model mean square ⁄ residual mean square)
R² = SSM ⁄ SST (model sum of squares ⁄ total sum of squares)
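These ratios can be computed from a sum-of-squares decomposition; a sketch with three made-up groups:

```python
import numpy as np

# Hypothetical scores for three groups (e.g., three different drinks).
groups = [
    np.array([5.0, 6.0, 7.0, 6.0]),
    np.array([8.0, 9.0, 8.0, 9.0]),
    np.array([4.0, 5.0, 4.0, 5.0]),
]
all_vals = np.concatenate(groups)
grand_mean = all_vals.mean()

# Model (between-groups) and residual (within-groups) sums of squares.
ss_m = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_r = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_t = ss_m + ss_r

k, n = len(groups), len(all_vals)
ms_m = ss_m / (k - 1)   # model mean square
ms_r = ss_r / (n - k)   # residual mean square

F = ms_m / ms_r
R2 = ss_m / ss_t
```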