Quant Methods for Mgmt Course Overview

Quantitative Methods for Management
Dr. Maurizio Romano

romano.maurizio@unica.it
Department of Business and Economics

University of Cagliari, Italy
A.A. 2021/2022
Introduction
Mentimeter
Go to
www.menti.com
and use the code
7803 8552
Introduction: The book
Introduction: F.A.Q.
Recordings?
Yes, they will be available on Microsoft Teams. Please download them all
before they expire
Slides?
Yes, they will be available on Microsoft Teams (after the Lecture)
Homework?
Yes, up to 3 individual homework assignments, but they will not be mandatory
Introduction: F.A.Q.
Will we strictly follow the book?

The entire course is based on the book. However, there will be some
additions, especially for the laboratory part.
Which chapters of the book will be covered in this course?

1, 2, (3 and 4 from another source), 5, 6, 7, 8, 9, 10, 15, 17
What kind of additions are you talking about?

Laboratory part (practical analysis examples in R)
Cross Validation
Dummy Variables
ROC Curve
Cluster Analysis
Introduction: The exam
Written exam
Both theoretical questions and practical exercises (R). However, you will
not be asked to write R code
As the exam day approaches, an old exam paper will be provided as study
material. In addition, a mock exam will be scheduled during the last days
of lectures
50% of the final score
The minimum score to access the oral exam is 16/30
Oral exam
Mainly theoretical questions, with some clarifications on the written
exam. It might be required to solve an exercise.
50% of the final score
Introduction: The exam
Final Score
The arithmetic mean of the written exam and oral exam
scores
The minimum final score to pass the exam is 18/30
Disclaimer
Both the written and the oral exams will be held in person
There might be exceptions according to the University policies (for
instance, if you have Covid)
However, depending on how the pandemic evolves, there might be unexpected
changes (e.g. online-only exams)
Chapter 1
Why do we need statistics?
Types of Data Analysis

Quantitative Methods: Testing theories using numbers
Qualitative Methods: Testing theories using language
Magazine articles/interviews
Conversations
Newspapers
Media broadcasts
The Research Process
Initial Observation
Find something that needs explaining

Observe the real world
Read other research
Test the concept: collect data
Collect data to see whether your hunch is correct
To do this you need to define variables: Anything that can be
measured and can differ across entities or time.
Generating and Testing Theories
Theory
A hypothesized general principle or set of principles that explains
known findings about a topic and from which new hypotheses can be
generated
Hypothesis
A prediction from a theory
E.g. the number of people turning up for a Big Brother audition that
have narcissistic personality disorder will be higher than the general
level (1%) in the population
Falsification
The act of disproving a theory or hypothesis
Generating and Testing Theories
Table 1.1
The number of people at the Big Brother audition, split by
whether they had narcissistic personality disorder and whether they were
selected as contestants by the producers

            No Disorder   Disorder   Total
Selected              3          9      12
Rejected           6805        845    7650
Total              6808        854    7662
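To connect Table 1.1 with the hypothesis, the proportions can be computed directly. This is only a sketch: the counts are copied from the table, while the object names (audition, prop_disorder, prop_selected) are made up for the illustration.

```r
# Counts copied from Table 1.1 (rows: outcome of the audition, columns: diagnosis)
audition <- matrix(c(3, 9,
                     6805, 845),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Selected", "Rejected"),
                                   c("No Disorder", "Disorder")))

# Proportion of all auditioners with the disorder
prop_disorder <- sum(audition[, "Disorder"]) / sum(audition)

# Proportion of the selected contestants with the disorder
prop_selected <- audition["Selected", "Disorder"] / sum(audition["Selected", ])

round(c(auditioners = prop_disorder, selected = prop_selected), 3)
```

Both proportions are far above the 1% population level quoted in the hypothesis, which is the pattern the hypothesis predicts.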
Data Collection 1: What to measure?
Hypothesis
Chocolate kills dogs
Independent Variable
The proposed cause
A predictor variable
A manipulated variable (in experiments)
Chocolate in the above hypothesis
Dependent Variable
The proposed effect
An outcome variable
Measured not manipulated (in experiments)
Dogs in the above hypothesis
Some important terms
Independent variable
A variable thought to be the cause of some effect. This term is usually used in
experimental research to denote a variable that the experimenter has manipulated
Dependent variable
A variable thought to be affected by changes in an independent variable. You can
think of this variable as an outcome
Predictor variable
A variable thought to predict an outcome variable. This is basically another term
for independent variable
Outcome variable
A variable thought to change as a function of changes in a predictor variable. This
term could be synonymous with “dependent variable” for the sake of an easy life.
Levels of Measurement
Categorical (entities are divided into distinct categories):

Binary variable: There are only two categories
e.g. dead or alive.
Nominal variable: There are more than two categories
e.g. whether someone is an omnivore, vegetarian, vegan or fruitarian.
Ordinal variable: The same as a nominal variable but the categories
have a logical order
e.g. whether people got a fail, a pass, a merit or a distinction in their
exam
Continuous (entities get a distinct score):
Interval variable: Equal intervals on the variable represent equal
differences in the property being measured
e.g. the difference between 6 and 8 is equivalent to the difference
between 13 and 15.
Ratio variable: The same as an interval variable, but the ratios of
scores on the scale must also make sense
e.g. a score of 16 on an anxiety scale means that the person is, in
reality, twice as anxious as someone scoring 8.
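In R, categorical levels of measurement are usually stored as factors. The sketch below uses the category labels from the slide; the example values and variable names are invented for the illustration.

```r
# Hypothetical example values; only the category labels come from the slides
alive <- factor(c("dead", "alive", "alive"))                         # binary
diet  <- factor(c("vegan", "omnivore", "fruitarian", "vegetarian"))  # nominal
grade <- factor(c("pass", "fail", "distinction", "merit"),
                levels = c("fail", "pass", "merit", "distinction"),
                ordered = TRUE)                                      # ordinal

nlevels(alive)         # 2: only two categories
is.ordered(diet)       # FALSE: the categories have no order
grade[1] < grade[3]    # TRUE: "pass" ranks below "distinction"
```

Interval and ratio variables need no special type: they are stored as ordinary numeric vectors.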
Measurement Error
Measurement error
The discrepancy between the actual value we are trying to measure,
and the number we use to represent that value
Example:
You (in reality) weigh 80 kg
You stand on your bathroom scales and they say 83 kg
The measurement error is 3 kg
Validity
Whether an instrument measures what it set out to measure.

Content validity
Evidence that the content of a test corresponds to the content of the
construct it was designed to cover
Ecological validity
Evidence that the results of a study, experiment or test can be applied,
and allow inferences, to real-world conditions
Reliability
Reliability
The ability of the measure to produce the same results under the same
conditions
Test-Retest Reliability
The ability of a measure to produce consistent results when the same
entities are tested at two different points in time
Data Collection 2: How to measure
Correlational research:
Observing what naturally goes on in the world without directly
interfering with it
Cross-sectional research:
This term implies that data come from people at different age points,
with different people representing each age point
Experimental research:
One or more variables are systematically manipulated to see their effect
(alone or in combination) on an outcome variable
Statements can be made about cause and effect
Experimental Research Methods
Cause and Effect (Hume, 1748)

1 Cause and effect must occur close together in time (contiguity)
2 The cause must occur before an effect does
3 The effect should never occur without the presence of the cause
Confounding variables: the “Tertium Quid”
A variable (that we may or may not have measured) other than the
predictor variables that potentially affects an outcome variable
e.g. the relationship between breast implants and suicide is confounded
by self-esteem
Ruling out confounds (Mill, 1865):
An effect should be present when the cause is present, and when
the cause is absent the effect should be absent also
Control conditions: the cause is absent
Methods of Data Collection
Between-group/between-subject/independent
Different entities in experimental conditions
Repeated-measures (within-subject)
The same entities take part in all experimental conditions
Economical
Practice effects
Fatigue
Types of Variation
Systematic Variation
Differences in performance created by a specific experimental
manipulation
Unsystematic Variation
Differences in performance created by unknown factors
e.g. age, gender, IQ, time of day, measurement error, etc.
Randomization
Minimizes unsystematic variation
Analysing Data: Histograms
Frequency Distributions (aka Histograms)

A graph plotting values of observations on the horizontal axis, with a
bar showing how many times each value occurred in the data set
The “Normal” Distribution
Bell-shaped
Symmetrical around the centre
The Normal Distribution
Properties of Frequency Distributions
Skew
The symmetry of the distribution
Positive skew (scores bunched at low values with the tail pointing to
high values)
Negative skew (scores bunched at high values with the tail pointing to
low values)
Kurtosis
The “heaviness” of the tails
Leptokurtic = heavy tails
Platykurtic = light tails
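Skew and kurtosis can be quantified as well as inspected visually. The functions below are a minimal sketch using one common moment-based definition (several variants exist); the function names and the simulated data are assumptions made for the illustration.

```r
# Moment-based skew and excess kurtosis (one common definition among several)
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

set.seed(1)                      # arbitrary seed, for reproducibility only
right_tailed <- rexp(1000)       # simulated scores bunched at low values
symmetric    <- c(-2, -1, 0, 1, 2)

skew(right_tailed)               # positive: the tail points to high values
skew(symmetric)                  # 0: perfectly symmetrical
excess_kurtosis(symmetric)       # negative: lighter tails than a normal curve
```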
Skew
Kurtosis
Central tendency: The Mode
Mode: the most frequent score

Bimodal: having two modes
Multimodal: having several modes
A Bimodal Distribution
Central tendency: The Median
Median: the middle score when scores are ordered

Example: number of friends of 11 Facebook.com users
Central tendency: The Mean
Mean
The sum of scores divided by the number of scores
Number of friends of 11 Facebook.com users
Example
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$
$\sum_{i=1}^{n} x_i = 22 + 40 + 53 + 57 + 93 + 98 + 103 + 108 + 116 + 121 + 252 = 1063$
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1063}{11} = 96.64$
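The three measures of central tendency can be checked in R with the Facebook data from the slides. Base R has no function for the statistical mode, so the tabulation below is one possible approach, not a standard routine.

```r
# Number of friends of the 11 Facebook users from the slides
friends <- c(22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252)

sum(friends)      # 1063
mean(friends)     # 96.63636, i.e. 96.64 to two decimals
median(friends)   # 98, the 6th of the 11 ordered scores

# Mode: tabulate the scores and keep the most frequent one(s).
# Every score occurs once here, so this data set has no single mode.
counts <- table(friends)
modes  <- as.numeric(names(counts)[counts == max(counts)])
length(modes)     # 11
```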
The Dispersion: Range
The Range: the smallest score subtracted from the largest

Example:
Number of friends of 11 Facebook.com users
22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252
Range = 252 - 22 = 230
Very biased by outliers
The Dispersion: The interquartile range
Quartiles
The three values that split the sorted data into four equal parts
Second quartile = median
Lower quartile = median of lower half of the data
Upper quartile = median of upper half of the data
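The range and the quartiles for the Facebook data can be reproduced as follows. The sketch applies the "median of each half" rule stated above by hand; it assumes no score is tied with the median, which holds for these data.

```r
friends <- sort(c(22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252))

diff(range(friends))                           # range: 252 - 22 = 230

med   <- median(friends)                       # second quartile: 98
lower <- median(friends[friends < med])        # lower quartile: 53
upper <- median(friends[friends > med])        # upper quartile: 116
upper - lower                                  # interquartile range: 63

# R's built-in quantile() interpolates by default, so it can give
# slightly different quartiles from the "median of each half" rule.
quantile(friends, c(0.25, 0.75))
```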
Going beyond the data: z-scores
z-scores
Standardising a score with respect to the other scores in the group
Expresses a score in terms of how many standard deviations it is away
from the mean
The distribution of z-scores has a mean of 0 and SD = 1
$z = \frac{X - \bar{X}}{s}$
Properties of z-scores
1.96 cuts off the top 2.5% of the distribution

-1.96 cuts off the bottom 2.5% of the distribution
As such, 95% of z-scores lie between -1.96 and 1.96
99% of z-scores lie between -2.58 and 2.58
99.9% of them lie between -3.29 and 3.29
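These properties can be verified numerically. The sketch standardizes the Facebook scores by hand and uses qnorm() to recover the cut-off values quoted above.

```r
friends <- c(22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252)

# z = (X - mean) / SD, applied to every score
z <- (friends - mean(friends)) / sd(friends)
mean(z)   # 0, up to floating-point rounding
sd(z)     # 1

# Cut-offs of the standard normal distribution
qnorm(0.975)    # about 1.96: leaves 2.5% in the upper tail
qnorm(0.995)    # about 2.58: leaves 0.5% in the upper tail
qnorm(0.9995)   # about 3.29: leaves 0.05% in the upper tail
```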
Types of Hypotheses
Null hypothesis, H0
There is no effect
e.g. Big Brother contestants and members of the public will not
differ in their scores on personality disorder questionnaires
Alternative hypothesis, H1
Aka the experimental hypothesis
e.g. Big Brother contestants will score higher on personality disorder
questionnaires than members of the public
Chapter 2
Aims and objectives
Know what a statistical model is and why we use them

The mean
Know what the “fit” of a model is and why it is important
The standard deviation
Distinguish models for samples and populations
Fitting models to real-world data
Populations and Samples
Population:
The collection of units (be they people, plankton, plants, cities, suicidal
authors, etc.) to which we want to generalize a set of findings or a
statistical model
Sample:
A smaller (but hopefully representative) collection of units from a
population used to determine truths about that population
The only equation you will ever need
$\text{outcome}_i = (\text{model}) + \text{error}_i$
A simple statistical model
In statistics we fit models to our data (i.e. we use a statistical model
to represent what is happening in the real world)
The mean is a hypothetical value (i.e. it doesn’t have to be a value
that actually exists in the data set)
As such, the mean is a simple statistical model
The Mean
The mean is the sum of all scores divided by the number of scores
The mean is also the value from which the (squared) scores deviate
least (it has the least error)
Example
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$
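The claim that the mean has the least squared error can be checked numerically. This sketch uses the five scores from the lecturer example in these slides and searches for the value that minimizes the sum of squared deviations; the helper name sum_sq is made up for the illustration.

```r
scores <- c(1, 3, 4, 3, 2)   # the five scores from the lecturer example

# Sum of squared deviations of the scores from a candidate value m
sum_sq <- function(m) sum((scores - m)^2)

# Search numerically for the value of m with the smallest sum of squares
best <- optimize(sum_sq, interval = range(scores))$minimum

best           # approximately 2.6
mean(scores)   # 2.6
```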
The Mean: Example
Collect some data:
1, 3, 4, 3, 2
Add them up:

$\sum_{i=1}^{n} x_i = 1 + 3 + 4 + 3 + 2 = 13$
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{13}{5} = 2.6$
The mean as a model
$\text{outcome}_{\text{lecturer 1}} = \bar{X} + \text{error}_{\text{lecturer 1}}$
$1 = 2.6 + \text{error}_{\text{lecturer 1}}$
Measuring the “Fit” of the model
The mean is a model of what happens in the real world: the typical
score
It is not a perfect representation of the data
How can we assess how well the mean represents reality?
A perfect fit
Calculating the “Error”
A deviation is the difference between the mean and an actual data
point
Deviations can be calculated by taking each score and subtracting the
mean from it:
$\text{deviation} = x_i - \bar{x}$
Calculating the “Error”
Use the Total Error?
We could just take the error between the mean and the data and add
them
Score   Mean   Deviation
1       2.6    -1.6
2       2.6    -0.6
3       2.6     0.4
3       2.6     0.4
4       2.6     1.4
        Total = 0
$\sum (X - \bar{X}) = 0$
Sum of Squared Errors
We could add the deviations to find out the total error

Deviations cancel out because some are positive and others negative
Therefore, we square each deviation
If we add these squared deviations we get the sum of squared errors
(SS)
Sum of Squared Errors
Score   Mean   Deviation   Squared Deviation
1       2.6    -1.6        2.56
2       2.6    -0.6        0.36
3       2.6     0.4        0.16
3       2.6     0.4        0.16
4       2.6     1.4        1.96
                   Total = 5.20
$SS = \sum (X - \bar{X})^2 = 5.20$
Variance
The sum of squares is a good measure of overall variability, but is
dependent on the number of scores
We calculate the average variability by dividing by the number of
scores minus one (N − 1)
This value is called the variance ($s^2$)
$s^2 = \frac{SS}{N - 1} = \frac{\sum (x_i - \bar{x})^2}{N - 1} = \frac{5.20}{4} = 1.3$
Degrees of Freedom
Standard Deviation
The variance has one problem: it is measured in units squared
This isn’t a very meaningful metric, so we take the square root of the variance
This is the standard deviation (s)
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{N - 1}} = \sqrt{\frac{5.20}{4}} = 1.14$
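The chain from deviations to the standard deviation can be reproduced step by step in R. The sketch uses the five scores from the example and compares the hand calculation with the built-in var() and sd(), which also divide by N − 1.

```r
scores <- c(1, 3, 4, 3, 2)

deviations <- scores - mean(scores)   # -1.6  0.4  1.4  0.4 -0.6
sum(deviations)                       # 0 (up to rounding): deviations cancel out

ss <- sum(deviations^2)               # sum of squared errors: 5.2
ss / (length(scores) - 1)             # variance: 1.3
sqrt(ss / (length(scores) - 1))       # standard deviation: about 1.14

# The built-in functions use the same N - 1 denominator
var(scores)
sd(scores)
```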
Important Things to Remember
The sum of squares, variance, and standard deviation represent

the same thing:
the “fit” of the mean to the data
The variability in the data
How well the mean represents the observed data
Error
Same Mean, Different SD
The SD and the Shape of a Distribution
Samples vs. Populations
Sample
Mean and SD describe only the sample from which they were calculated
Population
Mean and SD are intended to describe the entire population (very rare in
psychology)
Sample to Population
Mean and SD are obtained from a sample, but are used to estimate the
mean and SD of the population (very common in psychology)
Confidence Intervals
Domjan et al. (1998)

“Conditioned” toothpicks production from trees
True mean
15 million toothpicks
Sample mean
17 million toothpicks
Interval estimate
12 to 22 million (contains true value)
16 to 18 million (misses true value)
CIs constructed such that 95% contain the true value
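A 95% confidence interval for a mean can be built from the sample mean, the standard error and a critical t value. The sketch below reuses the five scores from the earlier mean example purely as convenient numbers; it is not the toothpick data.

```r
scores <- c(1, 3, 4, 3, 2)   # small sample reused from the mean example
n  <- length(scores)
se <- sd(scores) / sqrt(n)                  # standard error of the mean

# 95% interval: mean plus/minus the critical t value times the standard error
ci <- mean(scores) + c(-1, 1) * qt(0.975, df = n - 1) * se
ci

# t.test() reports the same interval
t.test(scores)$conf.int
```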
Test Statistics
A statistic for which the frequency of particular values is known
Observed values can be used to test hypotheses
$\text{test statistic} = \frac{\text{variance explained by the model}}{\text{variance not explained by the model}} = \frac{\text{effect}}{\text{error}}$
One- and Two-Tailed Tests
Type I and Type II Errors
Type I error
occurs when we believe that there is a genuine effect in our
population when, in fact, there isn’t
The probability is the α-level (usually .05)
Type II error
occurs when we believe that there is no effect in the population when,
in reality, there is
The probability is the β-level (often .2)
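The meaning of the α-level can be illustrated by simulation. In this sketch both samples come from the same population, so every significant result is a Type I error; the sample size of 20 and the 2000 repetitions are arbitrary choices.

```r
set.seed(42)   # arbitrary seed so the simulation is reproducible

# Draw two samples from the SAME population (so there is no true effect)
# and record the p-value of a t-test comparing them.
p_values <- replicate(2000, t.test(rnorm(20), rnorm(20))$p.value)

# Proportion of "significant" results: each one is a Type I error
mean(p_values < 0.05)   # should be close to the alpha level of .05
```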
What does Statistical Significance tell us?
The importance of an effect?

No, significance depends on sample size
That the null hypothesis is false?

No, it is always false
That the null hypothesis is true?

No, it is never true
Effect Sizes
An effect size is a standardized measure of the size of an effect:

Standardized = comparable across studies
Not (as) reliant on the sample size
Allows people to objectively evaluate size of observed effect
Effect Size Measures
There are several effect size measures that can be used:

Cohen’s d
Pearson’s r
Glass’s ∆
Hedges’s g
Odds ratio/risk rates
Pearson’s r is a good intuitive measure
Oh, apart from when group sizes are different...
Effect Size Measures
r = .1, d = .2 (small effect)

the effect explains 1% of the total variance
r = .3, d = .5 (medium effect)
the effect explains 9% of the total variance
r = .5, d = .8 (large effect)
the effect explains 25% of the total variance

Beware of these “canned” effect sizes though

The size of effect should be placed within the research context
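As an illustration of a standardized effect size, Cohen's d can be computed by hand as the difference between two group means divided by the pooled standard deviation. The two groups below are hypothetical and chosen so that the arithmetic is easy to follow.

```r
# Hypothetical scores for two groups (not from the course data)
control   <- c(2, 4, 6)
treatment <- c(4, 6, 8)

# Pooled standard deviation (here both groups have the same size and variance)
n1 <- length(control); n2 <- length(treatment)
pooled_sd <- sqrt(((n1 - 1) * var(control) + (n2 - 1) * var(treatment)) /
                  (n1 + n2 - 2))

# Cohen's d: difference between the means in standard deviation units
d <- (mean(treatment) - mean(control)) / pooled_sd
d   # 1
```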
Chapter 5
Aims
Assumptions of parametric tests based on the normal distribution

Understand the assumption of normality
Graphical displays
Skew
Kurtosis
Normality tests
Understand homogeneity of variance
Levene’s test
Know how to fix problems in the data
Log, square root and reciprocal transformations
Pitfalls and alternatives
Robust tests
Assumptions
Parametric tests based on the normal distribution assume:

Normally distributed
Sampling distribution
Residuals
Homogeneity of variance
Interval or ratio level data
Independent scores
Assessing Normality
We don’t have access to the sampling distribution so we usually test
the observed data
Central limit theorem
If N > 30, the sampling distribution is approximately normal anyway
Graphical displays
Q-Q plot (or P-P plot)
Histogram
Assessing Normality
Values of skew/kurtosis
0 in a normal distribution
Convert to z (by dividing value by SE)
Kolmogorov-Smirnov test
Tests if data differ from a normal distribution
Significant = non-normal data
NON-significant = normal data
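Both kinds of normality test are available in base R. The sketch below applies them to simulated, deliberately skewed data; the variable name and the simulation settings are assumptions, and the Kolmogorov-Smirnov p-value is only approximate when the mean and SD are estimated from the same sample.

```r
set.seed(123)         # arbitrary seed
skewed <- rexp(200)   # simulated, strongly right-skewed scores

# Shapiro-Wilk test
shapiro.test(skewed)$p.value          # far below .05: non-normal

# Kolmogorov-Smirnov test against a normal distribution with the
# sample's own mean and SD (this makes the p-value only approximate)
ks.test(skewed, "pnorm", mean(skewed), sd(skewed))$p.value
```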
Normality Example
Imagine you are a biologist worried about the potential health effects of
music festivals
Download the data (Music Festival)
We measured the hygiene of 810 concert-goers over the three days of
the festival
Hygiene was measured using a standardized technique. Scores ranged
from 0 to 4:
0 = you smell pretty bad
4 = you definitely smell good
The Q-Q plot
To draw a Q-Q plot of the hygiene scores for day 1 of the music
festival
    library(ggplot2)  # provides qplot()
    qqplot.day1 <- qplot(sample = dlf$day1, stat = "qq")
    qqplot.day1
To draw a Q-Q plot of the hygiene scores for day 2 of the music
festival
    qqplot.day2 <- qplot(sample = dlf$day2, stat = "qq")
    qqplot.day2
The Q-Q Plot
Assessing Skew and Kurtosis
Using by() with describe() (from the psych package)
    by(data = rexam$exam, INDICES = rexam$uni, FUN = describe)
Using by() with stat.desc() (from the pastecs package)
    by(data = rexam$exam, INDICES = rexam$uni, FUN = stat.desc)
These shorter commands have the same effect as those above
    by(rexam$exam, rexam$uni, describe)
    by(rexam$exam, rexam$uni, stat.desc)
If we want descriptive statistics for multiple variables, then we can use
cbind()
    by(cbind(data = rexam$exam, data = rexam$numeracy), rexam$uni, describe)
We can also use describe() and stat.desc() with more than one variable at
the same time using cbind():
    describe(cbind(dlf$day1, dlf$day2, dlf$day3))
    stat.desc(cbind(dlf$day1, dlf$day2, dlf$day3), basic = FALSE, norm = TRUE)
Another Example
Performance on statistics exam

Participants: N = 100 students
Measures:
Exam: first-year exam scores as a percentage
Computer: measure of computer literacy, %
Lecture: percentage of lectures attended
Numeracy: a measure of numerical ability out of 15
Uni: whether the student attended Sussex University or Duncetown
University
Assessing Normality
Shapiro-Wilk test for exam and numeracy for whole sample

    shapiro.test(rexam$exam)
    shapiro.test(rexam$numeracy)
Output:
        Shapiro-Wilk normality test
    data:  rexam$exam
    W = 0.9613, p-value = 0.004991

        Shapiro-Wilk normality test
    data:  rexam$numeracy
    W = 0.9244, p-value = 2.424e-05
Assessing Normality
Shapiro-Wilk test for exam and numeracy split by university

    by(rexam$exam, rexam$uni, shapiro.test)
    by(rexam$numeracy, rexam$uni, shapiro.test)
Assessing Normality
Output for exam:

    rexam$uni: Duncetown University
        Shapiro-Wilk normality test
    data:  dd[x, ]
    W = 0.9722, p-value = 0.2829
    -------------------------------------------------------------------
    rexam$uni: Sussex University
        Shapiro-Wilk normality test
    data:  dd[x, ]
    W = 0.9837, p-value = 0.7151
Assessing Normality
Output for numeracy:

    rexam$uni: Duncetown University
        Shapiro-Wilk normality test
    data:  dd[x, ]
    W = 0.9408, p-value = 0.01451
    -------------------------------------------------------------------
    rexam$uni: Sussex University
        Shapiro-Wilk normality test
    data:  dd[x, ]
    W = 0.9323, p-value = 0.006787
Q-Q Plots
Assessing Homogeneity of Variance
Graphs (see lectures on regression)

Levene’s test
Tests if variance in different groups is the same
Significant = variances not equal
Non-significant = variances are equal
Variance ratio
With 2 or more groups
VR = largest variance / smallest variance
If VR < 2, homogeneity can be assumed
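The variance ratio needs nothing beyond base R. The following sketch uses simulated scores for two hypothetical groups; the names scores, group and vr are assumptions made for the illustration.

```r
# Simulated scores for two groups; the names are hypothetical
set.seed(7)
scores <- c(rnorm(50, mean = 50, sd = 10), rnorm(50, mean = 50, sd = 12))
group  <- factor(rep(c("A", "B"), each = 50))

group_vars <- tapply(scores, group, var)   # variance within each group
vr <- max(group_vars) / min(group_vars)    # largest / smallest variance
vr
vr < 2   # TRUE would suggest that homogeneity can be assumed
```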
Homogeneity of Variance
Assessing Homogeneity of Variance with R
Use the leveneTest() function from the car package:

    leveneTest(outcome variable, group, center = median/mean)
Levene’s test for the exam and numeracy scores:
    library(car)  # provides leveneTest()
    leveneTest(rexam$exam, rexam$uni)
    leveneTest(rexam$numeracy, rexam$uni)
Assessing Homogeneity of Variance with R
Output for Levene’s Test:

    > leveneTest(rexam$exam, rexam$uni)
    Levene's Test for Homogeneity of Variance (center = median)
           Df F value Pr(>F)
    group   1  2.0886 0.1516
           98
    > leveneTest(rexam$numeracy, rexam$uni)
    Levene's Test for Homogeneity of Variance (center = median)
           Df F value  Pr(>F)
    group   1   5.366 0.02262 *
           98
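To see what the median-centred version of Levene's test does, it can be written in base R as a one-way ANOVA on the absolute deviations from each group's median. This is a sketch on simulated data with hypothetical names; it is meant to show the logic, not to replace leveneTest().

```r
# Simulated data; 'score' and 'group' are hypothetical names
set.seed(11)
score <- c(rnorm(40, sd = 1), rnorm(40, sd = 2))
group <- factor(rep(c("g1", "g2"), each = 40))

# Step 1: absolute deviation of each score from its own group median
abs_dev <- abs(score - ave(score, group, FUN = median))

# Step 2: one-way ANOVA on those absolute deviations
levene <- anova(lm(abs_dev ~ group))
levene
levene[["Pr(>F)"]][1]   # p-value; small values suggest unequal variances
```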
Fixing Data Problems
Log transformation, $\log(X_i)$
Reduces positive skew
Square root transformation, $\sqrt{X_i}$
Also reduces positive skew. Can also be useful for stabilizing variance.
Reciprocal transformation, $1/X_i$
Dividing 1 by each score also reduces the impact of large scores. This
transformation reverses the scores; you can avoid this by reversing the
scores before the transformation, $1/(X_{\text{highest}} - X_i)$
Fixing Data Problems
Log transformation
    dlf$logday1 <- log(dlf$day1)
    dlf$logday1 <- log(dlf$day1 + 1)  # adding 1 avoids log(0) when a score is 0
Square root transformation
    dlf$sqrtday1 <- sqrt(dlf$day1)
Reciprocal transformation
    dlf$recday1 <- 1 / (dlf$day1 + 1)
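The effect of a transformation on skew can be checked on simulated data. The sketch below uses artificial positively skewed scores (not the festival data) and a simple moment-based skew function defined only for this illustration.

```r
set.seed(99)
raw <- rlnorm(500)   # simulated positively skewed scores (all above zero)

# Moment-based skew: 0 for a symmetrical distribution
skew <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

# Skew before and after the log and square root transformations
sapply(list(raw  = raw,
            log  = log(raw),
            sqrt = sqrt(raw)),
       skew)
```

For these simulated scores both transformations pull the skew towards zero, the log more strongly than the square root.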
The Effect of Transformations
Distributions of the hygiene data on day 1 and day 2 after various
transformations
To Transform . . . or not
Transforming the data helps as often as it hinders the accuracy of F
(Games & Lucas, 1966).
Games (1984)
The central limit theorem: the sampling distribution will be normal in
samples > 40 anyway.
Transforming the data changes the hypothesis being tested
E.g. when using a log transformation and comparing means, you
change from comparing arithmetic means to comparing geometric
means
In small samples it is tricky to determine normality one way or another.
The consequences for the statistical model of applying the “wrong”
transformation could be worse than the consequences of analysing the
untransformed scores.
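The point about geometric means can be made concrete: averaging log-transformed scores and transforming back gives the geometric mean, not the arithmetic mean. The three scores below are hypothetical.

```r
x <- c(1, 10, 100)          # hypothetical scores

mean(x)                     # arithmetic mean: 37
exp(mean(log(x)))           # back-transformed mean of the logs: 10 (up to rounding),
                            # which is the geometric mean (1 * 10 * 100)^(1/3)
```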
Robust Methods: Examples
Chapter 6
Aims
Measuring relationships
Scatterplots
Covariance
Pearson’s correlation coefficient
Nonparametric measures
Spearman’s rho
Kendall’s tau
Interpreting correlations
Causality
Partial correlations
What is a Correlation?
It is a way of measuring the extent to which two variables are related

It measures the pattern of responses across variables
Very small relationship
Positive relationship
Negative relationship
We need to see whether as one variable increases, the other increases,
decreases or stays the same
This can be done by calculating the covariance
We look at how much each score deviates from the mean
If both variables deviate from the mean by the same amount, they are
likely to be related
Revision of variance
The variance tells us by how much scores deviate from the mean for a
single variable
It is closely linked to the sum of squares
Covariance is similar: it tells us by how much scores on two variables differ
from their respective means
Variance
The variance tells us by how much scores deviate from the mean for a
single variable
It is closely linked to the sum of squares
$\text{variance} = \frac{\sum (x_i - \bar{x})^2}{N - 1} = \frac{\sum (x_i - \bar{x})(x_i - \bar{x})}{N - 1}$
Covariance
Calculate the error between the mean and each subject’s score for the
first variable (x)
Calculate the error between the mean and their score for the second
variable (y)
Multiply these error values
Add these values and you get the cross product deviations
The covariance is the average of the cross-product deviations:
$\text{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$
Covariance
$\text{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$
$= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4}$
$= \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4}$
$= \frac{17}{4}$
$= 4.25$
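The same calculation can be reproduced in R. The slides show only the deviations, so the raw scores below are an assumption: they were chosen so that their deviations from the mean match the ones used above.

```r
# Assumed raw scores, chosen so that their deviations from the mean
# match the deviations used in the calculation above
x <- c(5, 4, 4, 6, 8)       # deviations: -0.4 -1.4 -1.4  0.6  2.6
y <- c(8, 9, 10, 13, 15)    # deviations: -3   -2   -1    2    4

cross_products <- (x - mean(x)) * (y - mean(y))
sum(cross_products)                     # 17
sum(cross_products) / (length(x) - 1)   # 4.25
cov(x, y)                               # built-in equivalent
```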
Problems with covariance
It depends upon the units of measurement

E.g. the covariance of two variables measured in miles might be 4.25, but
if the same scores are converted to kilometres, the covariance is 11
One solution: standardize it!

Divide by the standard deviations of both variables
The standardized version of covariance is known as the correlation
coefficient
It is relatively unaffected by units of measurement
The Correlation Coefficient
$r = \frac{\text{cov}_{xy}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1) s_x s_y}$
The Correlation Coefficient
$r = \frac{\text{cov}_{xy}}{s_x s_y} = \frac{4.25}{1.67 \cdot 2.92} = .87$
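A short check in R shows both the calculation of r and why standardizing solves the units problem. The raw scores are assumed values that reproduce the covariance of 4.25 and the standard deviations of 1.67 and 2.92 used above; the conversion factor of 1.609 km per mile is likewise supplied for the illustration.

```r
# Assumed raw scores reproducing cov = 4.25, s_x = 1.67 and s_y = 2.92
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

cov(x, y) / (sd(x) * sd(y))     # about .87
cor(x, y)                       # same value from the built-in function

# Changing the units changes the covariance but not the correlation
miles_to_km <- 1.609
cov(x * miles_to_km, y * miles_to_km)   # about 11
cor(x * miles_to_km, y * miles_to_km)   # still about .87
```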
Correlation: Example
Anxiety and exam performance

Participants: 103 students
Measures:
Time spent revising (hours)
Exam performance (%)
Exam Anxiety (the EAQ, score out of 100)
Gender
General Procedure for Correlations using R
To compute basic correlation coefficients there are three main
functions that can be used:
cor()
cor.test()
rcorr()
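The two base-R functions can be tried on simulated data. The data frame below is a hypothetical stand-in for the exam anxiety data (its name and values are invented); rcorr() is not shown because it requires the Hmisc package.

```r
# Simulated stand-in for the exam data; the data frame and its
# values are hypothetical
set.seed(2022)
revise  <- rnorm(103, mean = 20, sd = 8)
anxiety <- 90 - 0.8 * revise + rnorm(103, sd = 6)
examSim <- data.frame(revise, anxiety)

cor(examSim)                                   # correlation matrix (Pearson by default)
cor(examSim, method = "spearman")              # rank-based alternative
cor.test(examSim$revise, examSim$anxiety)      # adds a p-value and confidence interval
```

cor() accepts a whole data frame and returns a matrix, whereas cor.test() works on one pair of variables at a time.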
Pearson Correlation output
               Exam     Anxiety      Revise
Exam     1.00000000  -0.4409934  0.39672070
Anxiety -0.4409934   1.00000000  -0.7092493
Revise   0.39672070  -0.7092493  1.00000000
Reporting the Results
Exam performance was significantly correlated with exam anxiety (r =
-0.44) and time spent revising (r = 0.40)
The time spent revising was also correlated with exam anxiety (r =
-0.71)
All p < .001
Things to Know about the Correlation
It varies between -1 and +1

0 = no relationship
It is an effect size
±0.1 small effect
±0.3 medium effect
±0.5 large effect
Coefficient of determination, $r^2$
By squaring the value of r you get the proportion of variance in one
variable shared by the other
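Squaring is a one-line operation in R. The correlations below are copied from the Pearson output earlier in this chapter; the vector name r is an arbitrary choice.

```r
# Correlations with exam performance, copied from the Pearson output
r <- c(anxiety = -0.4409934, revise = 0.3967207)

round(r^2, 3)         # 0.194 and 0.157
round(100 * r^2, 1)   # as percentages: 19.4 and 15.7
```

So exam anxiety shares about 19.4% of the variance in exam performance, and revision time about 15.7%.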
Correlation and Causality
The third-variable problem

In any correlation, causality between two variables cannot be assumed
because there may be other measured or unmeasured variables affecting
the results
Direction of causality
Correlation coefficients say nothing about which variable causes the other
to change
Non-parametric Correlation
Spearman’s rho
Pearson’s correlation on the ranked data
Kendall’s tau
Better than Spearman’s for small samples
Example: World’s Biggest Liar competition

68 contestants
Measures
Where they were placed in the competition (first, second, . . . )
Creativity questionnaire (maximum score 60)
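The statement that Spearman's rho is Pearson's correlation on the ranked data can be verified directly. The scores below are hypothetical (they are not the liar data) and contain no tied values.

```r
# Hypothetical scores with no tied values
position   <- c(1, 2, 3, 4, 5, 6)
creativity <- c(55, 60, 42, 48, 30, 35)

cor(rank(position), rank(creativity))              # Pearson's r on the ranks
cor(position, creativity, method = "spearman")     # Spearman's rho: same value
cor(position, creativity, method = "kendall")      # Kendall's tau
```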
Spearman’s Rho Output
Spearman’s rank correlation rho

data: liarData$Position and liarData$Creativity
S = 71948.4, p-value = 0.0008602
alternative hypothesis: true rho is less than 0
sample estimates:
rho
-0.3732184
Kendall’s Tau (Non-parametric)
The output is much the same as for the Spearman’s correlation
Kendall’s rank correlation tau

data: liarData$Position and liarData$Creativity
z = -3.2252, p-value = 0.0006294
alternative hypothesis: true tau is less than 0
sample estimates:
tau
-0.3002413
Partial and Semi-partial Correlations
Partial correlation
Measures the relationship between two variables, controlling for the effect
that a third variable has on them both
Semi-partial correlation
Measures the relationship between two variables controlling for the effect
that a third variable has on only one of the others
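With a single control variable, the partial correlation can be obtained from the three pairwise correlations using the standard first-order formula. The function name partial_r is made up for this sketch; the input values are copied from the Pearson correlation output earlier in this chapter.

```r
# First-order partial correlation of x and y, controlling for z,
# computed from the three pairwise correlations
partial_r <- function(r_xy, r_xz, r_yz) {
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}

# Values copied from the Pearson correlation output:
# exam-anxiety, exam-revise, anxiety-revise
pr <- partial_r(-0.4409934, 0.3967207, -0.7092493)
pr     # about -0.247
pr^2   # about 0.061
```

Controlling for revision time shrinks the exam-anxiety correlation from -0.44 to about -0.25.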
Doing Partial Correlation using R
The general form of pcor() (from the ggm package) is:
    pcor(c("var1", "var2", "control1", "control2", etc.), var(dataframe))
If the result is stored in an object called pc, we can then see the partial
correlation and the value of $R^2$ in the console by executing:
    pc
    pc^2
Doing Partial Correlation using R
The general form of pcor.test() is:

    pcor.test(pcor object, number of control variables, sample size)
Basically, you enter an object that you have created with pcor() (or
you can put the pcor() command directly into the function):
    pcor.test(pc, 1, 103)
Chapter 7
Aims
Understand linear regression with one predictor

Understand how we assess the fit of a regression model
Total sum of squares
Model sum of squares
Residual sum of squares
F
R2
Know how to do regression using R
Interpret a regression model
Assessing the performance: MSE, Cross Validation
What is Regression?
A way of predicting the value of one variable from another

It is a hypothetical model of the relationship between two variables
The model used is a linear one
We describe the relationship using the equation of a straight line
Describing a Straight Line
$Y_i = b_0 + b_1 X_i + \varepsilon_i$
$b_0$
Intercept (value of Y when X = 0)
Point at which the regression line crosses the Y-axis (ordinate)
$b_1$
Regression coefficient for the predictor
Gradient (slope) of the regression line
Direction/strength of relationship
The Method of Least Squares
This graph shows a scatterplot of some data with a line representing the general trend.
The vertical lines (dotted) represent the differences (or residuals)
between the line and the actual data
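For a single predictor the least-squares estimates have a closed form: the slope is the covariance of X and Y divided by the variance of X, and the intercept follows from the two means. The sketch below checks this against lm() on simulated data with hypothetical variable names.

```r
# Simulated data; the variable names are hypothetical
set.seed(5)
x <- runif(30, 0, 100)
y <- 10 + 0.5 * x + rnorm(30, sd = 5)

# Least-squares estimates for a single predictor
b1 <- cov(x, y) / var(x)          # slope
b0 <- mean(y) - b1 * mean(x)      # intercept
c(b0 = b0, b1 = b1)

coef(lm(y ~ x))                   # lm() gives the same estimates

# Residuals: vertical distances between the data and the fitted line
residuals_by_hand <- y - (b0 + b1 * x)
sum(residuals_by_hand^2)          # the quantity that least squares minimizes
```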
How good is the model?
The regression line is only a model based on the data

This model might not reflect reality
We need some way of testing how well the model fits the observed data
How?
The Method of Least Squares
Diagram showing from where the regression sums of squares derive

Summary
SST
Total variability (variability between scores and the mean)
SSR
Residual/error variability
(variability between the regression model and the actual data)
SSM
Model variability
(difference in variability between the model and the mean)
Testing the Model: ANOVA
If the model results in better prediction than using the mean,
then we expect SSM to be much greater than SSR
Testing the Model: ANOVA
Mean squared error

Sums of squares are total values
They can be expressed as averages
These are called mean squares, MS
$F = \frac{MS_M}{MS_R}$
Testing the Model: $R^2$
$R^2$
The proportion of variance accounted for by the regression model
The Pearson Correlation Coefficient Squared
$R^2 = \frac{SS_M}{SS_T}$
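The three sums of squares, F and R² can be computed by hand and compared with what summary() reports. The data are simulated and the object names are assumptions made for this sketch.

```r
set.seed(5)
x <- runif(30, 0, 100)
y <- 10 + 0.5 * x + rnorm(30, sd = 5)
fit <- lm(y ~ x)

ss_t <- sum((y - mean(y))^2)             # total: data vs. the mean
ss_r <- sum((y - fitted(fit))^2)         # residual: data vs. the model
ss_m <- sum((fitted(fit) - mean(y))^2)   # model: model vs. the mean
all.equal(ss_t, ss_m + ss_r)             # TRUE

ms_m <- ss_m / 1                         # one predictor -> 1 degree of freedom
ms_r <- ss_r / (length(y) - 2)           # N minus the number of estimated parameters
f_ratio   <- ms_m / ms_r
r_squared <- ss_m / ss_t

# Compare with the values reported by summary()
c(f_ratio, summary(fit)$fstatistic[["value"]])
c(r_squared, summary(fit)$r.squared)
```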
Regression: An Example
A record company boss was interested in predicting record sales from
advertising
Data
200 Different album releases
Outcome variable
Sales (CDs and downloads) in the week after release
Predictor variable
The amount (in units of 1000$) spent promoting the record before release
Regression in R
We run a regression analysis using the lm() function – lm stands for
“Linear Model”
This function takes the general form:
    newModel <- lm(outcome ~ predictor(s), data = dataFrame, na.action = an action)

    albumSales.1 <- lm(album1$sales ~ album1$adverts)
Regression in R
We can tell R what dataframe to use (data = nameOfDataFrame):

    albumSales.1 <- lm(sales ~ adverts, data = album1)
Output of a Simple Regression
We have created an object called albumSales.1 that contains the
results of our analysis. We can show the object by executing:
    summary(albumSales.1)

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 1.341e+02  7.537e+00  17.799   <2e-16 ***
    adverts     9.612e-02  9.632e-03   9.979   <2e-16 ***

    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 65.99 on 198 degrees of freedom
    Multiple R-squared:  0.3346,  Adjusted R-squared:  0.3313
    F-statistic: 99.59 on 1 and 198 DF,  p-value: < 2.2e-16
Using the Model
$\text{Record sales}_i = b_0 + b_1 \cdot \text{Advertising budget}_i$
$= 134.14 + (0.09612 \cdot \text{Advertising budget}_i)$
$\text{Record sales}_i = 134.14 + (0.09612 \cdot \text{Advertising budget}_i)$
$= 134.14 + (0.09612 \cdot 100)$
$= 143.75$
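The prediction can be wrapped in a small function. The coefficients are copied from the regression output; the function name predict_sales is hypothetical.

```r
# Coefficients copied from the regression output above
b0 <- 134.14
b1 <- 0.09612

predict_sales <- function(advertising_budget) b0 + b1 * advertising_budget

predict_sales(100)            # 143.752, i.e. 143.75 to two decimals
predict_sales(c(0, 500))      # 134.14 and 182.20

# With the fitted model object available, predict() does the same job:
# predict(albumSales.1, newdata = data.frame(adverts = 100))
```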
Multiple Regression
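$\text{outcome}_i = (\text{model}) + \text{error}_i$
$Y_i = (b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_n X_{ni}) + \varepsilon_i$
Example: airplay dataset

$\text{Album sales}_i = (b_0 + b_1 \cdot \text{Advertising budget}_i + b_2 \cdot \text{Airplay}_i) + \varepsilon_i$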
OLS
SST , SSR and SSM
Multiple regression fit
Complexity vs Precision: a trade-off
n is the number of points in your data sample

k is the number of independent regressors, i.e. the number of
variables in your model, excluding the constant
Adjusted $R^2$
$\text{adjusted } R^2 = 1 - \left[\left(\frac{n-1}{n-k-1}\right)\left(\frac{n-2}{n-k-2}\right)\left(\frac{n+1}{n}\right)\right](1 - R^2)$
Parsimony-adjusted measures of fit

$AIC = n \cdot \ln\left(\frac{SSE}{n}\right) + 2k$
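Both formulas translate directly into R functions. The function names are made up, and the SSE values in the AIC example are invented purely to show the trade-off between fit and complexity.

```r
# Adjusted R-squared as written on the slide
# (n = sample size, k = number of predictors, excluding the constant)
adjusted_r2 <- function(r2, n, k) {
  1 - ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n) * (1 - r2)
}

# AIC as written on the slide (sse = sum of squared errors)
aic <- function(sse, n, k) n * log(sse / n) + 2 * k

# Illustration with the album sales output: R-squared = 0.3346, n = 200, k = 1
adjusted_r2(0.3346, n = 200, k = 1)   # about 0.324

# Adding a predictor lowers this AIC only if it reduces the SSE enough
aic(sse = 100, n = 50, k = 1)
aic(sse = 99,  n = 50, k = 2)         # larger: tiny gain in fit, extra parameter
```

The value of about 0.324 differs from the 0.3313 that summary() prints for the same model, because R computes its adjusted R-squared as 1 − (1 − R²)(n − 1)/(n − k − 1).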
T-test on regression coefficients
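$t = \frac{b_{\text{observed}} - b_{\text{expected}}}{SE_b} = \frac{b_{\text{observed}}}{SE_b}$ (because $b_{\text{expected}} = 0$ under the null hypothesis)
$t \sim T(N - p - 1)$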
Confidence intervals for b
Hypothesis testing:
$H_0: b = 0$ vs. $H_1: b > 0$, $H_1: b < 0$ or $H_1: b \neq 0$
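The t value, p-value and confidence interval for a coefficient can be rebuilt from its estimate and standard error. The numbers below are copied from the album sales regression output shown earlier; the variable names are arbitrary.

```r
# Estimate and standard error for 'adverts', copied from the regression output
b  <- 9.612e-02
se <- 9.632e-03
df <- 198                      # residual degrees of freedom from the same output

t_value <- b / se              # about 9.979
p_two_sided <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)

# 95% confidence interval for b
ci <- b + c(-1, 1) * qt(0.975, df = df) * se
t_value; p_two_sided; ci
```

The interval does not contain 0, which agrees with the very small two-sided p-value.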
Methods of regression
Hierarchical
Known predictors (previous research) at first, new predictors afterwards
(forced entry, stepwise)
Forced entry
Model suggested from a theory
Stepwise
Forward (from null or intercept-only model)
Backward (from full or complete model)
Stepwise (like forward but admitting subsequent elimination)
Checking assumptions
Variables types: All Xs (predictors) must be quantitative or
categorical, and Y (response variable) must be quantitative,
continuous and unbounded
Non-zero variance: The predictors should have some variation in value
No perfect multicollinearity
Predictors are uncorrelated with ’external variables’: there should be
no external variables that correlate with any of the Xs included in the
regression model. Obviously, if external variables do correlate with
some Xs, then the model becomes unreliable (because other variables
can predict Y just as well).
Homoscedasticity
Independent errors (No autocorrelation)
Normally distributed errors
Independence (among different values of Y)
Linearity
Checking assumptions
When the assumptions of regression are met, the model that we get
for a sample can be accurately applied to the population of interest
(the coefficients and parameters of the regression equation are said to
be unbiased )
What an unbiased model does tell us is that on average the
regression model from the sample is the same as the population model
Multicollinearity
Strong correlation between two or more predictors (Xs)

Perfect collinearity exists when at least one X is a perfect linear
combination of the others
It becomes impossible to obtain unique estimates of the regression
coefficients because there are an infinite number of combinations of
coefficients that would work equally well
perfect collinearity is rare in real-life data
less than perfect collinearity is virtually unavoidable
Consequences of collinearity:
Untrustworthy estimated betas
Limited effect size (R 2 )
Difficulties in understanding importance of predictors
Performance of the model
Mean Square Error: $MSE = \frac{\sum_i (y_i - \hat{y}_i)^2}{n}$
Training set / Test set (the Test set is often also called the “Validation” set)
(K-fold, Leave-One-Out) Cross Validation
K-fold Cross Validation
Since data are often scarce, there might not be enough to set aside
for a validation sample
To work around this issue k-fold CV works as follows:
1 Split the sample into k subsets of equal size
2 For each fold estimate a model on all the subsets except one
3 Use the left out subset to test the model, by calculating a CV metric of
choice
4 Average the CV metric across subsets to get the CV error
This has the advantage of using all data for estimating the model.
Commonly used K values are K = 5, K = 10 and K = n. K = n (one fold per
observation) is a particular case called LOOCV (Leave-One-Out CV)
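The four steps can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame simData is simulated, the model y ~ x is arbitrary, and MSE is used as the CV metric.

```r
set.seed(1)  # reproducible fold assignment

# Simulated data (an assumption; replace with your own data frame)
n <- 100
simData <- data.frame(x = rnorm(n))
simData$y <- 2 + 0.5 * simData$x + rnorm(n)

k <- 5
# Step 1: randomly assign each row to one of k folds of equal size
fold <- sample(rep(1:k, length.out = n))

cv_mse <- numeric(k)
for (j in 1:k) {
  train <- simData[fold != j, ]          # Step 2: all subsets except fold j
  test  <- simData[fold == j, ]          # the left-out subset
  fit   <- lm(y ~ x, data = train)
  pred  <- predict(fit, newdata = test)
  cv_mse[j] <- mean((test$y - pred)^2)   # Step 3: MSE on the left-out fold
}

cv_error <- mean(cv_mse)                 # Step 4: average across folds
cv_error
```

Setting k <- n makes every fold a single observation, which corresponds to Leave-One-Out CV.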
Chapter 8
Aims
When and why do we use logistic regression?

Binary
Multinomial
Theory behind logistic regression
Assessing the model
Assessing predictors
Things that can go wrong
Interpreting logistic regression
When and Why
To predict an outcome variable that is categorical from one or more

categorical or continuous predictor variables
Used because having a categorical outcome variable violates the
assumption of linearity in normal regression
Logistic with one predictor
$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}$
Outcome
We predict the probability of the outcome occurring
b0 and b1
Can be thought of in much the same way as multiple regression
Note the normal regression equation forms part of the logistic regression
equation
Logistic with several predictors
$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n)}}$
Outcome
We still predict the probability of the outcome occurring
Differences
Note the multiple regression equation forms part of the logistic
regression equation
This part of the equation expands to accommodate additional
predictors
Logistic regression expresses the multiple linear regression equation in logarithmic terms
(called the logit) and thus overcomes the problem of violating the
assumption of linearity
The resulting value from the equation varies between 0 and 1. A value
close to 0 means that Y is very unlikely to have occurred, and a value
close to 1 means that Y is very likely to have occurred.
Assessing the Model
$\text{log-likelihood} = \sum_{i=1}^{N} \left[ Y_i \cdot \ln(P(Y_i)) + (1 - Y_i) \cdot \ln(1 - P(Y_i)) \right]$
The log-likelihood statistic

Analogous to the residual sum of squares in multiple regression
It is an indicator of how much unexplained information there is after
the model has been fitted
Large values (in absolute terms, since the log-likelihood is never positive) indicate poorly fitting statistical models
Estimated parameters b0 , b1 , . . . , bn , are those maximizing
log-likelihood
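The log-likelihood formula can be checked numerically against R's own value. The sketch below uses simulated data (an assumption made purely for illustration; the names x, y, fit, p_hat and ll_manual are not from the course material).

```r
set.seed(42)
# Simulated binary outcome (an assumption, for illustration only)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.2 * x))

fit   <- glm(y ~ x, family = binomial())
p_hat <- fitted(fit)   # estimated P(Y_i) for each case

# Log-likelihood computed directly from the formula
ll_manual <- sum(y * log(p_hat) + (1 - y) * log(1 - p_hat))

ll_manual
logLik(fit)        # R's own value, should agree
-2 * ll_manual     # for 0/1 data this equals the residual deviance
deviance(fit)
```

This also shows why the deviance in glm() output can be read as -2LL when the outcome is recorded case by case as 0/1.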
Assessing changes in models
It’s possible to calculate a log-likelihood for different models and to

compare these models by looking at the difference between their
log-likelihoods.
$\chi^2 = 2[LL(new) - LL(baseline)]$ with $df = k_{new} - k_{baseline}$

The statistic χ2 has a chi-square distribution with df degrees of
freedom
The change in LL can be evaluated with a statistical hypothesis test:
$H_0: \chi^2 = 0$ vs $H_1: \chi^2 > 0$
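A minimal sketch of this test in R, assuming the deviances (-2LL) reported in the glm() output of the intervention-only example shown later in this chapter; the variable names are illustrative.

```r
# Deviance = -2LL; values copied from the glm() output for eelModel.1
null_dev  <- 154.08   # baseline (intercept-only) model, 112 df
model_dev <- 144.16   # model with Intervention, 111 df

chi_sq  <- null_dev - model_dev   # = 2[LL(new) - LL(baseline)]
df_diff <- 112 - 111              # difference in number of parameters

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df_diff, lower.tail = FALSE)

chi_sq
p_value
```

A p-value below .05 suggests that adding the predictor improves the fit compared with the baseline model.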
Assessing predictors: the Wald Statistic
$\text{Wald} = \frac{b}{SE_b}$
Similar to t-statistic in regression
The Wald statistic follows a normal distribution (a.k.a. z-statistic)
Test the null hypothesis that b=0
Is biased when b is large
Better to look at likelihood ratio statistics (change in LL)
Assessing predictors: the Odds Ratio
$\text{Odds Ratio} = \frac{\text{odds after a unit change in the predictor}}{\text{odds before a unit change in the predictor}}$
Helps in model interpretation
Odds Ratio = $e^B$ or exp(B)
Indicates the change in odds resulting from a unit change in the
predictor:
OR > 1: Predictor ↑, probability of outcome occurring ↑
OR < 1: Predictor ↑, probability of outcome occurring ↓
Assessing predictors: the Odds Ratio

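$\text{Odds Ratio} = \frac{\text{odds after a unit change in the predictor}}{\text{odds before a unit change in the predictor}}$
$\text{odds} = \frac{P(\text{event})}{P(\text{no event})}$
$P(\text{event } Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}$
$P(\text{no event } Y) = 1 - P(\text{event } Y)$
$\Delta\text{odds} = \frac{\text{odds after a unit change in the predictor}}{\text{original odds}}$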
Methods of Regression
Forced entry: all variables entered simultaneously

Hierarchical: variables entered in blocks
Blocks should be based on past research, or theory being tested. Good
method.
Stepwise: variables entered on the basis of statistical criteria (i.e.
relative contribution to predicting outcome)
Should be used only for exploratory analysis
Things that can go wrong
Assumptions from linear regression:

Linearity
there is a linear relationship between any continuous predictors and the
logit of the outcome variable
Independence of errors
cases of data should not be related
Multicollinearity
this assumption can be checked with tolerance and VIF statistics (or
other measures)
Unique problems
Incomplete information
Complete separation
Overdispersion
Do you smoke?   Do you eat tomatoes?   Do you have cancer?
Yes             No                     Yes
Yes             Yes                    Yes
No              No                     Yes
No              Yes                    ???
All possibilities should be accounted for in the data

Causes estimation method to be slow and unstable
This point applies not only to categorical variables, but also to
continuous ones
As a general point, whenever samples are broken down into categories
and one or more combinations are empty it creates problems.
These will probably be signalled by coefficients that have
unreasonably large standard errors.
Complete separation
the outcome variable can be perfectly predicted by one variable or a
combination of variables:
this problem often arises when too many variables are fitted to too
few cases
Often the only satisfactory solution is to collect more data
sometimes a neat answer is found by using a simpler model
Overdispersion
Overdispersion is where the variance is larger than expected from the

model
This can be caused by violating the assumption of independence
This problem makes the standard errors too small!
An example
predictors of a treatment intervention

participants: 113 adults with a medical problem
outcome: “cured (1)” or “not cured” (0)
predictors:
Intervention: intervention or no treatment
Duration: the number of days before treatment that the patient had
the problem
Logistic Regression Analysis using R
newModel <- glm(outcome ~ predictor(s), data = dataFrame,
                family = name of a distribution,
                na.action = an action)
Logistic Regression Analysis using R
eelModel.1 <- glm(Cured ~ Intervention, data = eelData,
                  family = binomial())
eelModel.2 <- glm(Cured ~ Intervention + Duration, data = eelData,
                  family = binomial())

summary(eelModel.1)
summary(eelModel.2)
Output Model1: Intervention only
Call:
glm(formula = Cured ~ Intervention, family = binomial(), data = eelData)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5940  -1.0579   0.8118   0.8118   1.3018

Coefficients:
                          Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)                -0.2877      0.2700   -1.065   0.28671
InterventionIntervention    1.2287      0.3998    3.074   0.00212 **

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 154.08 on 112 degrees of freedom
Residual deviance: 144.16 on 111 degrees of freedom
AIC: 148.16
Assessing the model: R
partial correlation between Y and each X

defined in [-1,+1]
R > 0 → increasing X causes an increase in P(Y=1)
R < 0 → increasing X causes a decrease in P(Y=1)
R ≈ 0 → X is not important within the model
$R = \sqrt{\frac{z^2 - 2df}{-2LL(baseline)}}$
Assessing the model: R 2
proportional reduction in the absolute value of LL

a measure of how much the badness of fit improves as a result of the
inclusion of the predictor variables
can vary between 0 (indicating that the predictors are useless at
predicting the outcome variable) and 1 (indicating that the model
predicts the outcome variable perfectly)
Assessing the model: R 2
Hosmer and Lemeshow’s
$R_L^2 = \frac{(-2LL(baseline)) - (-2LL(model))}{-2LL(baseline)}$
Cox and Snell’s
$R_{CS}^2 = 1 - \exp\left(\frac{(-2LL(model)) - (-2LL(baseline))}{n}\right)$
Nagelkerke’s
$R_N^2 = \frac{R_{CS}^2}{1 - \exp\left(-\frac{-2LL(baseline)}{n}\right)}$
Assessing the model information criteria
$AIC = -2LL + 2k$
$BIC = -2LL + k \cdot \log(n)$

We want a measure of fit that we can use to compare two models
which penalizes a model that contains more predictor variables
You can think of this as the price you pay for something: you get a
better value of $R^2$, but you pay a higher price, and was that higher
price worth it? These information criteria help you to decide.
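As a hedged illustration, both criteria can be computed by hand from the residual deviances (-2LL) reported for the two models of the worked example in this chapter. The variable names are illustrative; k counts all estimated parameters including the intercept, and BIC is computed in its standard form, -2LL + k·log(n).

```r
n <- 113   # number of participants in the example

# Residual deviance (-2LL) and number of estimated parameters
dev1 <- 144.16; k1 <- 2   # eelModel.1: intercept + Intervention
dev2 <- 144.16; k2 <- 3   # eelModel.2: intercept + Intervention + Duration

aic1 <- dev1 + 2 * k1
aic2 <- dev2 + 2 * k2

bic1 <- dev1 + k1 * log(n)
bic2 <- dev2 + k2 * log(n)

c(AIC1 = aic1, AIC2 = aic2)
c(BIC1 = bic1, BIC2 = bic2)
```

The AIC values match those printed by summary() (148.16 and 150.16). Both criteria are lower for the first model, so the extra predictor does not appear to be worth its price. On fitted objects, AIC(eelModel.1) and BIC(eelModel.1) should return the same quantities.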
Writing a function to compute R 2
logisticPseudoR2s <- function(LogModel) {
  dev <- LogModel$deviance
  nullDev <- LogModel$null.deviance
  modelN <- length(LogModel$fitted.values)
  R.l <- 1 - dev / nullDev
  R.cs <- 1 - exp(-(nullDev - dev) / modelN)
  R.n <- R.cs / (1 - (exp(-(nullDev / modelN))))
  cat("Pseudo R^2 for logistic regression\n")
  cat("Hosmer and Lemeshow R^2 ", round(R.l, 3), "\n")
  cat("Cox and Snell R^2 ", round(R.cs, 3), "\n")
  cat("Nagelkerke R^2 ", round(R.n, 3), "\n")
}
Writing a function to compute R 2
To use the function on our model, we simply place the name of the
logistic regression model (in this case eelModel.1) in the function and
execute:
logisticPseudoR2s(eelModel.1)
The output will be:

Pseudo R^2 for logistic regression
Hosmer and Lemeshow R^2  0.064
Cox and Snell R^2  0.084
Nagelkerke R^2  0.113
Calculating the Odds Ratio
We can also calculate the odds ratio as the exponential of the b

coefficient for the predictor variables by executing:
exp(eelModel.1$coefficients)

(Intercept)  InterventionIntervention
   0.750000                  3.416667
To get the confidence intervals execute:

exp(confint(eelModel.1))

                              2.5%     97.5%
(Intercept)              0.4374531  1.268674
InterventionIntervention 1.5820127  7.625545
Output Model2: Intervention and Duration as Predictors
Call:
glm(formula = Cured ~ Intervention + Duration, family = binomial(), data = eelData)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6025  -1.0572   0.8107   0.8161   1.3095

Coefficients:
                           Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)               -0.234660    1.220563   -0.192   0.84754
InterventionIntervention   1.233532    0.414565    2.975   0.00293 **
Duration                  -0.007835    0.175913   -0.045   0.96447

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 154.08 on 112 degrees of freedom
Residual deviance: 144.16 on 110 degrees of freedom
AIC: 150.16
Comparing the models
We can use the anova() function

anova(eelModel.1, eelModel.2)

> Analysis of Deviance Table

Model 1: Cured ~ Intervention
Model 2: Cured ~ Intervention + Duration
  Resid. Df  Resid. Dev  Df   Deviance
1       111      144.16
2       110      144.16   1  0.0019835
Casewise diagnostics
Summary
The overall fit of the final model is shown by the deviance statistic
and its associated chi-square statistic.
If the significance of the chi-square statistic is less than .05, then the
model is a significant fit to the data.
Check the table labelled coefficients to see which variables
significantly predict the outcome.
For each variable in the model, look at the z-statistic and its
significance (which again should be below .05).
Use the Odds Ratio for interpretation. You can obtain this using
exp(model$coefficients), where model is the name of your model.
If the value is greater than 1 then as the predictor increases, the odds
of the outcome occurring increase.
A value less than 1 indicates that as the predictor increases, the odds
of the outcome occurring decrease.
For the aforementioned interpretation to be reliable the confidence
interval of the Odds Ratio should not cross 1!
Multinomial Logistic Regression
Logistic regression to predict membership of more than two categories

It (basically) works in the same way as binary logistic regression
The analysis breaks the outcome variable down into a series of
comparisons between two categories. E.g., if you have three outcome
categories (A, B and C), then the analysis will consist of two
comparisons that you choose:
compare everything against your first category
(e.g. A vs. B and A vs. C),
or your last category (e.g. A vs. C and B vs. C),
or a custom category (e.g. B vs. A and B vs. C).
The important parts of the analysis and output are much the same as
we have just seen for binary logistic regression
Chapter 9: Comparing Two Means
Aims
t-tests:
Independent
Dependent (aka paired, matched)
Rationale for the tests
Assumptions
t-tests as a GLM
Interpretation
Calculating an effect size
Reporting results
Robust methods
Experiments
The simplest form of experiment that can be done is one with only
one independent variable that is manipulated in only two ways and
only one outcome is measured.
More often than not, the manipulation of the independent variable
involves having an experimental condition and a control
E.g., Is the movie Scream 2 scarier than the original Scream? We could
measure heart rates (which indicate anxiety) during both films and
compare them
This situation can be analysed with a t-test
Experiments
Independent t-test
Compares two means based on independent data.
E.g., data from different groups of people.
Dependent t-test
Compares two means based on related data.
E.g., Data from the same people measured at different times.
Data from “matched” samples.
Significance testing
Testing the significance of Pearson’s correlation coefficient
Testing the significance of b in regression.
Rationale for the t-test
Two samples of data are collected and the sample means calculated.
These means might differ by either a little or a lot
If the samples come from the same population, then we expect their
means to be roughly equal. Although it is possible for their means to
differ by chance alone, we would expect large differences between
sample means to occur very infrequently

We compare the difference between the sample means that we collected to the
difference between the sample means that we would expect to obtain if there were
no effect (i.e. if the null hypothesis were true). We use the standard error as a
gauge of the variability between sample means. If the difference between the
samples we have collected is larger than what we would expect based on the
standard error then we can assume one of two things:
There is no effect and sample means in our population fluctuate a lot
and we have, by chance, collected two samples that are atypical of the
population from which they came
The two samples come from different populations but are typical of
their respective parent population. In this scenario, the difference
between samples represents a genuine difference between the samples
(and so the null hypothesis is incorrect)
As the observed difference between the sample means gets larger, the more
confident we become that the second explanation is correct (i.e. that the null
hypothesis should be rejected). If the null hypothesis is incorrect, then we gain
confidence that the two sample means differ because of the different experimental
manipulation imposed on each sample
A = observed difference between sample means

B = expected difference between population means
(if null hypothesis is true)
C = estimate of the standard error of the difference between two sample
means
$t = \frac{A - B}{C}$
The t-test as a GLM
$A_i = b_0 + b_1 G_i + \varepsilon_i$
$\text{anxiety}_i = b_0 + b_1 \text{group}_i + \varepsilon_i$
Picture Group
The group variable = 0

Intercept = mean of baseline group
X̄Picture = b0 + (b1 × 0)
b0 = X̄Picture
b0 = 40
Real Spider Group
The group variable = 1

b1 = Difference between means
X̄Real = b0 + (b1 × 1)
X̄Real = X̄Picture + b1
b1 = X̄Real − X̄Picture
= 47 − 40 = 7
Output from a Regression
Assumptions of the t-test
Both the independent t-test and the dependent t-test are parametric tests based
on the normal distribution. Therefore, they assume:
The sampling distribution is normally distributed. In the dependent
t-test this means that the sampling distribution of the differences
between scores should be normal, not the scores themselves.
Data are measured at least at the interval level.
The independent t-test, because it is used to test different groups of people, also
assumes:
Variances in these populations are roughly equal (homogeneity of
variance).
Scores in different treatment conditions are independent (because they
come from different people).
The Independent t-test
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$

$s_p^2 = \frac{(n_1 - 1) \cdot s_1^2 + (n_2 - 1) \cdot s_2^2}{n_1 + n_2 - 2}$
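The two formulas can be applied by hand in R. The sketch below is an illustration only: it uses the group means and standard errors reported for the spider example later in this chapter (12 participants per group), and recovers each sample variance from its standard error. Variable names are illustrative.

```r
n1 <- 12; n2 <- 12                       # participants per group
m_picture <- 40.00; se_picture <- 2.68   # picture group: mean and SE
m_real    <- 47.00; se_real    <- 3.18   # real spider group: mean and SE

# Recover the sample variances: SE = s / sqrt(n), so s^2 = (SE * sqrt(n))^2
var_picture <- (se_picture * sqrt(n1))^2
var_real    <- (se_real * sqrt(n2))^2

# Pooled variance
sp2 <- ((n1 - 1) * var_picture + (n2 - 1) * var_real) / (n1 + n2 - 2)

# t statistic, degrees of freedom and two-sided p-value
t_stat  <- (m_picture - m_real) / sqrt(sp2 / n1 + sp2 / n2)
df      <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), df)

round(c(t = t_stat, df = df, p = p_value), 3)
```

Note that t.test() applies Welch's correction unless var.equal = TRUE is set, which is why its output can report fractional degrees of freedom instead of n1 + n2 - 2.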
Example
Is arachnophobia (fear of spiders) specific to real spiders or is a

picture enough?
Participants
24 arachnophobic individuals
Manipulation
12 participants were exposed to a real spider
12 were exposed to a picture of the same spider
Outcome
Anxiety
The Independent t-test Using R
To do a t-test we use the function t.test()

If you have the data for different groups stored in a single column:
newModel <- t.test(outcome ~ predictor, data = dataFrame, paired = FALSE/TRUE)

ind.t.test <- t.test(Anxiety ~ Group, data = spiderLong)
If you have the data for different groups stored in two columns:
newModel <- t.test(scores group 1, scores group 2, paired = FALSE/TRUE)

ind.t.test <- t.test(spiderWide$real, spiderWide$picture)
Output from the Independent t-test
Calculating an Effect Size
$r = \sqrt{\frac{t^2}{t^2 + df}}$
$r = \sqrt{\frac{(-1.681)^2}{(-1.681)^2 + 22}} = \sqrt{\frac{2.826}{24.826}} = 0.34$
On average, participants experienced greater anxiety from real spiders

(M = 47.00, SE = 3.18), than from pictures of spiders (M = 40.00,
SE = 2.68).
This difference was not significant, t(21.4) = −1.68, p > .05;
however, it did represent a medium-sized effect, r = .34.
The Dependent t-test
$t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{N}}$
Example
Is arachnophobia (fear of spiders) specific to real spiders or is a

picture enough?
Participants
12 spider phobic individuals
Manipulation
Each participant was exposed to a real spider and a picture of the same
spider at two points in time
Outcome
Anxiety
The Dependent t-test Using R
To do a dependent t-test we again use the function t.test() but this

time include the option paired = TRUE.
If we have scores from different groups stored in different columns:
dep.t.test <- t.test(spiderWide$real, spiderWide$picture, paired = TRUE)

dep.t.test
If we had our data stored in long format so that our group scores are
in a single column and group membership is expressed in a second
column:
dep.t.test <- t.test(Anxiety ~ Group, data = spiderLong, paired = TRUE)
dep.t.test
Output from the Dependent t-test
Calculating the Effect Size
We can compute this value in the same way that we did for the
independent t-test by executing:
t <- dep.t.test$statistic[[1]]

df <- dep.t.test$parameter[[1]]

r <- sqrt(t^2 / (t^2 + df))

round(r, 3)
On average, participants experienced significantly greater anxiety

from real spiders (M = 47.00, SE = 3.18) than from pictures of
spiders (M = 40.00, SE = 2.68), t(11) = 2.47, p < .05, r = .60.
When Assumptions are Broken
Independent t-test
Mann–Whitney test
Wilcoxon rank-sum test
Dependent t-test
Wilcoxon signed-rank test
Robust tests
Bootstrapping
Trimmed means
Robust Methods to Compare Independent Means
Regardless of whether your data come from the same or different

entities, these functions require the data to be in two different
columns (one for each experimental condition).
Robust Methods to Compare Independent Means
The first robust function, yuen(), is based on a trimmed mean:

yuen(scores group 1, scores group 2, tr = .2, alpha = .05)
We can also compare trimmed means but include bootstrap by using:

yuenbt(scores group 1, scores group 2, tr = .2, nboot = 599, alpha = .05, side = F)
A final method is to use bootstrap and an M-estimator (rather than

trimmed mean) by using the pb2gen() function:
pb2gen(spiderWide$real, spiderWide$picture, alpha = .05, nboot = 2000, est = mom)
Output: Robust Methods to Compare Independent Means
Robust Methods to Compare Dependent Means
The first robust function, yuend(), is based on a trimmed mean:

yuend(scores group 1, scores group 2, tr = .2, alpha = .05)
We can also compare trimmed means but include bootstrap by using

ydbt():
ydbt(scores group 1, scores group 2, tr = .2, nboot = 599, alpha = .05, side = F)
Output: Robust Methods to Compare Dependent Means
Robust Methods to Compare Dependent Means
A final method is to use bootstrap and an M-estimator (rather than

trimmed mean) by using the bootdpci() function. This function has
the general form:
bootdpci(scores group 1, scores group 2, alpha = .05, nboot = 2000, est = tmean)
For a bootstrap test of dependent M-estimators we execute:

results = bootdpci(spiderWide$real, spiderWide$picture, est = tmean, nboot = 2000)
results$output
     con.num  psihat  p.value  p.crit  ci.lower  ci.upper
[1,]       1     7.5    0.037    0.05       0.5    13.125
Chapter 10: Comparing Several Means – ANOVA
Aims
Understand the basic principles of ANOVA

Why is it done?
What does it tell us?
Theory of one-way independent ANOVA
Following up an ANOVA:
Planned contrasts/comparisons
Choosing contrasts
Coding contrasts
Post hoc tests
When and Why
When we want to compare means we can use a t-test. This test has
limitations:
You can compare only 2 means: often we would like to compare means
from 3 or more groups
It can be used only with one predictor/independent variable
ANOVA
Compares several means
Can be used when you have manipulated more than one independent
variable
It is an extension of regression (the general linear model)
Why not use lots of t-tests?
If we want to compare several means why don’t we compare pairs of

means with t-tests?
Can’t look at several independent variables
Inflates the Type I error rate
What Does ANOVA Tell Us?

Null hypothesis:
Like a t-test, ANOVA tests the null hypothesis that the means are the
same.
Experimental hypothesis:
The means differ.
ANOVA is an omnibus test
It tests for an overall difference between groups.
It tells us that the group means are different.
It doesn’t tell us exactly which means differ.
How many comparisons can we do?
$C = \frac{k!}{2(k - 2)!}$
$C = \frac{5!}{2(5 - 2)!} = \frac{120}{2 \cdot (3 \cdot 2 \cdot 1)} = 10$
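A quick check of this count in R, together with a rough idea of how the Type I error rate inflates when all pairwise t-tests are run. The familywise calculation assumes independent tests at α = .05, which is only an approximation; variable names are illustrative.

```r
k <- 5   # number of groups

# Number of pairwise comparisons, from the formula and from choose()
n_comparisons <- factorial(k) / (2 * factorial(k - 2))
n_comparisons == choose(k, 2)

# Probability of at least one Type I error across all comparisons
# (assumes independent tests, each at alpha = .05)
alpha <- 0.05
familywise <- 1 - (1 - alpha)^n_comparisons
round(familywise, 3)
```

With 10 comparisons the chance of at least one false positive is roughly 40% rather than 5%, which is the motivation for using a single omnibus test first.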
Example: Viagra dataset
Outcome: a measure of libido
ANOVA as Regression
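$\text{outcome}_i = (\text{model}) + \text{error}_i$

$\text{libido}_i = b_0 + b_2 \text{high}_i + b_1 \text{low}_i + \varepsilon_i$
Placebo group
$\text{libido}_i = b_0 + b_2 \text{high}_i + b_1 \text{low}_i + \varepsilon_i$
$\text{libido}_i = b_0 + (b_2 \cdot 0) + (b_1 \cdot 0)$
$\text{libido}_i = b_0$
$b_0 = \bar{X}_{placebo}$
High dose group
$\text{libido}_i = b_0 + (b_2 \cdot 1) + (b_1 \cdot 0)$
$\text{libido}_i = b_0 + b_2$
$\bar{X}_{high} = \bar{X}_{placebo} + b_2$
$b_2 = \bar{X}_{high} - \bar{X}_{placebo}$
Low dose group
$\text{libido}_i = b_0 + (b_2 \cdot 0) + (b_1 \cdot 1)$
$\text{libido}_i = b_0 + b_1$
$\bar{X}_{low} = \bar{X}_{placebo} + b_1$
$b_1 = \bar{X}_{low} - \bar{X}_{placebo}$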
Output from regression
Experiments vs. Correlation
ANOVA in regression:
Used to assess whether the regression model is good at predicting an
outcome.
ANOVA in experiments:
Used to see whether experimental manipulations lead to differences in
performance on an outcome (DV)
By manipulating a predictor variable can we cause (and therefore
predict) a change in behaviour?
Asking the same question, but in experiments we systematically
manipulate the predictor, in regression we don’t.
Theory behind ANOVA
We calculate how much variability there is between scores: Total sum

of squares (SST )
We then calculate how much of this variability can be explained by
the model we fit to the data. . .
How much variability is due to the experimental manipulation, model
sum of squares (SSM )
. . . and how much cannot be explained
How much variability is due to individual differences in performance,
residual sum of squares (SSR )
Rationale to Experiments
Variance created by our manipulation: Removal of brain (systematic

variance)
Variance created by unknown factors: E.g. differences in ability
(unsystematic variance)
Rationale to Experiments
Theory behind ANOVA
We compare the amount of variability explained by the model

(experiment), to the error in the model (individual differences)
This ratio is called the F-ratio.
If the model explains a lot more variability than it can’t explain, then
the experimental manipulation has had a significant effect on the
outcome (DV).
Theory behind ANOVA
If the experiment is successful, then the model will explain more

variance than it can’t: SSM will be greater than SSR
ANOVA by hand
Testing the effects of Viagra on libido using three groups:

Placebo (sugar pill)
Low dose Viagra
High dose Viagra
The outcome/dependent variable (DV) was an objective measure of
libido.
The Data
The Data
Total Sum of Squares (SST )
Step 1: calculate SST
Degrees of Freedom
Degrees of freedom (df) are the number of values that are free to
vary.
Think about rugby teams!
In general, the df are one less than the number of values used to
calculate the SS.
dfT = N − 1 = 15 − 1 = 14
Model Sum of Squares SSM
Step 2: calculate SSM
$SS_M = \sum_i n_i (\bar{x}_i - \bar{x}_{grand})^2$
$SS_M = 5(2.2 - 3.467)^2 + 5(3.2 - 3.467)^2 + 5(5.0 - 3.467)^2$
$= 5(-1.267)^2 + 5(-0.267)^2 + 5(1.533)^2$
$= 8.025 + 0.355 + 11.755$
$= 20.135$
Model Degrees of Freedom
How many values did we use to calculate SSM

We used the 3 means
dfM = k − 1 = 3 − 1 = 2
Residual Sum of Squares SSR
Step 3: calculate SSR
$SS_R = s_{group1}^2 (n_1 - 1) + s_{group2}^2 (n_2 - 1) + s_{group3}^2 (n_3 - 1)$
$SS_R = 1.70(5 - 1) + 1.70(5 - 1) + 2.50(5 - 1)$
$= 6.8 + 6.8 + 10$
$= 23.60$
Residual Degrees of Freedom
How many values did we use to calculate SSR

We used the 5 scores for each of the SS for each group
dfR = dfgroup1 + dfgroup2 + dfgroup3
= (n1 − 1) + (n2 − 1) + (n3 − 1)
= (5 − 1) + (5 − 1) + (5 − 1)
= 12
Double Check
SST = SSM + SSR
43.74 = 20.14 + 23.60
43.74 = 43.74
dfT = dfM + dfR
14 = 2 + 12
14 = 14
Step 4: calculate the Mean Squares
Average amount of variation explained by the model (i.e., the systematic variation)
$MS_M = \frac{SS_M}{df_M} = \frac{20.135}{2} = 10.067$
A gauge of the average amount of variation explained by extraneous
variables (the unsystematic variation)
$MS_R = \frac{SS_R}{df_R} = \frac{23.60}{12} = 1.967$
Step 5: calculate the F-Ratio
$F = \frac{MS_M}{MS_R}$
$F = \frac{MS_M}{MS_R} = \frac{10.067}{1.967} = 5.12$
Step 6: Construct a Summary Table
Source      SS      df   MS       F
Model       20.14    2   10.067   5.12*
Residual    23.60   12    1.967
Total       43.74   14
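Steps 1 to 5 can be retraced in R. The sketch below enters the 15 libido scores of the Viagra example directly; the object names (libido, dose, ss_t, and so on) are illustrative choices rather than names from the course material.

```r
# Libido scores, five participants per group
libido <- c(3, 2, 1, 1, 4,    # placebo
            5, 2, 4, 2, 3,    # low dose
            7, 4, 5, 3, 6)    # high dose
dose <- factor(rep(c("placebo", "low", "high"), each = 5),
               levels = c("placebo", "low", "high"))

grand_mean  <- mean(libido)
group_means <- tapply(libido, dose, mean)
group_vars  <- tapply(libido, dose, var)
group_n     <- tapply(libido, dose, length)

# Sums of squares
ss_t <- sum((libido - grand_mean)^2)                  # total
ss_m <- sum(group_n * (group_means - grand_mean)^2)   # model
ss_r <- sum(group_vars * (group_n - 1))               # residual

# Degrees of freedom, mean squares and F-ratio
df_m <- nlevels(dose) - 1
df_r <- length(libido) - nlevels(dose)
ms_m <- ss_m / df_m
ms_r <- ss_r / df_r
f_ratio <- ms_m / ms_r
p_value <- pf(f_ratio, df_m, df_r, lower.tail = FALSE)

round(c(SST = ss_t, SSM = ss_m, SSR = ss_r, F = f_ratio, p = p_value), 3)
```

The results should agree with the summary table up to rounding, and summary(aov(libido ~ dose)) should report the same F-ratio.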
ANOVA Assumptions
Homogeneity of variance:
The variances of the groups are supposed to be equal. This assumption
can be tested using Levene’s test (see section 5.7.1). If Levene’s test is
significant then we can say that the variances are significantly different.
This would mean that we had violated one of the assumptions of ANOVA
and we would have to take steps to rectify this matter. However, when
sample sizes are unequal, ANOVA is not robust to violations of
homogeneity of variance
Non-normality
When group sizes are equal the F-statistic can be quite robust to
violations of normality
Independence
when this assumption is broken (i.e., observations across groups are
correlated) then the Type I error rate is substantially inflated.
Why Use Follow-Up Tests?
The F-ratio tells us only that the experiment was successful (i.e.
group means were different)
It does not tell us specifically which group means differ from which.
We need additional tests to find out where the group differences lie.
How?
Multiple t-tests
We saw earlier that this is a bad idea
Orthogonal contrasts/comparisons
Hypothesis driven
Planned a priori
Post hoc tests
Not planned (no hypothesis)
Compare all pairs of means
Trend analysis
Planned Contrasts
Basic idea:
The variability explained by the model (experimental manipulation,
SSM ) is due to participants being assigned to different groups.
This variability can be broken down further to test specific hypotheses
about which groups might differ.
We break down the variance according to hypotheses made a priori
(before the experiment).
It’s like cutting up a cake (yum yum!)
Rules When Choosing Contrasts
Independent
Contrasts must not interfere with each other (they must test unique
hypotheses).
Only two chunks

Each contrast should compare only two chunks of variation (why?).
K-1
You should always end up with one less contrast than the number of
groups.
Generating Hypotheses
Example: Testing the effects of Viagra on libido using three groups:

Placebo (sugar pill)
Low dose Viagra
High dose Viagra
Dependent variable (DV) was an objective measure of libido.
Intuitively, what might we expect to happen?
Generating Hypotheses
Placebo Low Dose High Dose

3 5 7
2 2 4
1 4 5
1 2 3
4 3 6
Mean 2.20 3.20 5.00
How do I Choose Contrasts?
Big hint:
In most experiments we usually have one or more control groups.
The logic of control groups dictates that we expect them to be
different from groups that we’ve manipulated.
The first contrast will always be to compare any control groups (chunk
1) with any experimental conditions (chunk 2).
Hypotheses
Hypothesis 1:
People who take Viagra will have a higher libido than those who don’t.
placebo ≠ (low, high)
Hypothesis 2:
People taking a high dose of Viagra will have a greater libido than
those taking a low dose.
low ≠ high
Planned Comparisons
Another Example
Another Example
Defining contrasts using weights
Coding Planned Contrasts: Rules
Rule 1
Groups coded with positive weights compared to groups coded with
negative weights.
Rule 2
The sum of weights for a comparison should be zero.
Rule 3
If a group is not involved in a comparison, assign it a weight of zero.
Defining contrasts using weights
Coding Planned Contrasts: Rules
Rule 4
For a given contrast, the weights assigned to the group(s) in one chunk of
variation should be equal to the number of groups in the opposite chunk
of variation.
Rule 5
If a group is singled out in a comparison, then that group should not be
used in any subsequent contrasts.
Defining contrasts
Defining contrasts
Orthogonal contrasts for the Viagra data
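$\text{libido}_i = b_0 + b_1 \text{contrast}_{1i} + b_2 \text{contrast}_{2i}$

$b_0 = \text{grand mean} = \frac{\bar{X}_{high} + \bar{X}_{low} + \bar{X}_{placebo}}{3}$

$\bar{X}_{placebo} = \frac{\bar{X}_{high} + \bar{X}_{low} + \bar{X}_{placebo}}{3} + (-2b_1) + (b_2 \cdot 0)$

$2b_1 = \frac{\bar{X}_{high} + \bar{X}_{low} + \bar{X}_{placebo}}{3} - \bar{X}_{placebo}$
$6b_1 = \bar{X}_{high} + \bar{X}_{low} + \bar{X}_{placebo} - 3\bar{X}_{placebo}$
$6b_1 = \bar{X}_{high} + \bar{X}_{low} - 2\bar{X}_{placebo}$
Effect of experimental group vs control group:
$3b_1 = \frac{\bar{X}_{high} + \bar{X}_{low}}{2} - \bar{X}_{placebo}$

$b_1 = \frac{1}{3}\left[\frac{\bar{X}_{high} + \bar{X}_{low}}{2} - \bar{X}_{placebo}\right]$
$3b_1 = \frac{\bar{X}_{high} + \bar{X}_{low}}{2} - \bar{X}_{placebo}$
$= \frac{5 + 3.2}{2} - 2.2$
$= 1.9$

$\bar{X}_{high} = b_0 + (b_1 \cdot 1) + (b_2 \cdot 1)$
$b_2 = \bar{X}_{high} - b_1 - b_0$

$b_2 = \bar{X}_{high} - \frac{1}{3}\left[\frac{\bar{X}_{high} + \bar{X}_{low}}{2} - \bar{X}_{placebo}\right] - \frac{\bar{X}_{high} + \bar{X}_{low} + \bar{X}_{placebo}}{3}$

$3b_2 = 3\bar{X}_{high} - \left[\frac{\bar{X}_{high} + \bar{X}_{low}}{2} - \bar{X}_{placebo}\right] - \left(\bar{X}_{high} + \bar{X}_{low} + \bar{X}_{placebo}\right)$

$6b_2 = 6\bar{X}_{high} - \left(\bar{X}_{high} + \bar{X}_{low} - 2\bar{X}_{placebo}\right) - 2\left(\bar{X}_{high} + \bar{X}_{low} + \bar{X}_{placebo}\right)$
$= 6\bar{X}_{high} - \bar{X}_{high} - \bar{X}_{low} + 2\bar{X}_{placebo} - 2\bar{X}_{high} - 2\bar{X}_{low} - 2\bar{X}_{placebo}$
$= 3\bar{X}_{high} - 3\bar{X}_{low}$
Difference between experimental groups:
$b_2 = \frac{1}{2}(\bar{X}_{high} - \bar{X}_{low})$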
Orthogonal contrasts for the Viagra data: output
F-statistic unchanged
The intercept is the “grand mean”
The regression coefficient for contrast1 is one-third of the difference
between the average of the experimental conditions and the control
condition
The regression coefficient for contrast2 is half of the difference
between the experimental groups
The experimental groups were significantly different from the control
(p < .05), but the two experimental groups were not significantly
different from each other (p > .05)
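The interpretation of the coefficients can be verified in R. This is a minimal sketch that rebuilds the Viagra data from the table in these slides and fits the model with the two planned contrasts; the names viagraPlanned, groupMeans and byHand are illustrative.

```r
# Viagra data from the slides: five libido scores per group.
libido <- c(3, 2, 1, 1, 4,    # placebo
            5, 2, 4, 2, 3,    # low dose
            7, 4, 5, 3, 6)    # high dose
dose <- gl(3, 5, labels = c("Placebo", "Low Dose", "High Dose"))
viagraData <- data.frame(dose, libido)

# Planned contrasts (weights for placebo, low dose, high dose).
contrast1 <- c(-2, 1, 1)   # placebo vs. both doses
contrast2 <- c(0, -1, 1)   # low dose vs. high dose
contrasts(viagraData$dose) <- cbind(contrast1, contrast2)

viagraPlanned <- lm(libido ~ dose, data = viagraData)
b <- unname(coef(viagraPlanned))

# The same quantities computed directly from the group means.
groupMeans <- tapply(viagraData$libido, viagraData$dose, mean)
byHand <- c(
  b0 = mean(groupMeans),
  b1 = (mean(groupMeans[c("Low Dose", "High Dose")]) - groupMeans[["Placebo"]]) / 3,
  b2 = (groupMeans[["High Dose"]] - groupMeans[["Low Dose"]]) / 2
)

rbind(fromLm = b, byHand = byHand)
summary(viagraPlanned)
```

The two rows of the comparison agree: the intercept is the grand mean, b1 is one-third of 1.9, and b2 is half of the difference between the high-dose and low-dose means.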
Non-Orthogonal contrasts for the Viagra data
Standard contrasts
One-Way ANOVA using R

When the test assumptions are met
Using lm():
    viagraModel <- lm(libido ~ dose, data = viagraData)
Using aov():
    viagraModel <- aov(libido ~ dose, data = viagraData)
    summary(viagraModel)
Output from aov()
Plot of the model
When variances are not equal across groups
If Levene’s test is significant then it is reasonable to assume that
population variances are different across groups.
We can get the output for Welch’s F for the current data by
executing:
    oneway.test(libido ~ dose, data = viagraData)
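As a self-contained illustration, the sketch below rebuilds the Viagra data from the table in these slides and runs both the standard ANOVA and Welch's F. Only base R functions are used; the object names anovaTable, fValue and welch are illustrative.

```r
# Viagra data from the slides: five libido scores per group.
libido <- c(3, 2, 1, 1, 4,    # placebo
            5, 2, 4, 2, 3,    # low dose
            7, 4, 5, 3, 6)    # high dose
dose <- gl(3, 5, labels = c("Placebo", "Low Dose", "High Dose"))
viagraData <- data.frame(dose, libido)

# Standard one-way ANOVA (assumes equal variances across groups).
viagraModel <- aov(libido ~ dose, data = viagraData)
anovaTable <- summary(viagraModel)[[1]]
fValue <- anovaTable[["F value"]][1]
fValue

# Welch's F (does not assume equal variances; this is the default of oneway.test).
welch <- oneway.test(libido ~ dose, data = viagraData)
welch
```

Comparing the two outputs shows how much the conclusion depends on the homogeneity-of-variance assumption.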
Output
Robust ANOVA
The robust functions require the data to be in wide format rather than long format
We can reformat the data using unstack():
    viagraWide <- unstack(viagraData, libido ~ dose)
This command creates a new dataframe called viagraWide, which is
our Viagra data but in wide format, so each column represents a
different group
Here it is possible to find an interactive example:
https://rdrr.io/r/utils/stack.html
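A minimal sketch of the conversion, assuming the Viagra data are rebuilt from the table in these slides (the column names dose and libido follow the code in the slides):

```r
# Long format: one row per participant, one column for the group.
libido <- c(3, 2, 1, 1, 4,    # placebo
            5, 2, 4, 2, 3,    # low dose
            7, 4, 5, 3, 6)    # high dose
dose <- gl(3, 5, labels = c("Placebo", "Low Dose", "High Dose"))
viagraData <- data.frame(dose, libido)

# Wide format: one column per group, one row per within-group position.
viagraWide <- unstack(viagraData, libido ~ dose)
viagraWide

# The column means reproduce the group means from the table.
colMeans(viagraWide)
```

Note that unstack() only returns a dataframe when all groups have the same number of scores, as they do here; with unequal group sizes it returns a list.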
viagraWide
Robust ANOVA
For an ANOVA of the Viagra data based on 20% trimmed means:
    t1way(viagraWide)
To compare medians rather than means:
    med1way(viagraWide)
To add a bootstrap to the trimmed mean method:
    t1waybt(viagraWide)
Robust outputs
Planned Contrasts using R
To do planned comparisons in R we have to set the contrast attribute
of our grouping variable using the contrasts() function and then
recreate our ANOVA model using aov(). By default, dummy coding is
used.
We can see this if we summarise our existing viagraModel using the
summary.lm() function rather than summary():
    summary.lm(viagraModel)
Output
Polynomial Contrasts: Trend Analysis
Trend Analysis
Follow the general procedure of setting the contrast attribute of the
predictor variable:
    contrasts(viagraData$dose) <- contr.poly(3)
We then create a new model using aov():
    viagraTrend <- aov(libido ~ dose, data = viagraData)
To access the contrasts:
    summary.lm(viagraTrend)
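To see what contr.poly(3) actually assigns to the three groups, the weights can be inspected directly. This is a small sketch; the row labels and the name trendWeights are added here for readability and are assumptions.

```r
# Polynomial contrast weights for a factor with three ordered levels.
trendWeights <- contr.poly(3)
rownames(trendWeights) <- c("Placebo", "Low Dose", "High Dose")

# Column .L holds the linear trend weights, column .Q the quadratic ones.
round(trendWeights, 3)

# Each set of weights sums to zero, and the two sets are orthogonal.
colSums(trendWeights)
sum(trendWeights[, ".L"] * trendWeights[, ".Q"])
```

The linear weights increase steadily from the first to the last group, while the quadratic weights contrast the middle group with the two outer groups.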
Trend Analysis: Output
Post Hoc Tests
Compare each mean against all others (pairwise comparisons)
Used when there are no specific “a priori” predictions about the data and the
interest is in exploring the data for any between-group differences
between means that exist
In general terms they use a stricter criterion to accept an effect as
significant
Hence, pairwise comparisons control the familywise Type I error by
correcting the level of significance for each test such that the overall
Type I error rate (α) across all comparisons remains at .05
Simplest example is the Bonferroni method:
\[ \alpha_{\text{Bonferroni}} = \frac{\alpha}{\text{number of tests}} \]
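A short worked example of the Bonferroni correction, assuming three groups (and therefore three pairwise comparisons). The p-values are hypothetical and serve only to show the arithmetic.

```r
alpha <- 0.05
nGroups <- 3
nTests <- choose(nGroups, 2)          # number of pairwise comparisons

# Corrected significance level for each individual test.
alphaPerTest <- alpha / nTests
alphaPerTest

# Equivalent view: multiply each p-value by the number of tests
# (capped at 1) and compare it with the original alpha.
pValues <- c(0.01, 0.02, 0.04)        # hypothetical unadjusted p-values
adjusted <- p.adjust(pValues, method = "bonferroni")
adjusted
adjusted < alpha
```

Both views lead to the same decisions: a comparison is significant either when its raw p-value is below alpha divided by the number of tests, or when its adjusted p-value is below alpha.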
Post-hoc tests (Superhero dataset)
Outcome: severity of injury (0 – 100)
Post Hoc Tests Recommendations
How you conduct post hoc tests in R depends on which test you’d
like to do.
Bonferroni and related methods such as the Holm and
Benjamini–Hochberg (BH) variants are done using the
pairwise.t.test() function, which is part of the R base system.
However, Tukey and Dunnett’s test can be done using the glht()
function in the multcomp package.
Finally, Wilcox (2005) has some robust methods implemented in his
functions lincon() and mcppb20().
Bonferroni and BH post hoc tests
    pairwise.t.test(viagraData$libido, viagraData$dose, p.adjust.method = "bonferroni")
    pairwise.t.test(viagraData$libido, viagraData$dose, p.adjust.method = "BH")
Tukey
For the Viagra data, we can obtain Tukey post hoc tests by executing:
    postHocs <- glht(viagraModel, linfct = mcp(dose = "Tukey"))
    summary(postHocs)
    confint(postHocs)
Tukey post hoc test output
Robust post hoc tests
    lincon(viagraWide)
    mcppb20(viagraWide)
Effect size of ANOVA
A simple measure of effect size is:
\[ r = \sqrt{R^2} = \sqrt{\frac{SS_M}{SS_T}} \]
r is slightly biased because it is based purely on sums of squares from
the sample and no adjustment is made for the fact that we’re trying
to estimate the effect size in the population. Therefore,
omega-squared is often used instead:
\[ \omega^2 = \frac{SS_M - (df_M)\,MS_R}{SS_T + MS_R} \]
For ω², values of .01, .06 and .14 represent small, medium and large effects
respectively
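Both effect sizes can be computed from the ANOVA table. The sketch below rebuilds the Viagra data from the table in these slides; the object names (ssM, ssR, msR and so on) are illustrative.

```r
# Viagra data from the slides: five libido scores per group.
libido <- c(3, 2, 1, 1, 4,    # placebo
            5, 2, 4, 2, 3,    # low dose
            7, 4, 5, 3, 6)    # high dose
dose <- gl(3, 5, labels = c("Placebo", "Low Dose", "High Dose"))
viagraData <- data.frame(dose, libido)

# Row 1 of the ANOVA table is the model (dose), row 2 the residuals.
anovaTable <- summary(aov(libido ~ dose, data = viagraData))[[1]]
ssM <- anovaTable[["Sum Sq"]][1]
ssR <- anovaTable[["Sum Sq"]][2]
dfM <- anovaTable[["Df"]][1]
msR <- anovaTable[["Mean Sq"]][2]
ssT <- ssM + ssR

rSquared <- ssM / ssT
r <- sqrt(rSquared)
omegaSquared <- (ssM - dfM * msR) / (ssT + msR)

round(c(R2 = rSquared, r = r, omega2 = omegaSquared), 3)
```

For these data omega-squared is noticeably smaller than R², which illustrates the adjustment for the bias described above.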
Chapter 15: Non-parametric Tests
Aims
When and why do we use non-parametric tests?

Kruskal–Wallis test
Jonckheere–Terpstra test
Friedman’s ANOVA
Ranking data
Interpretation of results
Reporting results
Calculating an effect size
When to use Non-parametric Tests
Non-parametric tests are used when assumptions of parametric tests
are not met.
It is not always possible to correct for problems with the distribution
of a data set
In these cases we have to use non-parametric tests
They make fewer assumptions about the type of data on which they
can be used
The Wilcoxon rank-sum test
The non-parametric equivalent of the independent t-test.
Used to test differences between two conditions in which different
participants have been used
Ranking Data
The test works on the principle of ranking the data for each group:
Lowest score = a rank of 1
Next highest score = a rank of 2, and so on.
Tied ranks are given the same rank: the average of the potential ranks
For an unequal group size
The test statistic (Ws) = sum of ranks in the group that contains the
fewest people
For an equal group size
Ws = the value of the smaller summed rank
Add up the ranks for the two groups and take the lowest of these
sums to be our test statistic
The analysis is carried out on the ranks rather than the actual data
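The ranking procedure can be sketched in a few lines of R. The scores below are hypothetical (two groups of four people, with a three-way tie) and the object names are illustrative.

```r
# Hypothetical scores for two independent groups of equal size.
groupA <- c(12, 15, 15, 20)
groupB <- c(8, 10, 15, 11)

# Rank all scores together, ignoring group membership.
# Tied scores receive the average of the ranks they would occupy.
ranks <- rank(c(groupA, groupB))
ranks

rankSumA <- sum(ranks[seq_along(groupA)])
rankSumB <- sum(ranks[length(groupA) + seq_along(groupB)])

# Equal group sizes: the test statistic is the smaller rank sum.
Ws <- min(rankSumA, rankSumB)
Ws

# wilcox.test() reports a related statistic, W: the rank sum of the
# first group minus n1 * (n1 + 1) / 2.
wFromR <- unname(wilcox.test(groupA, groupB, exact = FALSE)$statistic)
wFromR
```

The W printed by wilcox.test() therefore differs from the rank sum by a constant that depends only on the size of the first group, so both lead to the same p-value.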
An Example
A neurologist investigated the depressant effects of certain
recreational drugs.
Tested 20 clubbers.
10 were given an ecstasy tablet to take on a Saturday night.
10 were allowed to drink only alcohol.
Levels of depression were measured using the Beck Depression
Inventory (BDI) the day after and midweek.
Rank the data ignoring the group to which a person belonged.
A similar number of high and low ranks in each group suggests
depression levels do not differ between the groups.
A greater number of high ranks in the ecstasy group than the alcohol
group suggests the ecstasy group is more depressed than the alcohol
group.
Ranking the Depression Scores for Wednesday and Sunday
Provisional Analysis
Running the Analysis Using R Commander
The Nonparametric Tests menu in R Commander and the dialog box for the Wilcoxon
test for independent samples
Running the Analysis Using R
If you have the data for different groups stored in a single column:
    newModel <- wilcox.test(outcome ~ predictor, data = dataFrame, paired = FALSE)
However, if you have the data for different groups stored in two
columns:
    newModel <- wilcox.test(scoresGroup1, scoresGroup2, paired = FALSE)
Here paired = FALSE compares independent groups; paired = TRUE is used
when the two sets of scores come from the same participants.
To compute a basic Wilcoxon test for our Sunday data we could
execute:
    sunModel <- wilcox.test(sundayBDI ~ drug, data = drugData)
    sunModel
For the Wednesday data:
    wedModel <- wilcox.test(wedsBDI ~ drug, data = drugData)
    wedModel
Output from the Wilcoxon Rank-Sum Test
Reporting the results
Depression levels in ecstasy users (Mdn = 17.50) did not differ
significantly from alcohol users (Mdn = 16.00) the day after the
drugs were taken, W = 35.5, p = .286.
However, by Wednesday, ecstasy users (Mdn = 33.50) were
significantly more depressed than alcohol users (Mdn = 7.50),
W = 4, p < .001.
Comparing Two Related Conditions: the Wilcoxon Signed-Rank Test
Uses:
To compare two sets of scores, when these scores come from the same
participants.
Imagine the experimenter in the previous example was interested in
the change in depression levels for each of the two drugs.
We still have to use a non-parametric test because the distributions of
scores for both drugs were non-normal on one of the two days
Ranking Data in the Wilcoxon
We want to run our analysis on the alcohol and ecstasy groups
separately; therefore, our first job is to split the dataframe into two
using the subset() function:
    alcoholData <- subset(drugData, drug == "Alcohol")
    ecstasyData <- subset(drugData, drug == "Ecstasy")
To run the analysis for the alcohol group execute:
    alcoholModel <- wilcox.test(alcoholData$wedsBDI, alcoholData$sundayBDI,
                                paired = TRUE, correct = FALSE)
    alcoholModel
and for the ecstasy group:
    ecstasyModel <- wilcox.test(ecstasyData$wedsBDI, ecstasyData$sundayBDI,
                                paired = TRUE, correct = FALSE)
    ecstasyModel
Output
Reporting the results
For ecstasy users, depression levels were significantly higher on
Wednesday (Mdn = 33.50) than on Sunday (Mdn = 17.50), p = .047.
However, for alcohol users the opposite was true: depression levels
were significantly lower on Wednesday (Mdn = 7.50) than on Sunday
(Mdn = 16.0), p = .012.
Differences between Several Independent Groups: the Kruskal–Wallis test
The Kruskal–Wallis test (Kruskal & Wallis, 1952) is the
non-parametric counterpart of the one-way independent ANOVA
If you have data that have violated an assumption then this test can be
a useful way around the problem
The theory for the Kruskal–Wallis test is very similar to that of the
Wilcoxon rank-sum test:
The Kruskal–Wallis test is based on ranked data
The sum of ranks for each group is denoted by Ri (where i is used to
denote the particular group)
Kruskal–Wallis Theory
Once the sum of ranks has been calculated for each group, the test
statistic, H, is calculated as:
\[ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \]
R_i is the sum of ranks for each group
N is the total sample size (in this case 80)
n_i is the sample size of a particular group (in this case we have equal
sample sizes and they are all 20).
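The formula for H can be checked against R's built-in kruskal.test(). The data below are hypothetical (three groups of four scores, no ties) and the object names are illustrative; with tied scores kruskal.test() applies an additional correction, so the two values would differ slightly.

```r
# Hypothetical scores for three independent groups (no tied values).
score <- c(1.2, 3.4, 2.2, 5.1,     # group 1
           4.8, 6.0, 7.3, 3.9,     # group 2
           8.1, 9.4, 6.6, 10.2)    # group 3
group <- gl(3, 4, labels = c("g1", "g2", "g3"))

N   <- length(score)                        # total sample size
n_i <- tapply(score, group, length)         # size of each group
R_i <- tapply(rank(score), group, sum)      # sum of ranks in each group

# H computed directly from the formula.
H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)
H

# The same statistic from the built-in test.
kruskal.test(score ~ group)$statistic
```

The two values agree, which confirms that the built-in test is doing nothing more than ranking the data and applying the formula.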
Real data example
Does eating soya cause fertility problems? Does that affect the sperm
count?
Variables:
Outcome: sperm (millions)
IV: Number of soya meals per week:
No Soya meals
1 Soya meal
4 soya meals
7 soya meals
Participants: 80 males (20 in each group)
Data for the Soya Example with Ranks
Provisional Analysis
Run some exploratory analyses on the data
Doing the Kruskal–Wallis Test using R Commander
Doing the Kruskal–Wallis Test using R
For the current data:
    kruskal.test(Sperm ~ Soya, data = soyaData)
To interpret the Kruskal–Wallis test, it is useful to obtain the mean
rank for each group:
    soyaData$Ranks <- rank(soyaData$Sperm)
This command creates a variable Ranks in the soyaData dataframe that
contains the ranks for the variable Sperm. We can then obtain the mean
rank for each group:
    by(soyaData$Ranks, soyaData$Soya, mean)
Output from the Kruskal–Wallis test
Boxplot for the Sperm Counts of Individuals

Eating Different Numbers of Soya Meals per Week
Post Hoc Tests for the Kruskal–Wallis Test
    kruskalmc(Sperm ~ Soya, data = soyaData)
One of the problems with comparing every group against all others is
that we have to be quite strict about accepting a difference as significant,
otherwise we will inflate the Type I error rate. To reduce this problem
we could use more focussed comparisons.
In this example, we have a control group that had no soya meals. As
such, a nice succinct set of comparisons would be to compare each
group against the control:
Test 1: one soya meal per week compared to no soya meals
Test 2: four soya meals per week compared to no soya meals
Test 3: seven soya meals per week compared to no soya meals
To compare each group to the no-soya group (using a two-tailed test)
we simply execute:
    kruskalmc(Sperm ~ Soya, data = soyaData, cont = 'two-tailed')
Output
Testing for Trends: the Jonckheere–Terpstra Test
This statistic tests for an ordered pattern to the medians of the
groups you’re comparing.
Essentially it does the same thing as the Kruskal–Wallis test but it
incorporates information about whether the order of the groups is
meaningful.
Use this test when you expect the groups you’re comparing to
produce a meaningful order of medians.
In the current example we expect that the more soya a person eats,
the more their sperm count will go down.
Jonckheere–Terpstra Test Using R
We can conduct a Jonckheere test by executing:
    jonckheere.test(soyaData$Sperm, as.numeric(soyaData$Soya))
Differences between Several Related Groups: Friedman’s ANOVA
Used for testing differences between conditions when:
there are more than two conditions
the same participants have been used in all conditions (each case
contributes several scores to the data)
If you have violated some assumption of parametric tests then this
test can be a useful way around the problem
Theory of Friedman’s ANOVA
The theory for Friedman’s ANOVA is much the same as the other
tests: it is based on ranked data.
Once the sum of ranks has been calculated for each group, the test
statistic, F_r, is calculated as:
\[ F_r = \left[ \frac{12}{Nk(k+1)} \sum_{i=1}^{k} R_i^2 \right] - 3N(k+1) \]
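As a check on the formula, the sketch below computes the statistic by hand and compares it with friedman.test(). Here N is taken to be the number of participants and k the number of conditions; the data are hypothetical (five participants, three time points, no ties within a participant) and the object names are illustrative.

```r
# Hypothetical weights (kg): one row per participant, one column per condition.
dietScores <- matrix(c(63.8, 65.4, 72.6,
                       52.6, 54.2, 49.7,
                       60.6, 55.5, 58.0,
                       83.4, 85.7, 86.2,
                       71.8, 70.3, 73.1),
                     ncol = 3, byrow = TRUE,
                     dimnames = list(NULL, c("Baseline", "Month1", "Month2")))

N <- nrow(dietScores)   # number of participants
k <- ncol(dietScores)   # number of conditions

# Rank the scores within each participant, then sum the ranks per condition.
ranks <- t(apply(dietScores, 1, rank))
R_i <- colSums(ranks)
R_i

# Friedman's statistic computed from the formula.
Fr <- 12 / (N * k * (k + 1)) * sum(R_i^2) - 3 * N * (k + 1)
Fr

# The same statistic from the built-in test.
friedman.test(dietScores)$statistic
```

The key difference from the Kruskal–Wallis test is that the ranking is done separately within each participant rather than across the whole data set.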
Example
Does the Andikins diet work (low carb diet)?

Variables:
Outcome: weight (kg)
IV: time since beginning the diet:
Baseline
1 month
2 months
Participants: 10 women
Diet Data
Friedman’s ANOVA Using R Commander
Friedman’s ANOVA Using R
To run the Friedman test we simply input the name of our dataframe,
but within the as.matrix() function, which converts it to a matrix.
In this example, we would execute:
    friedman.test(as.matrix(dietData))
Output from Friedman’s ANOVA
Post Hoc Tests for Friedman’s ANOVA
For the current data we would execute:
    friedmanmc(as.matrix(dietData))
To Sum Up. . .
When data violate the assumptions of parametric tests we can
sometimes find a non-parametric equivalent
Usually based on analysing the ranked data
Wilcoxon rank-sum test
Compares two independent groups of scores
Wilcoxon signed-rank test
Compares two dependent groups of scores
Kruskal–Wallis test
Compares more than two independent groups of scores
Friedman’s test
Compares more than two dependent groups of scores
Chapter 17: Principal Components Analysis and Reliability
Aims
What are factors?
Representing factors
Graphs and equations
Extracting factors
Methods and criteria
Interpreting factor structures
Factor rotation
Reliability
Cronbach’s alpha
When and Why?
To test for clusters of variables or measures
To see whether different measures are tapping aspects of a common
dimension. For example:
anal-retentiveness (a person who pays such attention to detail that it
becomes an obsession and may be an annoyance to others)
number of friends
social skills
All of those might be aspects of the common dimension of “statistical
ability”
R-Matrix
In factor analysis we look to reduce the R-matrix to a smaller set of
uncorrelated dimensions
What is a Factor?
If several variables correlate highly, they might measure aspects of a
common underlying dimension.
These dimensions are called factors.
Factors are classification axes along which the measures can be
plotted.
The greater the loading of variables on a factor, the more that factor
explains relationships between those variables.
Graphical Representation
Mathematical Representation
\[ Y = b_1 X_1 + b_2 X_2 + \cdots + b_n X_n \]
\[ \text{Factor}_i = b_1 \text{Variable}_1 + b_2 \text{Variable}_2 + \cdots + b_n \text{Variable}_n \]
\[ \text{Sociability} = b_1 \text{Talk1} + b_2 \text{Social Skills} + b_3 \text{Interest} + b_4 \text{Talk2} + b_5 \text{Selfish} + b_6 \text{Liar} \]
\[ \text{Consideration} = b_1 \text{Talk1} + b_2 \text{Social Skills} + b_3 \text{Interest} + b_4 \text{Talk2} + b_5 \text{Selfish} + b_6 \text{Liar} \]
Factor Loadings
The b-values in the equation represent the weights of a variable on a
factor
These values are the same as the coordinates on a factor plot
They are called factor loadings
These values are stored in a factor pattern matrix (A)
\[
A = \begin{pmatrix}
0.87 & 0.01 \\
0.96 & -0.03 \\
0.92 & 0.04 \\
0.00 & 0.82 \\
-0.10 & 0.75 \\
0.09 & 0.70
\end{pmatrix}
\]
The R anxiety questionnaire (RAQ)
Initial Considerations
The quality of analysis depends upon the quality of the data
(garbage in → garbage out)
Test variables should correlate quite well
r > .3
Avoid multicollinearity:
several variables highly correlated, r > .80
Avoid singularity:
some variables perfectly correlated, r = 1
Screen the correlation matrix, eliminate any variables that obviously
cause concern
Further Considerations
Determinant:
indicator of multicollinearity
should be greater than 0.00001
Kaiser–Meyer–Olkin (KMO):
measures sampling adequacy
should be greater than .5
Bartlett’s test of sphericity:
tests whether the R-matrix is an identity matrix
should be significant at p < .05
Anti-image matrix:
measures of sampling adequacy on diagonal,
off-diagonal elements should be small
Reproduced:
correlation matrix after rotation
most residuals should have an absolute value < 0.05
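Some of these checks need nothing more than base R. The sketch below screens a small correlation matrix for multicollinearity; the matrix is hypothetical (three variables q1 to q3) and the object names are illustrative. KMO and Bartlett's test are not computed here.

```r
# Hypothetical correlation matrix for three questionnaire items.
rMatrix <- matrix(c(1.00, 0.50, 0.40,
                    0.50, 1.00, 0.35,
                    0.40, 0.35, 1.00),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("q1", "q2", "q3"), c("q1", "q2", "q3")))

# Determinant: should be greater than 0.00001.
detR <- det(rMatrix)
detR
detR > 0.00001

# Screen the correlations above the diagonal (each pair counted once).
offDiagonal <- rMatrix[upper.tri(rMatrix)]
any(abs(offDiagonal) > 0.80)    # TRUE would signal multicollinearity
all(abs(offDiagonal) > 0.30)    # TRUE means every pair correlates reasonably well
```

For a real data set, rMatrix would be replaced by the correlation matrix of the questionnaire items.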
Finding Factors: Communality
Common variance:
Variance that a variable shares with other variables
Unique variance:
Variance that is unique to a particular variable
The proportion of common variance in a variable is called the
communality
communality = 1, all variance shared
communality = 0, no variance shared
0 < communality < 1 = some variance shared
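As an illustration, communalities can be computed from the factor pattern matrix A shown earlier in these slides. The sketch assumes orthogonal (uncorrelated) factors, in which case the communality of a variable is the sum of its squared loadings. The row labels follow the order of the variables in the Sociability and Consideration equations and are an assumption, as are the factor names.

```r
# Factor pattern matrix A from the slides (row and column labels assumed).
loadings <- matrix(c( 0.87,  0.01,
                      0.96, -0.03,
                      0.92,  0.04,
                      0.00,  0.82,
                     -0.10,  0.75,
                      0.09,  0.70),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(c("Talk1", "SocialSkills", "Interest",
                                     "Talk2", "Selfish", "Liar"),
                                   c("Factor1", "Factor2")))

# With orthogonal factors: communality = sum of squared loadings per variable.
communality <- rowSums(loadings^2)

# The remaining proportion of variance is unique to the variable.
uniqueVariance <- 1 - communality

round(cbind(communality, uniqueVariance), 3)
```

Every value lies between 0 and 1, so each variable shares some, but not all, of its variance with the two factors.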
Finding Factors
We find factors by calculating the amount of common variance

Circularity
Principal components analysis:
Assume all variance is shared
All communalities = 1
Factor analysis
Estimate communality
Use squared multiple correlation (SMC)
Initial Preparation and Analysis
We want to include all of the variables in our data set in our factor
analysis.
We can calculate the correlation matrix:
    raqMatrix <- cor(raqData)
    round(raqMatrix, 2)
The R-matrix (or correlation matrix)
Factors Extraction
Kaiser’s criterion
Kaiser (1960): retain factors with eigenvalues > 1
Scree plot
Cattell (1966): use “point of inflexion” of the scree plot
Which rule?
Use Kaiser’s criterion when
less than 30 variables, communalities after extraction > .7
sample size > 250 and mean communality ≥ .6
Scree plot is good if sample size is > 200.
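Kaiser's criterion can be illustrated with base R by taking the eigenvalues of a correlation matrix. The matrix below is hypothetical (three variables) and the object names are illustrative.

```r
# Hypothetical correlation matrix for three variables.
rMatrix <- matrix(c(1.0, 0.5, 0.4,
                    0.5, 1.0, 0.3,
                    0.4, 0.3, 1.0),
                  nrow = 3, byrow = TRUE)

# Eigenvalues of the correlation matrix, largest first.
eigenvalues <- eigen(rMatrix)$values
eigenvalues

# They sum to the number of variables (the total standardized variance).
sum(eigenvalues)

# Kaiser's criterion: retain components with eigenvalues greater than 1.
nRetained <- sum(eigenvalues > 1)
nRetained
```

Plotting the eigenvalues against their position, for example with plot(eigenvalues, type = "b"), gives the scree plot used for Cattell's point-of-inflexion rule.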
Factor extraction using R
By extracting as many factors as there are variables we can inspect
their eigenvalues and make decisions about which factors to extract.
To create this model we execute one of these commands:
    # from the raw data
    pc1 <- principal(raqData, nfactors = 23, rotate = "none")
    pc1
    # or from the correlation matrix
    pc1 <- principal(raqMatrix, nfactors = 23, rotate = "none")
    pc1
Principal Components Model
Examples of scree plots for data that probably have two underlying
factors
The Scree Plot for the RAQ Data
Scree plot from principal components analysis of RAQ data

The second plot shows the point of inflexion at the fourth component.
Now that we know how many components we want to extract, we can
rerun the analysis, specifying that number:
    # from the raw data
    pc2 <- principal(raqData, nfactors = 4, rotate = "none")
    # or from the correlation matrix
    pc2 <- principal(raqMatrix, nfactors = 4, rotate = "none")
Principal Components Model: Output
Residuals
Check the residuals and make sure that fewer than 50% have absolute
values greater than 0.05, and that the model fit is greater than 0.90.
Execute the function below:
    residual.stats <- function(matrix) {
      # Keep the residuals above the diagonal (each pair counted once)
      residuals <- as.matrix(matrix[upper.tri(matrix)])
      # Flag residuals with an absolute value above 0.05
      large.resid <- abs(residuals) > 0.05
      numberLargeResids <- sum(large.resid)
      propLargeResid <- numberLargeResids / nrow(residuals)
      # Root mean squared residual
      rmsr <- sqrt(mean(residuals^2))

      cat("Root mean squared residual = ", rmsr, "\n")
      cat("Number of absolute residuals > 0.05 = ", numberLargeResids, "\n")
      cat("Proportion of absolute residuals > 0.05 = ", propLargeResid, "\n")
      hist(residuals)
    }
Residuals
Having executed the function, we could use it on our residual matrix:
    resids <- factor.residuals(raqMatrix, pc2$loadings)
    residual.stats(resids)
Or:
    residual.stats(factor.residuals(raqMatrix, pc2$loadings))
Residuals
Rotation
To aid interpretation it is possible to maximize the loading of a
variable on one factor while minimizing its loading on all other factors
This is known as factor rotation.
There are two types:
orthogonal (factors are uncorrelated)
oblique (factors intercorrelate)
Rotation
Orthogonal Rotation (varimax)
To carry out a varimax rotation, we change the rotate option in the
principal() function from “none” to “varimax” (we could also exclude
it altogether because varimax is the default if the option is not
specified):
    # from the raw data
    pc3 <- principal(raqData, nfactors = 4, rotate = "varimax")
    # or from the correlation matrix
    pc3 <- principal(raqMatrix, nfactors = 4, rotate = "varimax")
Interpreting the factor loading matrix is a little complex; we can make
it easier by using the print.psych() function.
Generally you should be very careful with the cut-off value – if you
think that a loading of .4 will be interesting, you should use a lower
cut-off (say, .3), because you don’t want to miss a loading that was
.39:
    print.psych(pc3, cut = 0.3, sort = TRUE)
Oblique Rotation (oblimin)
The command for an oblique rotation is very similar to that for an
orthogonal rotation – we just change the rotate option from
“varimax” to “oblimin”.
    # from the raw data
    pc4 <- principal(raqData, nfactors = 4, rotate = "oblimin")
    # or from the correlation matrix
    pc4 <- principal(raqMatrix, nfactors = 4, rotate = "oblimin")
As with the previous model, we can look at the factor loadings from
this model in a nice easy-to-digest format by executing:
    print.psych(pc4, cut = 0.3, sort = TRUE)
Oblique Rotation (oblimin)
Important!
We assume that algebraic factors represent psychological constructs.
The nature of these psychological dimensions is “guessed at” by
looking at the loadings for a factor.
This assumption is controvertible.
Many argue that factors are statistical truths only – and psychological
fictions
Reliability
Test–retest method
What about practice effects/mood states?
Alternate form method
Expensive and impractical
Split-half method
Splits the questionnaire into two random halves, calculates scores and
correlates them
Cronbach’s alpha
Splits the questionnaire into all possible halves, calculates the scores,
correlates them and averages the correlation for all splits (well, sort of...)
Ranges from 0 (no reliability) to 1 (complete reliability)
Cronbach’s Alpha
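\[
\text{variance–covariance matrix} =
\begin{pmatrix}
\text{var}_1 & \text{cov}_{12} & \text{cov}_{13} \\
\text{cov}_{12} & \text{var}_2 & \text{cov}_{23} \\
\text{cov}_{13} & \text{cov}_{23} & \text{var}_3
\end{pmatrix}
\]
\[ \alpha = \frac{N^2\,\overline{\text{cov}}}{\sum_{\text{item}} s^2_{\text{item}} + \sum_{\text{item}} \text{cov}_{\text{item}}} \]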
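The formula can be applied directly to a variance–covariance matrix. The sketch below uses hypothetical answers from six respondents to three items and compares the result with the other common way of writing alpha (based on the variance of the total score). Here N is the number of items, and the object names are illustrative.

```r
# Hypothetical responses: six respondents, three items.
items <- data.frame(q1 = c(2, 4, 3, 5, 1, 4),
                    q2 = c(3, 5, 3, 4, 2, 5),
                    q3 = c(2, 5, 4, 4, 1, 3))

covMatrix <- cov(items)            # variance-covariance matrix
nItems <- ncol(items)              # N in the formula

itemVariances <- diag(covMatrix)                                # the variances
itemCovariances <- covMatrix[row(covMatrix) != col(covMatrix)]  # all off-diagonal covariances

# Alpha as in the formula above: N^2 times the average covariance,
# divided by the sum of all variances and covariances.
alphaFromCov <- nItems^2 * mean(itemCovariances) /
  (sum(itemVariances) + sum(itemCovariances))

# Equivalent form based on the variance of the total score.
totalScore <- rowSums(items)
alphaFromTotal <- nItems / (nItems - 1) *
  (1 - sum(itemVariances) / var(totalScore))

c(alphaFromCov = alphaFromCov, alphaFromTotal = alphaFromTotal)
```

Both expressions give the same value because the variance of the total score equals the sum of every element of the variance–covariance matrix.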
Interpreting Cronbach’s Alpha
Kline (1999)
Reliable if α > .7
Depends on the number of items
More questions = bigger α
Treat subscales separately
Remember to reverse score reverse phrased items!
If not, α is reduced and can even be negative
Reliability Analysis Using R
Subscale 1 (Fear of computers): items 6, 7, 10, 13, 14, 15, 18
Subscale 2 (Fear of statistics): items 1, 3, 4, 5, 12, 16, 20, 21
Subscale 3 (Fear of mathematics): items 8, 11, 17
Subscale 4 (Peer evaluation): items 2, 9, 19, 22, 23
First, we’ll create four new data sets, containing the items for each
subscale:
    computerFear <- raqData[, c(6, 7, 10, 13, 14, 15, 18)]
    statisticsFear <- raqData[, c(1, 3, 4, 5, 12, 16, 20, 21)]
    mathFear <- raqData[, c(8, 11, 17)]
    peerEvaluation <- raqData[, c(2, 9, 19, 22, 23)]
Reliability Analysis Using R
To use the alpha() function we simply input the name of the
dataframe for each subscale, and, where necessary, include the keys
option:
    alpha(computerFear)
    alpha(statisticsFear, keys = c(1, -1, 1, 1, 1, 1, 1, 1))
    alpha(mathFear)
    alpha(peerEvaluation)
Reliability Analysis Using R: Output
The End?
Describe factor structure/reliability

What items should be retained?
What items did you eliminate and why?
Application
Where will your questionnaire be used?
How does it fit in with psychological theory?
