
Quantitative Methods for Management

Dr. Maurizio Romano


romano.maurizio@unica.it

Department of Business and Economics


University of Cagliari, Italy
A.A. 2021/2022

Dr. M. Romano (University of Cagliari) Quantitative Methods for Management A.A. 2021/2022 1 / 394

Introduction

Mentimeter
Go to
www.menti.com
and use the code
7803 8552

Introduction: The book


Introduction: F.A.Q.

Recordings?
Yes, they will be available on Microsoft Teams. Please download them all
before they expire

Slides?
Yes, they will be available on Microsoft Teams (after the Lecture)

Homework?
Yes, up to 3 individual homework assignments, but they will not be mandatory

Introduction: F.A.Q.

Will we strictly follow the book?

The entire course is based on the book. However, there will be some
additions, especially regarding the laboratory part.

Which chapters of the book will be covered in this course?

1, 2, (3 and 4 from another source), 5, 6, 7, 8, 9, 10, 15, 17

What kind of additions are you talking about?


Laboratory part (R practical analysis examples)
Cross Validation
Dummy Variables
ROC Curve
Cluster Analysis


Introduction: The exam

Written exam
Both theoretical questions and practical exercises (R). However, you will
not be asked to write R code
As the exam day approaches, an old exam paper will be provided as
study material. Furthermore, a mock exam will be scheduled in the last days
of lectures
50% of the final score
The minimum score to access the oral exam is 16/30

Oral exam
Mainly theoretical questions, with some clarifications on the written
exam. It might be required to solve an exercise.
50% of the final score

Introduction: The exam

Final Score
The arithmetic mean of the written exam and oral exam
scores
The minimum final score to pass the exam is 18/30

Disclaimer
Both the written and the oral exams will be held in person
There might be exceptions according to the University policies (for
instance, if you have Covid)
However, depending on how the pandemic evolves, there might be unexpected
changes (e.g. online-only exams)


Chapter 1

Why do we need statistics?

Types of Data Analysis


Quantitative Methods: Testing theories using numbers
Qualitative Methods: Testing theories using language
Magazine articles/interviews
Conversations
Newspapers
Media broadcasts


The Research Process

Initial Observation

Find something that needs explaining


Observe the real world
Read other research
Test the concept: collect data
Collect data to see whether your hunch is correct
To do this you need to define variables: Anything that can be
measured and can differ across entities or time.


The Research Process

Generating and Testing Theories

Theory
A hypothesized general principle or set of principles that explains
known findings about a topic and from which new hypotheses can be
generated
Hypothesis
A prediction from a theory
E.g. the number of people turning up for a Big Brother audition that
have narcissistic personality disorder will be higher than the general
level (1%) in the population
Falsification
The act of disproving a theory or hypothesis


Generating and Testing Theories

Table 1.1
A table of the number of people at the Big Brother audition split by
whether they had narcissistic personality disorder and whether they were
selected as contestants by the producers

            No Disorder   Disorder   Total
Selected              3          9      12
Rejected           6805        845    7650
Total              6808        854    7662
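As a rough check of the hypothesis against Table 1.1, the share of auditionees with the disorder can be compared with the 1% population rate. The sketch below uses base R only; the object names `audition` and `rate_disorder` are illustrative and do not come from the course material.

```r
# Counts from Table 1.1 (rows: outcome, columns: diagnosis)
audition <- matrix(c(3, 6805, 9, 845), nrow = 2,
                   dimnames = list(c("Selected", "Rejected"),
                                   c("No Disorder", "Disorder")))

# Proportion of all auditionees who have the disorder
rate_disorder <- sum(audition[, "Disorder"]) / sum(audition)
round(100 * rate_disorder, 1)   # about 11.1%, to compare with the 1% base rate
```

A rate this far above 1% is in line with the hypothesis, although a formal test would still be needed before drawing conclusions.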

The Research Process


Data Collection 1: What to measure?

Hypothesis
Chocolate kills dogs
Independent Variable
The proposed cause
A predictor variable
A manipulated variable (in experiments)
Chocolate in the above hypothesis
Dependent Variable
The proposed effect
An outcome variable
Measured not manipulated (in experiments)
Dogs in the above hypothesis

Some important terms

Independent variable
A variable thought to be the cause of some effect. This term is usually used in
experimental research to denote a variable that the experimenter has manipulated

Dependent variable
A variable thought to be affected by changes in an independent variable. You can
think of this variable as an outcome

Predictor variable
A variable thought to predict an outcome variable. This is basically another term
for independent variable

Outcome variable
A variable thought to change as a function of changes in a predictor variable. This
term could be synonymous with “dependent variable” for the sake of an easy life.


Levels of Measurement

Categorical (entities are divided into distinct categories):


Binary variable: There are only two categories
e.g. dead or alive.
Nominal variable: There are more than two categories
e.g. whether someone is an omnivore, vegetarian, vegan or fruitarian.
Ordinal variable: The same as a nominal variable but the categories
have a logical order
e.g. whether people got a fail, a pass, a merit or a distinction in their
exam
Continuous (entities get a distinct score):
Interval variable: Equal intervals on the variable represent equal
differences in the property being measured
e.g. the difference between 6 and 8 is equivalent to the difference
between 13 and 15.
Ratio variable: The same as an interval variable, but the ratios of
scores on the scale must also make sense
e.g. a score of 16 on an anxiety scale means that the person is, in
reality, twice as anxious as someone scoring 8.

Measurement Error

Measurement error
The discrepancy between the actual value we are trying to measure,
and the number we use to represent that value
Example:
You (in reality) weigh 80 kg
You stand on your bathroom scales and they say 83 kg
The measurement error is 3 kg


Validity

Whether an instrument measures what it set out to measure.


Content validity
Evidence that the content of a test corresponds to the content of the
construct it was designed to cover
Ecological validity
Evidence that the results of a study, experiment or test can be applied,
and allow inferences, to real-world conditions

Reliability

Reliability
The ability of the measure to produce the same results under the same
conditions
Test-Retest Reliability
The ability of a measure to produce consistent results when the same
entities are tested at two different points in time


Data Collection 2: How to measure

Correlational research:
Observing what naturally goes on in the world without directly
interfering with it
Cross-sectional research:
This term implies that data come from people at different age points,
with different people representing each age point
Experimental research:
One or more variables are systematically manipulated to see their effect
(alone or in combination) on an outcome variable
Statements can be made about cause and effect

Experimental Research Methods

Cause and Effect (Hume, 1748)


1 Cause and effect must occur close together in time (contiguity)
2 The cause must occur before an effect does
3 The effect should never occur without the presence of the cause
Confounding variables: the “Tertium Quid”
A variable (that we may or may not have measured) other than the
predictor variables that potentially affects an outcome variable
e.g. the relationship between breast implants and suicide is confounded
by self-esteem
Ruling out confounds (Mill, 1865):
An effect should be present when the cause is present, and when
the cause is absent the effect should be absent too
Control conditions: the cause is absent


Methods of Data Collection

Between-group/between-subject/independent
Different entities in experimental conditions
Repeated-measures (within-subject)
The same entities take part in all experimental conditions
Economical
Practice effects
Fatigue

Types of Variation

Systematic Variation
Differences in performance created by a specific experimental
manipulation
Unsystematic Variation
Differences in performance created by unknown factors
e.g. age, gender, IQ, time of day, measurement error, etc.
Randomization
Minimizes unsystematic variation


The Research Process

Analysing Data: Histograms

Frequency Distributions (aka Histograms)


A graph plotting values of observations on the horizontal axis, with a
bar showing how many times each value occurred in the data set
The “Normal” Distribution
Bell-shaped
Symmetrical around the centre


The Normal Distribution

Properties of Frequency Distributions

Skew
The symmetry of the distribution
Positive skew (scores bunched at low values with the tail pointing to
high values)
Negative skew (scores bunched at high values with the tail pointing to
low values)
Kurtosis
The “heaviness” of the tails
Leptokurtic = heavy tails
Platykurtic = light tails


Skew

Kurtosis


Central tendency: The Mode

Mode: the most frequent score


Bimodal: having two modes
Multimodal: having several modes

A Bimodal Distribution


Central tendency: The Median

Median: the middle score when scores are ordered


Example: number of friends of 11 Facebook.com users

Central tendency: The Mean

Mean
The sum of scores divided by the number of scores
Number of friends of 11 Facebook.com users

Example

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

$$\sum_{i=1}^{n} x_i = 22+40+53+57+93+98+103+108+116+121+252 = 1063$$

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1063}{11} = 96.64$$
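The calculation above can be checked in base R, together with the median of the same 11 scores. This is only a small sketch; the vector name `friends` is an illustrative choice.

```r
# Number of friends of 11 Facebook users (data from the slides)
friends <- c(22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252)

sum(friends)      # 1063
mean(friends)     # 1063 / 11, about 96.64
median(friends)   # middle (6th) of the 11 ordered scores: 98
```

The mean (96.64) and the median (98) are close here, although the extreme score of 252 pulls the mean upwards more than the median.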


The Dispersion: Range

The Range: the smallest score subtracted from the largest


Example:
Number of friends of 11 Facebook.com users
22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252
Range = 252 - 22 = 230
Strongly affected by outliers

The Dispersion: The interquartile range

Quartiles
The three values that split the sorted data into four equal parts
Second quartile = median
Lower quartile = median of lower half of the data
Upper quartile = median of upper half of the data
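A minimal base R sketch of the range and the quartiles for the Facebook data, following the definitions on the slides (quartiles as medians of the lower and upper halves, with the overall median left out). The variable names are illustrative.

```r
friends <- c(22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252)

# Range: largest score minus smallest score
friends_range <- max(friends) - min(friends)            # 230

# Quartiles as medians of the halves below and above the overall median
lower_q <- median(friends[friends < median(friends)])   # 53
upper_q <- median(friends[friends > median(friends)])   # 116
iqr     <- upper_q - lower_q                            # interquartile range: 63
```

Note that R's built-in `quantile()` and `IQR()` interpolate between scores by default, so they can return slightly different values from this hand calculation.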


Going beyond the data: z-scores

z-scores
Standardising a score with respect to the other scores in the group
Expresses a score in terms of how many standard deviations it is away
from the mean
The distribution of z-scores has a mean of 0 and SD = 1
$$z = \frac{X - \bar{X}}{s}$$
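As an illustration, the Facebook scores can be converted to z-scores in base R. This is a sketch with illustrative variable names; it relies on `sd()`, which uses the N − 1 denominator.

```r
friends <- c(22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252)

# z = (X - mean) / s for every score
z <- (friends - mean(friends)) / sd(friends)

round(z, 2)
mean(z)   # 0, up to floating-point rounding
sd(z)     # 1
```

The built-in `scale(friends)` performs the same standardisation and returns the result as a one-column matrix.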

Properties of z-scores

1.96 cuts off the top 2.5% of the distribution


-1.96 cuts off the bottom 2.5% of the distribution
As such, 95% of z-scores lie between -1.96 and 1.96
99% of z-scores lie between -2.58 and 2.58
99.9% of them lie between -3.29 and 3.29
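These cut-offs can be reproduced from the standard normal distribution with base R's `qnorm()` and `pnorm()`. A short sketch:

```r
# Upper cut-offs of the standard normal distribution
qnorm(0.975)    # leaves 2.5% in the upper tail: about 1.96
qnorm(0.995)    # leaves 0.5% in the upper tail: about 2.58
qnorm(0.9995)   # leaves 0.05% in the upper tail: about 3.29

# Share of z-scores between -1.96 and 1.96
central_95 <- pnorm(1.96) - pnorm(-1.96)   # about 0.95
```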


Types of Hypotheses

Null hypothesis, H0
There is no effect
e.g. Big Brother contestants and members of the public will not
differ in their scores on personality disorder questionnaires

Alternative hypothesis, H1
Aka the experimental hypothesis
e.g. Big Brother contestants will score higher on personality disorder
questionnaires than members of the public

Chapter 2


Aims and objectives

Know what a statistical model is and why we use them


The mean
Know what the “fit” of a model is and why it is important
The standard deviation
Distinguish models for samples and populations

The Research Process


Fitting models to real-world data

Populations and Samples

Population:
The collection of units (be they people, plankton, plants, cities, suicidal
authors, etc.) to which we want to generalize a set of findings or a
statistical model
Sample:
A smaller (but hopefully representative) collection of units from a
population used to determine truths about that population


The only equation you will ever need

$$\text{outcome}_i = (\text{model}) + \text{error}_i$$

A simple statistical model

In statistics we fit models to our data (i.e. we use a statistical model
to represent what is happening in the real world)
The mean is a hypothetical value (i.e. it doesn’t have to be a value
that actually exists in the data set)
As such, the mean is a simple statistical model


The Mean

The mean is the sum of all scores divided by the number of scores
The mean is also the value from which the (squared) scores deviate
least (it has the least error)

Example

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The Mean: Example

Collect some data:

1, 3, 4, 3, 2

Add them up:

$$\sum_{i=1}^{n} x_i = 1 + 3 + 4 + 3 + 2 = 13$$

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{13}{5} = 2.6$$


The mean as a model

$$\text{outcome}_i = (\text{model}) + \text{error}_i$$

$$\text{outcome}_{\text{lecturer 1}} = \bar{X} + \text{error}_{\text{lecturer 1}}$$

$$1 = 2.6 + \text{error}_{\text{lecturer 1}}$$

Measuring the “Fit” of the model

The mean is a model of what happens in the real world: the typical
score
It is not a perfect representation of the data
How can we assess how well the mean represents reality?


A perfect fit

Calculating the “Error”

A deviation is the difference between the mean and an actual data
point
Deviations can be calculated by taking each score and subtracting the
mean from it:

$$\text{deviation} = x_i - \bar{x}$$


Calculating the “Error”

Use the Total Error?

We could just take the error between the mean and the data and add
them

Score   Mean   Deviation
1       2.6    -1.6
2       2.6    -0.6
3       2.6     0.4
3       2.6     0.4
4       2.6     1.4
               Total = 0

$$\sum (X - \bar{X}) = 0$$


Sum of Squared Errors

We could add the deviations to find out the total error


Deviations cancel out because some are positive and others negative
Therefore, we square each deviation
If we add these squared deviations we get the sum of squared errors
(SS)

Sum of Squared Errors

Score   Mean   Deviation   Squared Deviation
1       2.6    -1.6        2.56
2       2.6    -0.6        0.36
3       2.6     0.4        0.16
3       2.6     0.4        0.16
4       2.6     1.4        1.96
                           Total = 5.20

$$SS = \sum (X - \bar{X})^2 = 5.20$$
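The deviations and the sum of squared errors for the five scores can be reproduced in base R. A minimal sketch, with illustrative variable names:

```r
scores <- c(1, 2, 3, 3, 4)

deviations <- scores - mean(scores)   # -1.6 -0.6  0.4  0.4  1.4
sum(deviations)                       # 0 (up to floating-point rounding)

ss <- sum(deviations^2)               # sum of squared errors: 5.2
```

The raw deviations always cancel out around the mean, which is why the squared deviations are used to measure total error.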


Variance

The sum of squares is a good measure of overall variability, but it is
dependent on the number of scores
We calculate the average variability by dividing by the number of
scores minus one (N − 1)
This value is called the variance ($s^2$)

$$s^2 = \frac{SS}{N-1} = \frac{\sum (x_i - \bar{x})^2}{N-1} = \frac{5.20}{4} = 1.3$$

Degrees of Freedom


Standard Deviation

The variance has one problem: it is measured in units squared
This isn’t a very meaningful metric so we take the square root value
This is the standard deviation (s)

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N-1}} = \sqrt{\frac{5.20}{4}} = 1.14$$
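A base R sketch of the variance and standard deviation for the same five scores. R's `var()` and `sd()` divide by N − 1, so the standard deviation is the square root of the variance 1.3; the last line shows, for comparison only, what dividing by N would give. Variable names are illustrative.

```r
scores <- c(1, 2, 3, 3, 4)
ss <- sum((scores - mean(scores))^2)   # 5.2
n  <- length(scores)

variance <- ss / (n - 1)               # 1.3, same as var(scores)
std_dev  <- sqrt(variance)             # about 1.14, same as sd(scores)

# For comparison only: dividing by N instead of N - 1
sd_divide_by_n <- sqrt(ss / n)         # about 1.02
```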

Important Things to Remember

The sum of squares, variance, and standard deviation represent
the same thing:
The “fit” of the mean to the data
The variability in the data
How well the mean represents the observed data
Error


Same Mean, Different SD

The SD and the Shape of a Distribution


Samples vs. Populations

Sample
Mean and SD describe only the sample from which they were calculated

Population
Mean and SD are intended to describe the entire population (very rare in
psychology)

Sample to Population
Mean and SD are obtained from a sample, but are used to estimate the
mean and SD of the population (very common in psychology)

Samples vs. Populations


Samples vs. Populations

Confidence Intervals

Domjan et al. (1998)


“Conditioned” toothpicks production from trees

True mean
15 million toothpicks

Sample mean
17 million toothpicks

Interval estimate
12 to 22 million (contains true value)
16 to 18 million (misses true value)
CIs constructed such that 95% contain the true value
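The idea that 95% of such intervals contain the true value can be illustrated with a small simulation in base R. This is only a sketch: the true mean of 15 comes from the slide, while the population SD, the sample size, the number of simulated samples and the normal population are assumptions made for illustration.

```r
set.seed(1)           # arbitrary seed, for reproducibility only
true_mean <- 15       # "true mean" from the slide (in millions)
pop_sd    <- 5        # assumed population SD (not given in the slides)
n         <- 25       # assumed sample size
n_samples <- 5000     # assumed number of simulated samples

contains_true <- replicate(n_samples, {
  x  <- rnorm(n, mean = true_mean, sd = pop_sd)
  # 95% confidence interval for the mean, based on the t distribution
  ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  ci[1] <= true_mean && true_mean <= ci[2]
})

coverage <- mean(contains_true)   # proportion of intervals containing 15
coverage
```

Under these assumptions the proportion should come out close to 0.95: most intervals contain the true mean, and roughly 1 in 20 misses it.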


Confidence Intervals

Confidence Intervals


Test Statistics

A statistic for which the frequency of particular values is known
Observed values can be used to test hypotheses

$$\text{test statistic} = \frac{\text{variance explained by the model}}{\text{variance not explained by the model}} = \frac{\text{effect}}{\text{error}}$$

One- and Two-Tailed Tests


Type I and Type II Errors

Type I error
occurs when we believe that there is a genuine effect in our
population when, in fact, there isn’t
The probability is the α-level (usually .05)

Type II error
occurs when we believe that there is no effect in the population when,
in reality, there is
The probability is the β-level (often .2)
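The meaning of the α-level can be illustrated by simulating many experiments in which the null hypothesis is true and counting how often a test is significant anyway. The sketch below uses base R's `t.test()`; the group size, the number of simulations and the normal populations are assumptions made for illustration.

```r
set.seed(2)          # arbitrary seed, for reproducibility only
alpha  <- 0.05
n_sims <- 5000       # assumed number of simulated experiments
n      <- 20         # assumed group size

# Both groups come from the same population, so the null hypothesis is true
p_values <- replicate(n_sims, {
  t.test(rnorm(n), rnorm(n))$p.value
})

type1_rate <- mean(p_values < alpha)   # share of false alarms, close to alpha
type1_rate
```

Each significant result in this simulation is a Type I error, so the observed rate should be close to .05.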

What does Statistical Significance tell us?

The importance of an effect?


No, significance depends on sample size

That the null hypothesis is false?


No, it is always false

That the null hypothesis is true?


No, it is never true


Effect Sizes

An effect size is a standardized measure of the size of an effect:


Standardized = comparable across studies
Not (as) reliant on the sample size
Allows people to objectively evaluate size of observed effect

Effect Size Measures

There are several effect size measures that can be used:


Cohen’s d
Pearson’s r
Glass’s ∆
Hedges’s g
Odds ratio/risk rates
Pearson’s r is a good intuitive measure
Oh, apart from when group sizes are different...


Effect Size Measures

r = .1, d = .2 (small effect)

the effect explains 1% of the total variance

r = .3, d = .5 (medium effect)

the effect explains 9% of the total variance

r = .5, d = .8 (large effect)

the effect explains 25% of the total variance

Beware of these “canned” effect sizes though


The size of effect should be placed within the research context
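The percentages of variance explained are the squared correlations. A short base R sketch using Cohen's conventional benchmarks for r (.1, .3 and .5); the vector names are illustrative.

```r
r <- c(small = 0.1, medium = 0.3, large = 0.5)

# Proportion of variance explained is the squared correlation
variance_explained <- 100 * r^2
variance_explained   # small 1, medium 9, large 25 (percent)
```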

Chapter 5


Aims

Assumptions of parametric tests based on the normal distribution


Understand the assumption of normality
Graphical displays
Skew
Kurtosis
Normality tests
Understand homogeneity of variance
Levene’s test
Know how to fix problems in the data
Log, square root and reciprocal transformations
Pitfalls and alternatives
Robust tests

Assumptions

Parametric tests based on the normal distribution assume:


Normally distributed
Sampling distribution
Residuals
Homogeneity of variance
Interval or ratio level data
Independent scores


Assessing Normality

We don’t have access to the sampling distribution so we usually test
the observed data

Central limit theorem
if N > 30, the sampling distribution is approximately normal anyway

Graphical displays
Q-Q plot (or P-P plot)
Histogram
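The central limit theorem rule of thumb can be illustrated by simulating the sampling distribution of the mean for a clearly non-normal population. This is a base R sketch; the exponential population and the number of simulated samples are assumptions made for illustration.

```r
set.seed(3)          # arbitrary seed, for reproducibility only
n      <- 30         # sample size from the rule of thumb
n_sims <- 5000       # assumed number of simulated samples

# A strongly positively skewed population: exponential with mean 1 and SD 1
sample_means <- replicate(n_sims, mean(rexp(n, rate = 1)))

mean(sample_means)   # close to the population mean, 1
sd(sample_means)     # close to 1 / sqrt(30), the standard error
```

Calling `hist(sample_means)` afterwards should show a roughly bell-shaped distribution even though the population itself is heavily skewed.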

Assessing Normality

Values of skew/kurtosis
0 in a normal distribution
Convert to z (by dividing value by SE)

Kolmogorov-Smirnov test
Tests if data differ from a normal distribution
Significant = non-normal data
NON-significant = normal data
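Converting skew to a z-score can be sketched in base R as follows. The scores are hypothetical, and both the skew definition and the standard-error formula are assumptions: they are one common choice, and packages may use slightly different versions.

```r
# Hypothetical scores with a long right tail (not course data)
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 12)
n <- length(x)

# One common definition of sample skew (third moment over SD cubed)
skew <- sum((x - mean(x))^3) / (n * sd(x)^3)

# Standard error of skew under normality (assumed formula, see text above)
se_skew <- sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

z_skew <- skew / se_skew   # compare with +/- 1.96 for significance at .05
z_skew
```

A z-score beyond ±1.96 suggests skew that differs significantly from 0 at the .05 level, although in large samples even trivial skew can reach significance.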


Normality Example

Imagine you are a biologist worried about the potential health effects of
music festivals
Download the data (Music Festival)
We measured the hygiene of 810 concert-goers over the three days of
the festival
Hygiene was measured using a standardized technique. Scores ranged
from 0 to 4:
0 = you smell pretty bad
4 = you definitely smell good

The Q-Q plot

To draw a Q-Q plot of the hygiene scores for day 1 of the music
festival
    library(ggplot2)  # assumes the festival data are already loaded as dlf
    # equivalent to the older qplot(sample = dlf$day1, stat = "qq")
    qqplot.day1 <- ggplot(dlf, aes(sample = day1)) + stat_qq()
    qqplot.day1

To draw a Q-Q plot of the hygiene scores for day 2 of the music
festival
    qqplot.day2 <- ggplot(dlf, aes(sample = day2)) + stat_qq()
    qqplot.day2


The Q-Q Plot

Assessing Skew and Kurtosis

Using by() with describe() (psych package)
    by(data = rexam$exam, INDICES = rexam$uni, FUN = describe)

Using by() with stat.desc() (pastecs package)
    by(data = rexam$exam, INDICES = rexam$uni, FUN = stat.desc)


Assessing Skew and Kurtosis

These commands have the same effect as those above
    by(rexam$exam, rexam$uni, describe)
    by(rexam$exam, rexam$uni, stat.desc)

If we want descriptive statistics for multiple variables, then we can use
cbind()
    by(cbind(exam = rexam$exam, numeracy = rexam$numeracy), rexam$uni, describe)

Assessing Skew and Kurtosis

We can also use describe() and stat.desc() with more than one variable at
the same time using cbind():
    describe(cbind(dlf$day1, dlf$day2, dlf$day3))

    stat.desc(cbind(dlf$day1, dlf$day2, dlf$day3),
              basic = FALSE, norm = TRUE)


Assessing Skew and Kurtosis

Another Example

Performance on statistics exam


Participants: N = 100 students
Measures:
Exam: first-year exam scores as a percentage
Computer: measure of computer literacy, %
Lecture: percentage of lectures attended
Numeracy: a measure of numerical ability out of 15
Uni: whether the student attended Sussex University or Duncetown
University


Assessing Normality

Shapiro-Wilk test for exam and numeracy for whole sample


    shapiro.test(rexam$exam)
    shapiro.test(rexam$numeracy)

Output:
        Shapiro-Wilk normality test
    data:  rexam$exam
    W = 0.9613, p-value = 0.004991

        Shapiro-Wilk normality test
    data:  rexam$numeracy
    W = 0.9244, p-value = 2.424e-05

Assessing Normality

Shapiro-Wilk test for exam and numeracy split by university


    by(rexam$exam, rexam$uni, shapiro.test)
    by(rexam$numeracy, rexam$uni, shapiro.test)


Assessing Normality

Output for exam:


    rexam$uni: Duncetown University
        Shapiro-Wilk normality test
    data:  dd[x, ]
    W = 0.9722, p-value = 0.2829
    -------------------------------------------------------------------
    rexam$uni: Sussex University
        Shapiro-Wilk normality test
    data:  dd[x, ]
    W = 0.9837, p-value = 0.7151

Assessing Normality

Output for numeracy:


    rexam$uni: Duncetown University
        Shapiro-Wilk normality test
    data:  dd[x, ]
    W = 0.9408, p-value = 0.01451
    -------------------------------------------------------------------
    rexam$uni: Sussex University
        Shapiro-Wilk normality test
    data:  dd[x, ]
    W = 0.9323, p-value = 0.006787


Q-Q Plots

Assessing Homogeneity of Variance

Graphs (see lectures on regression)


Levene’s test
Tests if variance in different groups is the same
Significant = variances not equal
Non-significant = variances are equal
Variance ratio
With 2 or more groups
VR = largest variance / smallest variance
if VR < 2, homogeneity can be assumed
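The variance ratio can be computed in base R with `tapply()`. The sketch below uses made-up scores for two hypothetical groups, not the course data.

```r
# Hypothetical scores for two groups (illustrative, not course data)
scores <- c(2, 4, 6, 8,   1, 5, 9, 13)
group  <- factor(rep(c("A", "B"), each = 4))

group_vars <- tapply(scores, group, var)   # variance within each group
vr <- max(group_vars) / min(group_vars)    # largest / smallest variance
vr                                         # 4, so above the threshold of 2
```

Here the ratio exceeds 2, so by the rule of thumb above homogeneity of variance could not simply be assumed for these hypothetical groups.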


Homogeneity of Variance

Assessing Homogeneity of Variance with R

Use the leveneTest() function from the car package:


    library(car)
    leveneTest(outcome, group, center = median)  # or center = mean

Levene’s test for the exam and numeracy scores:

    leveneTest(rexam$exam, rexam$uni)
    leveneTest(rexam$numeracy, rexam$uni)


Assessing Homogeneity of Variance with R

Output for Levene’s Test:


    > leveneTest(rexam$exam, rexam$uni)
    Levene's Test for Homogeneity of Variance (center = median)
          Df F value Pr(>F)
    group  1  2.0886 0.1516
          98
    > leveneTest(rexam$numeracy, rexam$uni)
    Levene's Test for Homogeneity of Variance (center = median)
          Df F value  Pr(>F)
    group  1   5.366 0.02262 *
          98

Fixing Data Problems

Log transformation, $\log(X_i)$
Reduces positive skew

Square root transformation, $\sqrt{X_i}$
Also reduces positive skew. Can also be useful for stabilizing variance.

Reciprocal transformation, $1/X_i$
Dividing 1 by each score also reduces the impact of large scores. This
transformation reverses the scores; you can avoid this by reversing the
scores before the transformation, $1/(X_{\text{Highest}} - X_i)$


Fixing Data Problems

Log transformation
    dlf$logday1 <- log(dlf$day1)
    # use this form instead if day1 contains zeros, since log(0) is -Inf
    dlf$logday1 <- log(dlf$day1 + 1)

Square root transformation
    dlf$sqrtday1 <- sqrt(dlf$day1)

Reciprocal transformation
    dlf$recday1 <- 1 / (dlf$day1 + 1)
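The effect of these transformations on positive skew can be illustrated in base R. The scores below are hypothetical, and the small `skew()` helper is an illustrative definition of sample skew, not a function from the course packages.

```r
# Hypothetical positively skewed scores, including a zero (not course data)
x <- c(0, 1, 1, 2, 2, 3, 4, 6, 10, 25)

# Simple sample skew, used only to compare the shapes
skew <- function(v) sum((v - mean(v))^3) / (length(v) * sd(v)^3)

skew(x)            # raw scores: clearly positive
skew(sqrt(x))      # smaller after the square root transformation
skew(log(x + 1))   # smaller again after log(x + 1)
```

The constant 1 is added before taking logs because `log(0)` is `-Inf`, which would make the transformed scores unusable.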

The Effect of Transformations
Distributions of the hygiene data on day1 and day2 after various
transformations


To Transform . . . or not

Transforming the data helps as often as it hinders the accuracy of F
(Games & Lucas, 1966).
Games (1984):
The central limit theorem: sampling distribution will be normal in
samples > 40 anyway.
Transforming the data changes the hypothesis being tested
E.g. when using a log transformation and comparing means, you
change from comparing arithmetic means to comparing geometric
means
In small samples it is tricky to determine normality one way or another.
The consequences for the statistical model of applying the “wrong”
transformation could be worse than the consequences of analysing the
untransformed scores.
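
The point about geometric means can be checked numerically. The sketch below uses simulated, positively skewed scores (an assumption, not the hygiene data) and shows that back-transforming the mean of the logged scores gives the geometric mean rather than the arithmetic mean.

```r
# Hypothetical positively skewed scores (simulated, not the hygiene data)
set.seed(42)
x <- rlnorm(30, meanlog = 0, sdlog = 1)

arithmeticMean <- mean(x)

# Back-transforming the mean of the logged scores gives the geometric mean
geometricMean <- exp(mean(log(x)))

# The same quantity computed directly as the n-th root of the product
geometricMeanDirect <- prod(x)^(1 / length(x))

c(arithmetic = arithmeticMean, geometric = geometricMean)
```

So a test on the means of log-transformed scores is a statement about geometric means on the original scale.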

Robust Methods: Examples


Chapter 6

Aims

Measuring relationships
Scatterplots
Covariance
Pearson’s correlation coefficient
Nonparametric measures
Spearman’s rho
Kendall’s tau
Interpreting correlations
Causality
Partial correlations


What is a Correlation?

It is a way of measuring the extent to which two variables are related


It measures the pattern of responses across variables

Very small relationship


Positive relationship

Negative relationship


Measuring relationships

We need to see whether, as one variable increases, the other increases,
decreases or stays the same
This can be done by calculating the covariance
We look at how much each score deviates from the mean
If both variables deviate from the mean by the same amount, they are
likely to be related

Measuring relationships


Measuring relationships

Revision of variance

The variance tells us by how much scores deviate from the mean for a
single variable
It is closely linked to the sum of squares
Covariance is similar: it tells us by how much scores on two variables differ
from their respective means


Variance

The variance tells us by how much scores deviate from the mean for a
single variable
It is closely linked to the sum of squares

$$
\begin{aligned}
\text{variance} &= \frac{\sum (x_i - \bar{x})^2}{N - 1} \\
&= \frac{\sum (x_i - \bar{x})(x_i - \bar{x})}{N - 1}
\end{aligned}
$$

Covariance

Calculate the error between the mean and each subject’s score for the
first variable (x)
Calculate the error between the mean and their score for the second
variable (y)
Multiply these error values
Add these values and you get the cross product deviations
The covariance is the average of the cross-product deviations:
$$\text{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$


Covariance

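$$
\begin{aligned}
\text{cov}(x, y) &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1} \\
&= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4} \\
&= \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} \\
&= \frac{17}{4} \\
&= 4.25
\end{aligned}
$$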
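
The same calculation can be reproduced in R. The raw scores below are an assumption: they are not given on the slide, and were chosen only because their deviations from the mean match the ones used in the worked example.

```r
# Assumed raw scores, chosen so that the deviations match the worked example
x <- c(5, 4, 4, 6, 8)      # mean 5.4
y <- c(8, 9, 10, 13, 15)   # mean 11

# Deviations of each score from its own mean
devX <- x - mean(x)
devY <- y - mean(y)

# Sum of cross-product deviations divided by N - 1
covByHand <- sum(devX * devY) / (length(x) - 1)
covByHand

# Built-in function for comparison
cov(x, y)
```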

Problems with covariance

It depends upon the units of measurement
E.g. the covariance of two variables measured in miles might be 4.25, but
if the same scores are converted to kilometres, the covariance is 11

One solution: standardize it!
Divide by the standard deviations of both variables

The standardized version of covariance is known as the correlation
coefficient
It is unaffected by the units of measurement


The Correlation Coefficient

$$
\begin{aligned}
r &= \frac{\text{cov}_{xy}}{s_x s_y} \\
&= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1) s_x s_y}
\end{aligned}
$$

The Correlation Coefficient

$$
\begin{aligned}
r &= \frac{\text{cov}_{xy}}{s_x s_y} \\
&= \frac{4.25}{1.67 \cdot 2.92} \\
&= .87
\end{aligned}
$$
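
A short R check of this standardization, again on assumed raw scores (not given on the slide) that reproduce the covariance of 4.25. It also illustrates that a change of units alters the covariance but not the correlation.

```r
# Assumed raw scores that reproduce the covariance of 4.25 used above
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

# Standardize the covariance by the two standard deviations
rByHand <- cov(x, y) / (sd(x) * sd(y))
round(rByHand, 2)

# Changing the units (here, multiplying x by a constant) changes the
# covariance but leaves the correlation untouched
cov(x * 1.609, y)
cor(x * 1.609, y)
```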


Correlation: Example

Anxiety and exam performance


Participants: 103 students
Measures:
Time spent revising (hours)
Exam performance (%)
Exam Anxiety (the EAQ, score out of 100)
Gender

General Procedure for Correlations using R

To compute basic correlation coefficients there are three main
functions that can be used:
cor()
cor.test()
rcorr() (from the Hmisc package)
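
A minimal sketch of the first two functions, using base R only. The data frame examSim is simulated as a stand-in for the exam anxiety data; its values and column names are assumptions made for illustration.

```r
# Simulated stand-in for the exam anxiety data (values are made up)
set.seed(123)
n <- 103
revise  <- rexp(n, rate = 1 / 20)               # hours spent revising
anxiety <- 90 - 0.8 * revise + rnorm(n, sd = 10)
exam    <- 60 - 0.3 * anxiety + 0.2 * revise + rnorm(n, sd = 15)
examSim <- data.frame(exam, anxiety, revise)

# cor() returns the matrix of correlation coefficients
corMatrix <- cor(examSim, use = "complete.obs", method = "pearson")
round(corMatrix, 3)

# cor.test() handles one pair at a time and adds a p-value and a
# confidence interval
examAnxiety <- cor.test(examSim$exam, examSim$anxiety, method = "pearson")
examAnxiety$estimate
examAnxiety$p.value
examAnxiety$conf.int
```

cor() is convenient for a whole matrix but gives no significance tests; cor.test() gives tests but only for one pair of variables per call.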


Pearson Correlation output

               Exam    Anxiety     Revise
Exam     1.00000000 -0.4409934 0.39672070
Anxiety -0.4409934   1.00000000 -0.7092493
Revise   0.39672070 -0.7092493  1.00000000

Reporting the Results

Exam performance was significantly correlated with exam anxiety (r =
-0.44), and time spent revising (r = 0.40)
The time spent revising was also correlated with exam anxiety (r =
-0.71)
All p < .001


Things to Know about the Correlation

It varies between -1 and +1
0 = no relationship

It is an effect size
±0.1 small effect
±0.3 medium effect
±0.5 large effect

Coefficient of determination, r²
By squaring the value of r you get the proportion of variance in one
variable shared by the other
E.g. r = -0.44 between exam performance and exam anxiety gives r² = 0.19,
i.e. about 19% shared variance

Correlation and Causality

The third-variable problem


In any correlation, causality between two variables cannot be assumed
because there may be other measured or unmeasured variables affecting
the results

Direction of causality
Correlation coefficients say nothing about which variable causes the other
to change


Non-parametric Correlation

Spearman’s rho
Pearson’s correlation on the ranked data

Kendall’s tau
Better than Spearman’s for small samples

Example: World’s Biggest Liar competition


68 contestants
Measures
Where they were placed in the competition (first, second, . . . )
Creativity questionnaire (maximum score 60)

Spearman’s Rho Output

Spearman’s rank correlation rho


data: liarData$Position and liarData$Creativity
S = 71948.4, p-value = 0.0008602
alternative hypothesis: true rho is less than 0
sample estimates:
rho
-0.3732184


Kendall’s Tau (Non-parametric)

The output is much the same as for the Spearman’s correlation

Kendall’s rank correlation tau


data: liarData$Position and liarData$Creativity
z = -3.2252, p-value = 0.0006294
alternative hypothesis: true tau is less than 0
sample estimates:
tau
-0.3002413
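
Output of this kind comes from cor.test() with a method argument. The sketch below is an assumption about how such a call could look: it runs on simulated data (68 made-up contestants), not on liarData, and uses a one-tailed alternative to mirror "less than 0" in the output above.

```r
# Simulated stand-in for the liar data: 68 contestants (values are made up)
set.seed(7)
creativity <- round(runif(68, min = 20, max = 60))
# Lower position numbers (1 = first place) loosely go with higher creativity
position   <- rank(-creativity + rnorm(68, sd = 15), ties.method = "first")

# One-tailed tests: alternative = "less" states that the coefficient is below 0
spearmanRes <- cor.test(position, creativity,
                        alternative = "less", method = "spearman", exact = FALSE)
kendallRes  <- cor.test(position, creativity,
                        alternative = "less", method = "kendall", exact = FALSE)

spearmanRes$estimate
kendallRes$estimate

# Spearman's rho is Pearson's r computed on the ranks
cor(rank(position), rank(creativity))
```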

Partial and Semi-partial Correlations

Partial correlation
Measures the relationship between two variables, controlling for the effect
that a third variable has on them both

Semi-partial correlation
Measures the relationship between two variables controlling for the effect
that a third variable has on only one of the others


Partial and Semi-partial Correlations

Partial and Semi-partial Correlations


Doing Partial Correlation using R

The general form of pcor() (from the ggm package) is:

pcor(c("var1", "var2", "control1", "control2", etc.), var(dataframe))

If the result is stored in an object (here called pc), we can see the
partial correlation and the value of R² in the console by executing:
pc
pc^2

Doing Partial Correlation using R

The general form of pcor.test() is:

pcor.test(pcor object, number of control variables, sample size)

Basically, you enter an object that you have created with pcor() (or
you can put the pcor() command directly into the function):
pcor.test(pc, 1, 103)
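
For a single control variable, the partial correlation can also be worked out by hand from the three pairwise correlations. The sketch below is a base-R illustration of the standard first-order formula (it does not use the ggm package), with the coefficients taken from the Pearson correlation output shown earlier.

```r
# Pairwise Pearson correlations as reported in the correlation output
rExamAnxiety   <- -0.4409934
rExamRevise    <-  0.3967207
rAnxietyRevise <- -0.7092493

# First-order partial correlation between exam and anxiety, controlling for
# revision time
partialR <- (rExamAnxiety - rExamRevise * rAnxietyRevise) /
  sqrt((1 - rExamRevise^2) * (1 - rAnxietyRevise^2))
partialR
partialR^2

# t statistic with N - 2 - (number of control variables) degrees of freedom
n  <- 103
df <- n - 2 - 1
tStat  <- partialR * sqrt(df / (1 - partialR^2))
pValue <- 2 * pt(-abs(tStat), df)
c(t = tStat, df = df, p = pValue)
```

The relationship between exam performance and anxiety shrinks once revision time is held constant, which is the point of the partial correlation.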


Chapter 7

Aims

Understand linear regression with one predictor


Understand how we assess the fit of a regression model
Total sum of squares
Model sum of squares
Residual sum of squares
F
R²
Know how to do regression using R
Interpret a regression model
Assessing the performance: MSE, Cross Validation


What is Regression?

A way of predicting the value of one variable from another


It is a hypothetical model of the relationship between two variables
The model used is a linear one
We describe the relationship using the equation of a straight line

Describing a Straight Line

$$Y_i = b_0 + b_1 X_i + \varepsilon_i$$
$b_0$
Intercept (value of Y when X = 0)
Point at which the regression line crosses the Y-axis (ordinate)
$b_1$
Regression coefficient for the predictor
Gradient (slope) of the regression line
Direction/strength of relationship


The Method of Least Squares

This graph shows a scatterplot of some data with a line representing the general trend.
The vertical lines (dotted) represent the differences (or residuals)
between the line and the actual data

How good is the model?

The regression line is only a model based on the data


This model might not reflect reality
We need some way of testing how well the model fits the observed data
How?


The Method of Least Squares

Diagram showing from where the regression sums of squares derive


Summary

SST
Total variability (variability between scores and the mean)

SSR
Residual/error variability
(variability between the regression model and the actual data)

SSM
Model variability
(difference in variability between the model and the mean)


Testing the Model: ANOVA

If the model results in better prediction than using the mean,
then we expect SSM to be much greater than SSR

Testing the Model: ANOVA

Mean squared error


Sums of squares are total values
They can be expressed as averages
These are called mean squares, MS

$$F = \frac{MS_M}{MS_R}$$


Testing the Model: R²

R²
The proportion of variance accounted for by the regression model
The Pearson Correlation Coefficient Squared

$$R^2 = \frac{SS_M}{SS_T}$$
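
The three sums of squares, the F-ratio and R² can be computed by hand and compared with what lm() reports. This is a sketch on simulated data (the variables x and y are made up); the degrees of freedom used for the mean squares, k for the model and n - k - 1 for the residuals, are the usual ones for a model with k predictors.

```r
# Simulated data standing in for one predictor and one outcome (made up)
set.seed(2022)
x <- runif(100, min = 0, max = 10)
y <- 3 + 2 * x + rnorm(100, sd = 4)
fit <- lm(y ~ x)

SST <- sum((y - mean(y))^2)            # total: scores vs. the mean
SSR <- sum(residuals(fit)^2)           # residual: scores vs. the model
SSM <- sum((fitted(fit) - mean(y))^2)  # model: model vs. the mean

# Mean squares: sums of squares divided by their degrees of freedom
k   <- 1                               # number of predictors
MSM <- SSM / k
MSR <- SSR / (length(y) - k - 1)

Fratio   <- MSM / MSR
Rsquared <- SSM / SST

c(F = Fratio, R2 = Rsquared)
```

With one predictor, R² computed this way equals the squared Pearson correlation between x and y.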

Regression: An Example

A record company boss was interested in predicting record sales from
advertising

Data
200 different album releases

Outcome variable
Sales (CDs and downloads) in the week after release

Predictor variable
The amount (in units of $1000) spent promoting the record before release


Regression in R

We run a regression analysis using the lm() function (lm stands for
“Linear Model”)
This function takes the general form:
newModel <- lm(outcome ~ predictor(s), data = dataFrame, na.action = an action)

albumSales.1 <- lm(album1$sales ~ album1$adverts)

Regression in R

We can tell R what dataframe to use (data = nameOfDataFrame):


albumSales.1 <- lm(sales ~ adverts, data = album1)


Output of a Simple Regression

We have created an object called albumSales.1 that contains the
results of our analysis. We can show the object by executing:
summary(albumSales.1)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.341e+02  7.537e+00  17.799   <2e-16 ***
adverts     9.612e-02  9.632e-03   9.979   <2e-16 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 65.99 on 198 degrees of freedom
Multiple R-squared: 0.3346, Adjusted R-squared: 0.3313
F-statistic: 99.59 on 1 and 198 DF, p-value: < 2.2e-16

Using the Model

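$$
\begin{aligned}
\text{Record sales}_i &= b_0 + b_1 \cdot \text{Advertising budget}_i \\
&= 134.14 + (0.09612 \cdot \text{Advertising budget}_i)
\end{aligned}
$$

$$
\begin{aligned}
\text{Record sales}_i &= 134.14 + (0.09612 \cdot \text{Advertising budget}_i) \\
&= 134.14 + (0.09612 \cdot 100) \\
&= 143.75
\end{aligned}
$$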


Multiple Regression

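$$\text{outcome}_i = (\text{model}) + \text{error}_i$$
$$Y_i = (b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_n X_{ni}) + \varepsilon_i$$

Example: airplay dataset
$$\text{Album sales}_i = (b_0 + b_1 \cdot \text{Advertising budget}_i + b_2 \cdot \text{Airplay}_i) + \varepsilon_i$$
OLS
SST, SSR and SSM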

Multiple regression fit


Complexity vs Precision: a trade-off

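n is the number of points in your data sample
k is the number of independent regressors, i.e. the number of
variables in your model, excluding the constant

Adjusted R² (Stein's formula)
$$\text{adjusted } R^2 = 1 - \left[\left(\frac{n - 1}{n - k - 1}\right)\left(\frac{n - 2}{n - k - 2}\right)\left(\frac{n + 1}{n}\right)\right](1 - R^2)$$

Parsimony-adjusted measures of fit
$$AIC = n \cdot \ln\left(\frac{SSE}{n}\right) + 2k$$
where SSE is the residual sum of squares (SSR in the notation used earlier)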
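
A small numerical sketch of these adjustments, using the values printed in the simple regression output (R² = 0.3346, n = 200, k = 1). Two things here are not on the slide and are assumptions of this example: the name "Wherry" for the adjustment that summary() reports for lm(), and the reconstruction of the residual sum of squares from the residual standard error.

```r
# Values read from the simple regression output: R^2 = 0.3346 with n = 200
# album releases and k = 1 predictor
R2 <- 0.3346
n  <- 200
k  <- 1

# Wherry's adjustment, the "Adjusted R-squared" that summary() prints for lm()
adjWherry <- 1 - (1 - R2) * (n - 1) / (n - k - 1)

# Stein's formula, as written on the slide
adjStein <- 1 - ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) *
  ((n + 1) / n) * (1 - R2)

round(c(Wherry = adjWherry, Stein = adjStein), 4)

# AIC as defined on the slide; the residual sum of squares is recovered from
# the residual standard error (65.99) and its degrees of freedom (198)
sse <- 65.99^2 * 198
aicSlide <- function(sse, n, k) n * log(sse / n) + 2 * k
aicSlide(sse, n, k)
```

Both adjustments shrink R², and Stein's formula shrinks it more because it targets how well the model would cross-validate.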

T-test on regression coefficients

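$$
\begin{aligned}
t &= \frac{b_{\text{observed}} - b_{\text{expected}}}{SE_b} \\
&= \frac{b_{\text{observed}}}{SE_b}
\end{aligned}
$$
(the second line holds because the expected value of b under the null
hypothesis is 0)

$t \sim T(N - p - 1)$, where p is the number of predictors
Confidence intervals for b
Hypothesis testing:
$H_0: b = 0$ vs. $H_1: b > 0$, $H_1: b < 0$ or $H_1: b \neq 0$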
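
As a check, the t value for the adverts coefficient can be recomputed from the estimate and standard error printed in the regression output. The sketch assumes the two-sided alternative, which is what summary() reports for lm().

```r
# Estimate and standard error of the adverts coefficient, taken from the
# regression output
b   <- 9.612e-02
seB <- 9.632e-03
N   <- 200   # album releases
p   <- 1     # predictors

tValue <- (b - 0) / seB          # expected value of b under H0 is 0
dfRes  <- N - p - 1

# Two-sided p-value (H1: b != 0)
pTwoSided <- 2 * pt(-abs(tValue), df = dfRes)

# 95% confidence interval for b
ci <- b + c(-1, 1) * qt(0.975, df = dfRes) * seB

round(tValue, 3)
pTwoSided
ci
```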


Methods of regression

Hierarchical
Known predictors (previous research) at first, new predictors afterwards
(forced entry, stepwise)

Forced entry
Model suggested from a theory

Stepwise
Forward (from null or intercept-only model)
Backward (from full or complete model)
Stepwise (like forward but admitting subsequent elimination)

Checking assumptions
Variable types: All Xs (predictors) must be quantitative or
categorical, and Y (response variable) must be quantitative,
continuous and unbounded
Non-zero variance: The predictors should have some variation in value
No perfect multicollinearity
Predictors are uncorrelated with ’external variables’: there should be
no external variables that correlate with any of the Xs included in the
regression model. If external variables do correlate with
some Xs, then the model becomes unreliable (because other variables
can predict Y just as well).
Homoscedasticity
Independent errors (No autocorrelation)
Normally distributed errors
Independence (among different values of Y)
Linearity

Checking assumptions

When the assumptions of regression are met, the model that we get
for a sample can be accurately applied to the population of interest
(the coefficients and parameters of the regression equation are said to
be unbiased)
What an unbiased model tells us is that, on average, the regression
model from the sample is the same as the population model

Multicollinearity

Strong correlation between two or more predictors (Xs)
Perfect collinearity exists when at least one X is a perfect linear
combination of the others
It becomes impossible to obtain unique estimates of the regression
coefficients because there are an infinite number of combinations of
coefficients that would work equally well
Perfect collinearity is rare in real-life data
Less than perfect collinearity is virtually unavoidable
Consequences of collinearity:
Untrustworthy estimated betas
Limited effect size (R²)
Difficulties in understanding importance of predictors
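
One common way to quantify collinearity is the variance inflation factor (VIF) and its reciprocal, the tolerance. The sketch below computes them in base R from their definition, VIF = 1 / (1 - R²) with R² obtained by regressing each predictor on the others. The predictors are simulated and x2 is deliberately built from x1, so all names and values are assumptions.

```r
# Simulated predictors (made up): x2 is built to be strongly related to x1
set.seed(10)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
predictors <- data.frame(x1, x2, x3)

# Correlations between predictors: a first, informal check
round(cor(predictors), 2)

# VIF for each predictor: regress it on the other predictors and use
# VIF = 1 / (1 - R^2); tolerance is its reciprocal
vif <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  r2 <- summary(lm(reformulate(others, response = v), data = predictors))$r.squared
  1 / (1 - r2)
})
tolerance <- 1 / vif

round(rbind(VIF = vif, tolerance = tolerance), 2)
```

The unrelated predictor x3 has a VIF close to 1, while x1 and x2 inflate each other's variance.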


Performance of the model

Mean Square Error:
$$MSE = \frac{\sum (y_i - \hat{y}_i)^2}{n}$$
Training set / Test set (the test set is often also called the “Validation”
set)
(K-fold, Leave-One-Out) Cross Validation

K-fold Cross Validation

Since data are often scarce, there might not be enough to set aside
for a validation sample
To work around this issue, k-fold CV works as follows:
1. Split the sample into k subsets of equal size
2. For each fold, estimate a model on all the subsets except one
3. Use the left-out subset to test the model, by calculating a CV metric of
choice
4. Average the CV metric across subsets to get the CV error
This has the advantage of using all data for estimating the model.
Commonly used K values are K = 5 and K = 10. K = n (each subset contains a
single observation) is a particular case called LOOCV (Leave-One-Out CV)
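
The four steps can be sketched in base R as follows. The data are simulated and the helper kFoldMSE() is a hypothetical name introduced for this illustration; MSE is used as the CV metric.

```r
# Simulated data (made up) with one predictor and one outcome
set.seed(99)
n <- 100
simData <- data.frame(x = runif(n, 0, 10))
simData$y <- 5 + 1.5 * simData$x + rnorm(n, sd = 3)

kFoldMSE <- function(data, k) {
  # Step 1: randomly assign each row to one of k folds of (nearly) equal size
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  foldMSE <- sapply(seq_len(k), function(i) {
    train <- data[fold != i, ]
    test  <- data[fold == i, ]
    fit   <- lm(y ~ x, data = train)                 # step 2: fit on k - 1 folds
    mean((test$y - predict(fit, newdata = test))^2)  # step 3: MSE on left-out fold
  })
  mean(foldMSE)                                      # step 4: average across folds
}

cv5   <- kFoldMSE(simData, k = 5)
cv10  <- kFoldMSE(simData, k = 10)
loocv <- kFoldMSE(simData, k = nrow(simData))        # K = n gives LOOCV
c(cv5 = cv5, cv10 = cv10, loocv = loocv)
```

The 5-fold and 10-fold results depend on the random split, whereas LOOCV does not, because every observation is left out exactly once on its own.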


Chapter 8

Aims

When and why do we use logistic regression?


Binary
Multinomial
Theory behind logistic regression
Assessing the model
Assessing predictors
Things that can go wrong
Interpreting logistic regression


When and Why

To predict an outcome variable that is categorical from one or more
categorical or continuous predictor variables
Used because having a categorical outcome variable violates the
assumption of linearity in normal regression

Logistic with one predictor

$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + \varepsilon)}}$$
Outcome
We predict the probability of the outcome occurring
$b_0$ and $b_1$
Can be thought of in much the same way as multiple regression
Note the normal regression equation forms part of the logistic regression
equation


Logistic with several predictors

$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n + \varepsilon)}}$$
Outcome
We still predict the probability of the outcome occurring
Differences
Note the multiple regression equation forms part of the logistic
regression equation
This part of the equation expands to accommodate additional
predictors
Logistic regression expresses the multiple linear regression equation in
logarithmic terms (called the logit) and thus overcomes the problem of
violating the assumption of linearity
The resulting value from the equation varies between 0 and 1. A value
close to 0 means that Y is very unlikely to have occurred, and a value
close to 1 means that Y is very likely to have occurred.
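
A minimal numerical sketch of the logistic equation with one predictor. The coefficients b0 and b1 below are hypothetical values picked only to show the shape of the curve; the error term is left out because the code computes predicted probabilities.

```r
# Hypothetical coefficients, chosen only to illustrate the shape of the curve
b0 <- -3
b1 <- 0.5
x  <- seq(0, 12, by = 2)

# Linear predictor (the "regression part" of the equation)
linearPart <- b0 + b1 * x

# Logistic transformation: maps any real number into the interval (0, 1)
pY <- 1 / (1 + exp(-linearPart))

# plogis() is the same function built into R
pBuiltIn <- plogis(linearPart)

# The logit (log-odds) of the probabilities recovers the linear predictor
logit <- log(pY / (1 - pY))

round(data.frame(x, linearPart, pY, logit), 3)
```

The relationship between x and P(Y) is S-shaped, but the relationship between x and the logit is a straight line, which is why the model is linear on the logit scale.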

Assessing the Model

$$\text{log-likelihood} = \sum_{i=1}^{N} \left[ Y_i \ln(P(Y_i)) + (1 - Y_i) \ln(1 - P(Y_i)) \right]$$

The log-likelihood statistic
Analogous to the residual sum of squares in multiple regression
It is an indicator of how much unexplained information there is after
the model has been fitted
Large values (in absolute terms, i.e. large values of −2LL) indicate poorly
fitting statistical models
Estimated parameters $b_0, b_1, \ldots, b_n$ are those maximizing the
log-likelihood
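
The log-likelihood formula can be evaluated directly and compared with the value R stores in a fitted model. The sketch uses a simulated binary outcome (made-up data), fitted with glm() and the binomial family.

```r
# Simulated binary outcome (made up) with one continuous predictor
set.seed(5)
n <- 150
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

fit  <- glm(y ~ x, family = binomial())
pHat <- fitted(fit)                     # predicted P(Y_i = 1)

# Log-likelihood from the formula on the slide
llByHand <- sum(y * log(pHat) + (1 - y) * log(1 - pHat))

# The same value extracted from the fitted model, and the deviance (-2LL)
llFromModel <- as.numeric(logLik(fit))
c(byHand = llByHand, fromModel = llFromModel, minus2LL = -2 * llByHand)
```

For a 0/1 outcome, −2LL is the "Residual deviance" that summary() prints for a glm() model.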


Assessing changes in models

It’s possible to calculate a log-likelihood for different models and to
compare these models by looking at the difference between their
log-likelihoods.

$$\chi^2 = 2[LL(\text{new}) - LL(\text{baseline})], \quad df = k_{\text{new}} - k_{\text{baseline}}$$

The statistic χ² has a chi-square distribution with df degrees of
freedom
The change in LL can be evaluated with a statistical hypothesis test:
$H_0: \chi^2 = 0$ vs. $H_1: \chi^2 > 0$

Assessing predictors: the Wald Statistic

$$\text{Wald} = \frac{b}{SE_b}$$
Similar to t-statistic in regression
The Wald statistic follows a normal distribution (a.k.a. z-statistic)
Tests the null hypothesis that b = 0
Is biased when b is large (the standard error becomes inflated, so the
statistic is underestimated)
Better to look at likelihood ratio statistics (change in LL)


Assessing predictors: the Odds Ratio

$$\text{Odds Ratio} = \frac{\text{odds after a unit change in the predictor}}{\text{odds before a unit change in the predictor}}$$
Helps in model interpretation
Odds Ratio = $e^{b}$, or exp(b)
Indicates the change in odds resulting from a unit change in the
predictor:
OR > 1: Predictor ↑, probability of outcome occurring ↑
OR < 1: Predictor ↑, probability of outcome occurring ↓

Assessing predictors: the Odds Ratio

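$$\text{Odds Ratio} = \frac{\text{odds after a unit change in the predictor}}{\text{odds before a unit change in the predictor}}$$

$$\text{odds} = \frac{P(\text{event})}{P(\text{no event})}$$
$$P(\text{event } Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}$$
$$P(\text{no event } Y) = 1 - P(\text{event } Y)$$
$$\Delta\text{odds} = \frac{\text{odds after a unit change in the predictor}}{\text{original odds}}$$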


Methods of Regression

Forced entry: all variables entered simultaneously


Hierarchical: variables entered in blocks
Blocks should be based on past research, or theory being tested. Good
method.
Stepwise: variables entered on the basis of statistical criteria (i.e.
relative contribution to predicting outcome)
Should be used only for exploratory analysis

Things that can go wrong

Assumptions from linear regression:


Linearity
there is a linear relationship between any continuous predictors and the
logit of the outcome variable
Independence of errors
cases of data should not be related
Multicollinearity
this assumption can be checked with tolerance and VIF statistics (or
other measures)

Unique problems
Incomplete information
Complete separation
Overdispersion


Incomplete information

Do you smoke?   Do you eat tomatoes?   Do you have cancer?
Yes             No                     Yes
Yes             Yes                    Yes
No              No                     Yes
No              Yes                    ???

All possibilities should be accounted for in the data


Causes estimation method to be slow and unstable
This point applies not only to categorical variables, but also to
continuous ones
As a general point, whenever samples are broken down into categories
and one or more combinations are empty it creates problems.
These will probably be signalled by coefficients that have
unreasonably large standard errors.

Incomplete information

Complete separation
the outcome variable can be perfectly predicted by one variable or a
combination of variables:
this problem often arises when too many variables are fitted to too
few cases
Often the only satisfactory solution is to collect more data
sometimes a neat answer is found by using a simpler model

Overdispersion

Overdispersion is where the variance is larger than expected from the
model
This can be caused by violating the assumption of independence
This problem makes the standard errors too small!

An example

predictors of a treatment intervention


participants: 113 adults with a medical problem
outcome: “cured (1)” or “not cured” (0)
predictors:
Intervention: intervention or no treatment
Duration: the number of days before treatment that the patient had
the problem


Logistic Regression Analysis using R

newModel <- glm(outcome ~ predictor(s), data = dataFrame,
                family = name of a distribution,
                na.action = an action)

Logistic Regression Analysis using R

eelModel.1 <- glm(Cured ~ Intervention, data = eelData,
                  family = binomial())
eelModel.2 <- glm(Cured ~ Intervention + Duration, data = eelData,
                  family = binomial())

summary(eelModel.1)
summary(eelModel.2)


Output Model1: Intervention only

Call:
glm(formula = Cured ~ Intervention, family = binomial(), data = eelData)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5940  -1.0579   0.8118   0.8118   1.3018

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
(Intercept)               -0.2877     0.2700  -1.065  0.28671
InterventionIntervention   1.2287     0.3998   3.074  0.00212 **

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 154.08 on 112 degrees of freedom
Residual deviance: 144.16 on 111 degrees of freedom
AIC: 148.16

Assessing the model: R

Partial correlation between Y and each X
Defined in [-1, +1]
R > 0 → increasing X causes an increase in P(Y=1)
R < 0 → increasing X causes a decrease in P(Y=1)
R ≈ 0 → X is not important within the model

$$R = \sqrt{\frac{z^2 - 2df}{-2LL(\text{baseline})}}$$


Assessing the model: R²

proportional reduction in the absolute value of LL


a measure of how much the badness of fit improves as a result of the
inclusion of the predictor variables
can vary between 0 (indicating that the predictors are useless at
predicting the outcome variable) and 1 (indicating that the model
predicts the outcome variable perfectly)

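Assessing the model: R²

Hosmer and Lemeshow’s
$$R_L^2 = \frac{-2LL(\text{baseline}) - (-2LL(\text{model}))}{-2LL(\text{baseline})}$$
Cox and Snell’s
$$R_{CS}^2 = 1 - \exp\left(\frac{-2LL(\text{model}) - (-2LL(\text{baseline}))}{n}\right)$$
Nagelkerke’s
$$R_N^2 = \frac{R_{CS}^2}{1 - \exp\left(-\frac{-2LL(\text{baseline})}{n}\right)}$$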

Assessing the model: information criteria

$$AIC = -2LL + 2k$$

$$BIC = -2LL + k \cdot \log(n)$$

We want a measure of fit that we can use to compare two models
and which penalizes a model that contains more predictor variables
You can think of this as the price you pay for something: you get a
better value of R², but you pay a higher price, and was that higher
price worth it? These information criteria help you to decide.
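
As a quick check, the AIC printed for the intervention-only model can be reproduced from its residual deviance. Two assumptions are made in this sketch: k counts every estimated parameter including the intercept (so k = 2 here), and BIC is computed with the usual k · log(n) penalty.

```r
# Values read from the output of the intervention-only model:
# residual deviance (-2LL) = 144.16, n = 113 cases,
# k = 2 estimated parameters (intercept and Intervention)
minus2LL <- 144.16
n <- 113
k <- 2

aic <- minus2LL + 2 * k
bic <- minus2LL + k * log(n)

c(AIC = aic, BIC = bic)
```

Smaller values indicate a better trade-off between fit and complexity; the numbers are only meaningful when compared across models fitted to the same data.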

Writing a function to compute R²

logisticPseudoR2s <- function(LogModel) {
  dev <- LogModel$deviance              # -2LL of the fitted model
  nullDev <- LogModel$null.deviance     # -2LL of the baseline model
  modelN <- length(LogModel$fitted.values)
  R.l <- 1 - dev / nullDev
  R.cs <- 1 - exp(-(nullDev - dev) / modelN)
  R.n <- R.cs / (1 - (exp(-(nullDev / modelN))))
  cat("Pseudo R^2 for logistic regression\n")
  cat("Hosmer and Lemeshow R^2 ", round(R.l, 3), "\n")
  cat("Cox and Snell R^2 ", round(R.cs, 3), "\n")
  cat("Nagelkerke R^2 ", round(R.n, 3), "\n")
}


Writing a function to compute R²

To use the function on our model, we simply place the name of the
logistic regression model (in this case eelModel.1) in the function and
execute:
logisticPseudoR2s(eelModel.1)

The output will be:

Pseudo R^2 for logistic regression
Hosmer and Lemeshow R^2 0.064
Cox and Snell R^2 0.084
Nagelkerke R^2 0.113

Calculating the Odds Ratio

We can also calculate the odds ratio as the exponential of the b
coefficient for the predictor variables by executing:
exp(eelModel.1$coefficients)

             (Intercept) InterventionIntervention
                0.750000                 3.416667

To get the confidence intervals execute:
exp(confint(eelModel.1))

                              2.5%    97.5%
(Intercept)              0.4374531 1.268674
InterventionIntervention 1.5820127 7.625545
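
To see where the odds ratio of about 3.42 comes from, the odds can be rebuilt step by step from the coefficients of the intervention-only model. The sketch assumes the predictor is coded 0 for no treatment and 1 for intervention; the coefficients are the rounded values printed in the model output.

```r
# Coefficients of the intervention-only model, read from its output
b0 <- -0.2877
b1 <-  1.2287

# Predicted probability of being cured without (X = 0) and with (X = 1)
# the intervention
pNoTreatment <- 1 / (1 + exp(-(b0 + b1 * 0)))
pTreatment   <- 1 / (1 + exp(-(b0 + b1 * 1)))

# Odds = P(event) / P(no event)
oddsBefore <- pNoTreatment / (1 - pNoTreatment)
oddsAfter  <- pTreatment / (1 - pTreatment)

# Odds ratio: odds after a unit change divided by the original odds
oddsRatio <- oddsAfter / oddsBefore
c(pNoTreatment = pNoTreatment, pTreatment = pTreatment, oddsRatio = oddsRatio)

# The same number is obtained directly as exp(b1)
exp(b1)
```

The odds of being cured are roughly 3.4 times higher with the intervention than without it.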


Output Model2: Intervention and Duration as Predictors

Call:
glm(formula = Cured ~ Intervention + Duration, family = binomial(), data = eelData)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6025  -1.0572   0.8107   0.8161   1.3095

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)              -0.234660   1.220563  -0.192  0.84754
InterventionIntervention  1.233532   0.414565   2.975  0.00293 **
Duration                 -0.007835   0.175913  -0.045  0.96447

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 154.08 on 112 degrees of freedom
Residual deviance: 144.16 on 110 degrees of freedom
AIC: 150.16

Comparing the models

We can use the anova() function


anova(eelModel.1, eelModel.2)

Analysis of Deviance Table

Model 1: Cured ~ Intervention
Model 2: Cured ~ Intervention + Duration
  Resid. Df Resid. Dev Df  Deviance
1       111     144.16
2       110     144.16  1 0.0019835
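The table reports the change in deviance but no p-value. A minimal sketch of how that change could be tested against a chi-square distribution, using only the numbers printed above (variable names are illustrative):

```r
deviance_change <- 0.0019835  # improvement in deviance from adding Duration
df_change       <- 1          # one extra parameter in Model 2

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(deviance_change, df = df_change, lower.tail = FALSE)
p_value
```

A p-value this close to 1 suggests that adding Duration does not improve the model.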


Casewise diagnostics

Summary

The overall fit of the final model is shown by the deviance statistic
and its associated chi-square statistic.
If the significance of the chi-square statistic is less than .05, then the
model is a significant fit to the data.
Check the table labelled coefficients to see which variables
significantly predict the outcome.
For each variable in the model, look at the z-statistic and its
significance (which again should be below .05).
Use the Odds Ratio for interpretation. You can obtain this using
exp(model$coefficients), where model is the name of your model.
If the value is greater than 1 then as the predictor increases, the odds
of the outcome occurring increase.
A value less than 1 indicates that as the predictor increases, the odds
of the outcome occurring decrease.
For the aforementioned interpretation to be reliable the confidence
interval of the Odds Ratio should not cross 1!
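The first two points can be sketched numerically. The example assumes the deviances reported for the eel data (null deviance 154.08 on 112 df; residual deviance 144.16 on 111 df for Cured ~ Intervention); with a fitted glm object the same quantities are stored in its null.deviance, deviance, df.null and df.residual components.

```r
null_dev  <- 154.08; df_null  <- 112
resid_dev <- 144.16; df_resid <- 111   # model: Cured ~ Intervention

model_chi <- null_dev - resid_dev      # model chi-square statistic
chi_df    <- df_null - df_resid        # its degrees of freedom
p_model   <- pchisq(model_chi, df = chi_df, lower.tail = FALSE)

round(c(chi_square = model_chi, df = chi_df, p = p_model), 4)
```

Because the p-value is below .05, the model with Intervention fits significantly better than the intercept-only model.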


Multinomial Logistic Regression

Logistic regression to predict membership of more than two categories


It (basically) works in the same way as binary logistic regression
The analysis breaks the outcome variable down into a series of
comparisons between two categories. E.g., if you have three outcome
categories (A, B and C), then the analysis will consist of two
comparisons that you choose:
compare everything against your first category
(e.g. A vs. B and A vs. C),
or your last category (e.g. A vs. C and B vs. C),
or a custom category (e.g. B vs. A and B vs. C).
The important parts of the analysis and output are much the same as
we have just seen for binary logistic regression

Chapter 9: Comparing Two Means


Aims

t-tests:
Independent
Dependent (aka paired, matched)
Rationale for the tests
Assumptions
t-tests as a GLM
Interpretation
Calculating an effect size
Reporting results
Robust methods

Experiments

The simplest form of experiment that can be done is one with only
one independent variable that is manipulated in only two ways and
only one outcome is measured.
More often than not, the manipulation of the independent variable
involves having an experimental condition and a control
E.g., Is the movie Scream 2 scarier than the original Scream? We could
measure heart rates (which indicate anxiety) during both films and
compare them
This situation can be analysed with a t-test


Experiments

Independent t-test
Compares two means based on independent data.
E.g., data from different groups of people.
Dependent t-test
Compares two means based on related data.
E.g., Data from the same people measured at different times.
Data from “matched” samples.
Significance testing
Testing the significance of Pearson’s correlation coefficient
Testing the significance of b in regression.

Rationale for the t-test

Two samples of data are collected and the sample means calculated.
These means might differ by either a little or a lot
If the samples come from the same population, then we expect their
means to be roughly equal. Although it is possible for their means to
differ by chance alone, we would expect large differences between
sample means to occur very infrequently


Rationale for the t-test


We compare the difference between the sample means that we collected to the
difference between the sample means that we would expect to obtain if there were
no effect (i.e. if the null hypothesis were true). We use the standard error as a
gauge of the variability between sample means. If the difference between the
samples we have collected is larger than what we would expect based on the
standard error then we can assume one of two things:
There is no effect and sample means in our population fluctuate a lot
and we have, by chance, collected two samples that are atypical of the
population from which they came
The two samples come from different populations but are typical of
their respective parent population. In this scenario, the difference
between samples represents a genuine difference between the samples
(and so the null hypothesis is incorrect)
As the observed difference between the sample means gets larger, the more
confident we become that the second explanation is correct (i.e. that the null
hypothesis should be rejected). If the null hypothesis is incorrect, then we gain
confidence that the two sample means differ because of the different experimental
manipulation imposed on each sample
Rationale for the t-test

A = observed difference between sample means
B = expected difference between population means (if null hypothesis is true)
C = estimate of the standard error of the difference between two sample means

$$t = \frac{A - B}{C}$$


The t-test as a GLM

$$\text{outcome}_i = (\text{model}) + \text{error}_i$$

$$A_i = b_0 + b_1 G_i + \varepsilon_i$$
$$\text{anxiety}_i = b_0 + b_1\,\text{group}_i + \varepsilon_i$$

Picture Group

The group variable = 0


Intercept = mean of baseline group

$$\bar{X}_{\text{Picture}} = b_0 + (b_1 \times 0)$$
$$b_0 = \bar{X}_{\text{Picture}}$$
$$b_0 = 40$$


Real Spider Group

The group variable = 1


b1 = Difference between means

$$\bar{X}_{\text{Real}} = b_0 + (b_1 \times 1)$$
$$\bar{X}_{\text{Real}} = \bar{X}_{\text{Picture}} + b_1$$
$$b_1 = \bar{X}_{\text{Real}} - \bar{X}_{\text{Picture}} = 47 - 40 = 7$$
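The dummy-coding logic can be sketched with lm(). The scores below are hypothetical toy values, chosen only so that the two group means equal 40 and 47 as on the slides; they are not the spider data.

```r
# Hypothetical toy scores (NOT the course data): group means are 40 and 47
toy <- data.frame(
  anxiety = c(30, 40, 50, 37, 47, 57),
  group   = factor(rep(c("Picture", "Real"), each = 3),
                   levels = c("Picture", "Real"))  # Picture = baseline (coded 0)
)

toyModel    <- lm(anxiety ~ group, data = toy)
b           <- coef(toyModel)
group_means <- tapply(toy$anxiety, toy$group, mean)

b            # intercept = baseline mean, slope = difference between means
group_means
```

With the baseline group coded 0, the intercept reproduces the Picture mean and the slope reproduces the difference between the two means.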

Output from a Regression


Assumptions of the t-test

Both the independent t-test and the dependent t-test are parametric tests based
on the normal distribution. Therefore, they assume:
The sampling distribution is normally distributed. In the dependent
t-test this means that the sampling distribution of the differences
between scores should be normal, not the scores themselves.
Data are measured at least at the interval level.
The independent t-test, because it is used to test different groups of people, also
assumes:
Variances in these populations are roughly equal (homogeneity of
variance).
Scores in different treatment conditions are independent (because they
come from different people).

The Independent t-test

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$

$$s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$$
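The two formulas can be sketched in R using only summary statistics. The example assumes the means and standard errors reported for the spider example in this chapter (M = 40.00, SE = 2.68 for pictures; M = 47.00, SE = 3.18 for real spiders; 12 participants per group) and recovers each variance from its standard error, so the result is subject to rounding.

```r
n1 <- 12; n2 <- 12
m_picture <- 40.00; se_picture <- 2.68
m_real    <- 47.00; se_real    <- 3.18

# Recover the group variances from the standard errors: s^2 = (SE * sqrt(n))^2
var_picture <- (se_picture * sqrt(n1))^2
var_real    <- (se_real * sqrt(n2))^2

# Pooled variance, weighting each variance by its degrees of freedom
sp2 <- ((n1 - 1) * var_picture + (n2 - 1) * var_real) / (n1 + n2 - 2)

t_stat <- (m_picture - m_real) / sqrt(sp2 / n1 + sp2 / n2)
df     <- n1 + n2 - 2
round(t_stat, 2)
```

The statistic agrees, to two decimals, with the value reported for the independent t-test in this chapter.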


Assumptions of the t-test

Is arachnophobia (fear of spiders) specific to real spiders or is a
picture enough?
Participants
24 arachnophobic individuals
Manipulation
12 participants were exposed to a real spider
12 were exposed to a picture of the same spider
Outcome
Anxiety

The Independent t-test Using R

To do a t-test we use the function t.test()
If you have the data for different groups stored in a single column:
newModel <- t.test(outcome ~ predictor, data = dataFrame, paired = FALSE/TRUE)

ind.t.test <- t.test(Anxiety ~ Group, data = spiderLong)

If you have the data for different groups stored in two columns:
newModel <- t.test(scores group 1, scores group 2, paired = FALSE/TRUE)

ind.t.test <- t.test(spiderWide$real, spiderWide$picture)


Output from the Independent t-test

Calculating an Effect Size

$$r = \sqrt{\frac{t^2}{t^2 + df}}$$

$$r = \sqrt{\frac{(-1.681)^2}{(-1.681)^2 + 22}} = \sqrt{\frac{2.826}{24.826}} = 0.34$$

Reporting the Results

On average, participants experienced greater anxiety from real spiders
(M = 47.00, SE = 3.18) than from pictures of spiders (M = 40.00,
SE = 2.68).
This difference was not significant, t(21.4) = −1.68, p > .05;
however, it did represent a medium-sized effect, r = .34.

The Dependent t-test

$$t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{N}}$$


Example

Is arachnophobia (fear of spiders) specific to real spiders or is a
picture enough?
Participants
12 spider phobic individuals
Manipulation
Each participant was exposed to a real spider and a picture of the same
spider at two points in time
Outcome
Anxiety

The Dependent t-test Using R

To do a dependent t-test we again use the function t.test() but this
time include the option paired = TRUE.
If we have scores from different groups stored in different columns:
dep.t.test <- t.test(spiderWide$real, spiderWide$picture, paired = TRUE)

dep.t.test

If we had our data stored in long format so that our group scores are
in a single column and group membership is expressed in a second
column:
dep.t.test <- t.test(Anxiety ~ Group, data = spiderLong, paired = TRUE)
dep.t.test

Note: R 4.4.0 and later reject paired = TRUE in the formula interface
(error: cannot use 'paired' in formula method); with those versions use
the two-column form.


Output from the Dependent t-test

Calculating the Effect Size

We can compute this value in the same way that we did for the
independent t-test by executing:
t <- dep.t.test$statistic[[1]]    # t-statistic from the test object
df <- dep.t.test$parameter[[1]]   # degrees of freedom
r <- sqrt(t^2 / (t^2 + df))       # effect size r
round(r, 3)


Reporting the Results

On average, participants experienced significantly greater anxiety
from real spiders (M = 47.00, SE = 3.18) than from pictures of
spiders (M = 40.00, SE = 2.68), t(11) = 2.47, p < .05, r = .60.

When Assumptions are Broken

Independent t-test
Mann–Whitney test
Wilcoxon rank-sum test
Dependent t-test
Wilcoxon signed-rank test
Robust tests
Bootstrapping
Trimmed means
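In base R both Wilcoxon tests are available through wilcox.test(): with paired = FALSE it performs the rank-sum (Mann–Whitney) test for independent groups, and with paired = TRUE the signed-rank test for related scores. The sketch below uses hypothetical toy scores for illustration only; they are not the spider data.

```r
# Hypothetical toy scores (NOT the course data)
picture <- c(30, 35, 45, 40, 50, 38)
real    <- c(41, 39, 57, 47, 65, 43)

# Different participants in each condition: Wilcoxon rank-sum (Mann-Whitney)
rank_sum <- wilcox.test(real, picture, paired = FALSE)

# Same participants in both conditions: Wilcoxon signed-rank
signed_rank <- wilcox.test(real, picture, paired = TRUE)

rank_sum$method
signed_rank$method
```

The method component of each result names the test that was actually run, which is a quick way to confirm that the intended design was analysed.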


Robust Methods to Compare Independent Means

Regardless of whether your data come from the same or different
entities, these functions require the data to be in two different
columns (one for each experimental condition).

Robust Methods to Compare Independent Means

The first robust function, yuen(), is based on a trimmed mean:
yuen(scores group 1, scores group 2, tr = .2, alpha = .05)

We can also compare trimmed means but include a bootstrap by using:
yuenbt(scores group 1, scores group 2, tr = .2, nboot = 599, alpha = .05, side = F)

A final method is to use a bootstrap and an M-estimator (rather than a
trimmed mean) by using the pb2gen() function:
pb2gen(spiderWide$real, spiderWide$picture, alpha = .05, nboot = 2000, est = mom)


Output: Robust Methods to Compare Independent Means

Robust Methods to Compare Dependent Means

The first robust function, yuend(), is based on a trimmed mean:
yuend(scores group 1, scores group 2, tr = .2, alpha = .05)

We can also compare trimmed means but include a bootstrap by using
ydbt():
ydbt(scores group 1, scores group 2, tr = .2, nboot = 599, alpha = .05, side = F)


Output: Robust Methods to Compare Dependent Means

Robust Methods to Compare Dependent Means

A final method is to use a bootstrap and an M-estimator (rather than a
trimmed mean) by using the bootdpci() function. This function has
the general form:
bootdpci(scores group 1, scores group 2, alpha = .05, nboot = 2000, est = tmean)

For a bootstrap test of dependent M-estimators we execute:
results <- bootdpci(spiderWide$real, spiderWide$picture, est = tmean, nboot = 2000)
results$output
     con.num psihat p.value p.crit ci.lower ci.upper
[1,]       1    7.5   0.037   0.05      0.5   13.125


Chapter 10: Comparing Several Means – ANOVA

Aims

Understand the basic principles of ANOVA
Why is it done?
What does it tell us?
Theory of one-way independent ANOVA
Following up an ANOVA:
Planned contrasts/comparisons
Choosing contrasts
Coding contrasts
Post hoc tests


When and Why

When we want to compare means we can use a t-test. This test has
limitations:
You can compare only 2 means: often we would like to compare means
from 3 or more groups
It can be used only with one predictor/independent variable
ANOVA
Compares several means
Can be used when you have manipulated more than one independent
variable
It is an extension of regression (the general linear model)

Why not use lots of t-tests?

If we want to compare several means why don’t we compare pairs of
means with t-tests?
Can’t look at several independent variables
Inflates the Type I error rate


What Does ANOVA Tell Us?


Null hypothesis:
Like a t-test, ANOVA tests the null hypothesis that the means are the
same.
Experimental hypothesis:
The means differ.
ANOVA is an omnibus test
It tests for an overall difference between groups.
It tells us that the group means are different.
It doesn’t tell us exactly which means differ.
How many comparisons can we do?

$$C = \frac{k!}{2(k - 2)!}$$

$$C = \frac{5!}{2(5 - 2)!} = \frac{120}{2 \cdot (3 \cdot 2 \cdot 1)} = 10$$
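The count of pairwise comparisons, and what it implies for the Type I error rate, can be sketched as follows. The familywise calculation assumes independent tests each run at alpha = .05, which is a simplification.

```r
k <- 5                                                   # number of groups
n_comparisons <- factorial(k) / (2 * factorial(k - 2))   # formula from the slide
stopifnot(n_comparisons == choose(k, 2))                 # same as "k choose 2"

alpha <- 0.05
# Probability of at least one Type I error across all comparisons,
# assuming the tests are independent
familywise_error <- 1 - (1 - alpha)^n_comparisons
round(familywise_error, 3)
```

With five groups, running every pairwise t-test would raise the chance of at least one false positive from 5% to roughly 40% under this assumption.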
Example: Viagra dataset

Outcome: a measure of libido


ANOVA as Regression

$$\text{outcome}_i = (\text{model}) + \text{error}_i$$

$$\text{libido}_i = b_0 + b_2\,\text{high}_i + b_1\,\text{low}_i + \varepsilon_i$$

Placebo group

$$\text{libido}_i = b_0 + b_2\,\text{high}_i + b_1\,\text{low}_i + \varepsilon_i$$

$$\text{libido}_i = b_0 + (b_2 \cdot 0) + (b_1 \cdot 0)$$

$$\text{libido}_i = b_0$$

$$b_0 = \bar{X}_{\text{placebo}}$$


High dose group

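$$\text{libido}_i = b_0 + b_2\,\text{high}_i + b_1\,\text{low}_i + \varepsilon_i$$

$$\text{libido}_i = b_0 + (b_2 \cdot 1) + (b_1 \cdot 0)$$

$$\text{libido}_i = b_0 + b_2$$

$$\bar{X}_{\text{high}} = \bar{X}_{\text{placebo}} + b_2$$

$$b_2 = \bar{X}_{\text{high}} - \bar{X}_{\text{placebo}}$$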

Low dose group

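$$\text{libido}_i = b_0 + b_2\,\text{high}_i + b_1\,\text{low}_i + \varepsilon_i$$

$$\text{libido}_i = b_0 + (b_2 \cdot 0) + (b_1 \cdot 1)$$

$$\text{libido}_i = b_0 + b_1$$

$$\bar{X}_{\text{low}} = \bar{X}_{\text{placebo}} + b_1$$

$$b_1 = \bar{X}_{\text{low}} - \bar{X}_{\text{placebo}}$$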
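These relationships can be checked with lm(), which dummy-codes a factor by default. The sketch uses the fifteen libido scores listed in the data table of this chapter; the level labels and the object name dummyModel are assumptions made for illustration.

```r
# Libido scores from the chapter's data table; level labels are assumed
viagraData <- data.frame(
  dose   = factor(rep(c("Placebo", "Low", "High"), each = 5),
                  levels = c("Placebo", "Low", "High")),  # Placebo = baseline
  libido = c(3, 2, 1, 1, 4,   5, 2, 4, 2, 3,   7, 4, 5, 3, 6)
)

dummyModel <- lm(libido ~ dose, data = viagraData)
b <- coef(dummyModel)
b
```

The intercept equals the placebo mean, and the other two coefficients equal the low-dose and high-dose means minus the placebo mean.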


Output from regression

Experiments vs. Correlation

ANOVA in regression:
Used to assess whether the regression model is good at predicting an
outcome.
ANOVA in experiments:
Used to see whether experimental manipulations lead to differences in
performance on an outcome (DV)
By manipulating a predictor variable can we cause (and therefore
predict) a change in behaviour?
Asking the same question, but in experiments we systematically
manipulate the predictor, in regression we don’t.


Theory behind ANOVA

We calculate how much variability there is between scores: total sum
of squares (SS_T)
We then calculate how much of this variability can be explained by
the model we fit to the data...
How much variability is due to the experimental manipulation: model
sum of squares (SS_M)
...and how much cannot be explained
How much variability is due to individual differences in performance:
residual sum of squares (SS_R)

Rationale to Experiments

Variance created by our manipulation: e.g. removal of brain (systematic
variance)
Variance created by unknown factors: e.g. differences in ability
(unsystematic variance)


Rationale to Experiments

Theory behind ANOVA

We compare the amount of variability explained by the model
(experiment) to the error in the model (individual differences)
This ratio is called the F-ratio.
If the model explains a lot more variability than it can’t explain, then
the experimental manipulation has had a significant effect on the
outcome (DV).


Theory behind ANOVA

If the experiment is successful, then the model will explain more
variance than it can’t: SS_M will be greater than SS_R

ANOVA by hand

Testing the effects of Viagra on libido using three groups:


Placebo (sugar pill)
Low dose Viagra
High dose Viagra
The outcome/dependent variable (DV) was an objective measure of
libido.


The Data

The Data


Total Sum of Squares (SST )

Step 1: calculate SST


Degrees of Freedom

Degrees of freedom (df) are the number of values that are free to
vary.
Think about rugby teams!
In general, the df are one less than the number of values used to
calculate the SS.

$$df_T = N - 1 = 15 - 1 = 14$$

Model Sum of Squares SSM


Step 2: calculate SSM

$$SS_M = \sum n_i (\bar{x}_i - \bar{x}_{\text{grand}})^2$$

$$\begin{aligned}
SS_M &= 5(2.2 - 3.467)^2 + 5(3.2 - 3.467)^2 + 5(5.0 - 3.467)^2 \\
     &= 5(-1.267)^2 + 5(-0.267)^2 + 5(1.533)^2 \\
     &= 8.025 + 0.355 + 11.755 \\
     &= 20.135
\end{aligned}$$

Model Degrees of Freedom

How many values did we use to calculate SS_M?
We used the 3 means

$$df_M = k - 1 = 3 - 1 = 2$$


Residual Sum of Squares SSR

Step 3: calculate SSR


Step 3: calculate SSR

$$SS_R = s^2_{\text{group1}}(n_1 - 1) + s^2_{\text{group2}}(n_2 - 1) + s^2_{\text{group3}}(n_3 - 1)$$

$$\begin{aligned}
SS_R &= 1.70(5 - 1) + 1.70(5 - 1) + 2.50(5 - 1) \\
     &= 6.8 + 6.8 + 10 \\
     &= 23.60
\end{aligned}$$

Residual Degrees of Freedom

How many values did we use to calculate SS_R?
We used the 5 scores for each of the SS for each group

$$\begin{aligned}
df_R &= df_{\text{group1}} + df_{\text{group2}} + df_{\text{group3}} \\
     &= (n_1 - 1) + (n_2 - 1) + (n_3 - 1) \\
     &= (5 - 1) + (5 - 1) + (5 - 1) \\
     &= 12
\end{aligned}$$


Double Check

$$SS_T = SS_M + SS_R$$

$$43.74 = 20.14 + 23.60$$

$$43.74 = 43.74$$

$$df_T = df_M + df_R$$

$$14 = 2 + 12$$

$$14 = 14$$

Step 4: calculate the Mean Squared Error

Average amount of variation explained by the model (e.g., the systematic
variation)

$$MS_M = \frac{SS_M}{df_M} = \frac{20.135}{2} = 10.067$$

A gauge of the average amount of variation explained by extraneous
variables (the unsystematic variation)

$$MS_R = \frac{SS_R}{df_R} = \frac{23.60}{12} = 1.967$$


Step 5: calculate the F-Ratio

$$F = \frac{MS_M}{MS_R}$$

$$F = \frac{MS_M}{MS_R} = \frac{10.067}{1.967} = 5.12$$

Step 6: Construct a Summary Table

Source      SS      df   MS       F
Model       20.14    2   10.067   5.12*
Residual    23.60   12    1.967
Total       43.74   14
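The six steps can be reproduced in a few lines of R. This is a minimal sketch using the fifteen libido scores from the chapter's data table; the level labels and variable names are assumptions. Small differences from the hand calculation (for example 43.73 rather than 43.74) arise because the slides round the means before squaring.

```r
libido <- c(3, 2, 1, 1, 4,   5, 2, 4, 2, 3,   7, 4, 5, 3, 6)
dose   <- factor(rep(c("Placebo", "Low", "High"), each = 5),
                 levels = c("Placebo", "Low", "High"))

grand_mean  <- mean(libido)
group_means <- tapply(libido, dose, mean)
group_n     <- tapply(libido, dose, length)

SST <- sum((libido - grand_mean)^2)                   # total variability
SSM <- sum(group_n * (group_means - grand_mean)^2)    # explained by the model
SSR <- sum((libido - ave(libido, dose))^2)            # within-group (residual)

dfM <- nlevels(dose) - 1
dfR <- length(libido) - nlevels(dose)

F_ratio <- (SSM / dfM) / (SSR / dfR)
p_value <- pf(F_ratio, dfM, dfR, lower.tail = FALSE)

round(c(SST = SST, SSM = SSM, SSR = SSR, F = F_ratio, p = p_value), 3)
```

The asterisk in the summary table corresponds to the p-value falling below .05.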


ANOVA Assumptions
Homogeneity of variance:
The variances of the groups are assumed to be equal. This assumption
can be tested using Levene’s test (see section 5.7.1). If Levene’s test is
significant then we can say that the variances are significantly different.
This would mean that we had violated one of the assumptions of ANOVA
and we would have to take steps to rectify this matter. ANOVA is fairly
robust to violations of this assumption when sample sizes are equal;
however, when sample sizes are unequal, ANOVA is not robust to
violations of homogeneity of variance

Non-normality
When group sizes are equal the F-statistic can be quite robust to
violations of normality

Independence
When this assumption is broken (i.e., observations across groups are
correlated) then the Type I error rate is substantially inflated.
Why Use Follow-Up Tests?

The F-ratio tells us only that the experiment was successful (i.e.
group means were different)
It does not tell us specifically which group means differ from which.
We need additional tests to find out where the group differences lie.


How?

Multiple t-tests
We saw earlier that this is a bad idea
Orthogonal contrasts/comparisons
Hypothesis driven
Planned a priori
Post hoc tests
Not planned (no hypothesis)
Compare all pairs of means
Trend analysis

Planned Contrasts

Basic idea:
The variability explained by the model (experimental manipulation,
SSM ) is due to participants being assigned to different groups.
This variability can be broken down further to test specific hypotheses
about which groups might differ.
We break down the variance according to hypotheses made a priori
(before the experiment).
It’s like cutting up a cake (yum yum!)


Rules When Choosing Contrasts

Independent
Contrasts must not interfere with each other (they must test unique
hypotheses).

Only two chunks


Each contrast should compare only two chunks of variation (why?).

K-1
You should always end up with one less contrast than the number of
groups.

Generating Hypotheses

Example: Testing the effects of Viagra on libido using three groups:


Placebo (sugar pill)
Low dose Viagra
High dose Viagra
Dependent variable (DV) was an objective measure of libido.
Intuitively, what might we expect to happen?


Generating Hypotheses

        Placebo   Low Dose   High Dose
        3         5          7
        2         2          4
        1         4          5
        1         2          3
        4         3          6
Mean    2.20      3.20       5.00

How do I Choose Contrasts?

Big hint:
In most experiments we usually have one or more control groups.
The logic of control groups dictates that we expect them to be
different from groups that we’ve manipulated.
The first contrast will always be to compare any control groups (chunk
1) with any experimental conditions (chunk 2).


Hypotheses

Hypothesis 1:
People who take Viagra will have a higher libido than those who don’t.
placebo ≠ (low, high)
Hypothesis 2:
People taking a high dose of Viagra will have a greater libido than
those taking a low dose.
low ≠ high

Planned Comparisons


Another Example

Another Example


Defining contrasts using weights
Coding Planned Contrasts: Rules

Rule 1
Groups coded with positive weights compared to groups coded with
negative weights.

Rule 2
The sum of weights for a comparison should be zero.

Rule 3
If a group is not involved in a comparison, assign it a weight of zero.

Defining contrasts using weights
Coding Planned Contrasts: Rules

Rule 4
For a given contrast, the weights assigned to the group(s) in one chunk of
variation should be equal to the number of groups in the opposite chunk
of variation.

Rule 5
If a group is singled out in a comparison, then that group should not be
used in any subsequent contrasts.
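The rules can be checked mechanically for the two Viagra contrasts used later in this chapter (placebo vs. both doses, then low vs. high dose). The sketch below assumes the weights -2, 1, 1 and 0, -1, 1; the object name weights is illustrative.

```r
# Rows = groups, columns = contrasts (weights assumed from the Viagra example)
weights <- cbind(
  contrast1 = c(Placebo = -2, Low =  1, High = 1),  # placebo vs. (low, high)
  contrast2 = c(Placebo =  0, Low = -1, High = 1)   # low vs. high; placebo excluded
)

colSums(weights)                                       # Rule 2: each column sums to zero
sum(weights[, "contrast1"] * weights[, "contrast2"])   # zero cross-product: orthogonal
```

A zero sum of cross-products indicates that the two contrasts are independent of each other, which is what Rule 5 is designed to guarantee.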


Defining contrasts

Defining contrasts


Orthogonal contrasts for the Viagra data

$$\text{libido}_i = b_0 + b_1\,\text{contrast1}_i + b_2\,\text{contrast2}_i$$

$$b_0 = \text{grand mean} = \frac{\bar{X}_{\text{high}} + \bar{X}_{\text{low}} + \bar{X}_{\text{placebo}}}{3}$$

Orthogonal contrasts for the Viagra data

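$$\text{libido}_i = b_0 + b_1\,\text{contrast1}_i + b_2\,\text{contrast2}_i$$

$$\begin{aligned}
\bar{X}_{\text{placebo}} &= \frac{\bar{X}_{\text{high}} + \bar{X}_{\text{low}} + \bar{X}_{\text{placebo}}}{3} + (-2b_1) + (b_2 \cdot 0) \\
2b_1 &= \frac{\bar{X}_{\text{high}} + \bar{X}_{\text{low}} + \bar{X}_{\text{placebo}}}{3} - \bar{X}_{\text{placebo}} \\
6b_1 &= \bar{X}_{\text{high}} + \bar{X}_{\text{low}} + \bar{X}_{\text{placebo}} - 3\bar{X}_{\text{placebo}} \\
6b_1 &= \bar{X}_{\text{high}} + \bar{X}_{\text{low}} - 2\bar{X}_{\text{placebo}}
\end{aligned}$$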


Orthogonal contrasts for the Viagra data

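$$\begin{aligned}
2b_1 &= \frac{\bar{X}_{\text{high}} + \bar{X}_{\text{low}} + \bar{X}_{\text{placebo}}}{3} - \bar{X}_{\text{placebo}} \\
6b_1 &= \bar{X}_{\text{high}} + \bar{X}_{\text{low}} + \bar{X}_{\text{placebo}} - 3\bar{X}_{\text{placebo}} \\
6b_1 &= \bar{X}_{\text{high}} + \bar{X}_{\text{low}} - 2\bar{X}_{\text{placebo}}
\end{aligned}$$

Effect of experimental group vs control group:

$$3b_1 = \frac{\bar{X}_{\text{high}} + \bar{X}_{\text{low}}}{2} - \bar{X}_{\text{placebo}}$$

$$b_1 = \frac{1}{3}\left[\frac{\bar{X}_{\text{high}} + \bar{X}_{\text{low}}}{2} - \bar{X}_{\text{placebo}}\right]$$

$$3b_1 = \frac{\bar{X}_{\text{high}} + \bar{X}_{\text{low}}}{2} - \bar{X}_{\text{placebo}} = \frac{5 + 3.2}{2} - 2.2 = 1.9$$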
Orthogonal contrasts for the Viagra data

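$$\text{libido}_i = b_0 + b_1\,\text{contrast1}_i + b_2\,\text{contrast2}_i$$

$$\begin{aligned}
\bar{X}_{\text{high}} &= b_0 + (b_1 \cdot 1) + (b_2 \cdot 1) \\
b_2 &= \bar{X}_{\text{high}} - b_1 - b_0 \\
b_2 &= \bar{X}_{\text{high}} - \frac{1}{3}\left[\frac{\bar{X}_{\text{high}} + \bar{X}_{\text{low}}}{2} - \bar{X}_{\text{placebo}}\right] - \frac{\bar{X}_{\text{high}} + \bar{X}_{\text{low}} + \bar{X}_{\text{placebo}}}{3} \\
3b_2 &= 3\bar{X}_{\text{high}} - \left[\frac{\bar{X}_{\text{high}} + \bar{X}_{\text{low}}}{2} - \bar{X}_{\text{placebo}}\right] - \left(\bar{X}_{\text{high}} + \bar{X}_{\text{low}} + \bar{X}_{\text{placebo}}\right)
\end{aligned}$$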

Orthogonal contrasts for the Viagra data

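$$\text{libido}_i = b_0 + b_1\,\text{contrast1}_i + b_2\,\text{contrast2}_i$$

$$\begin{aligned}
6b_2 &= 6\bar{X}_{\text{high}} - \left(\bar{X}_{\text{high}} + \bar{X}_{\text{low}} - 2\bar{X}_{\text{placebo}}\right) - 2\left(\bar{X}_{\text{high}} + \bar{X}_{\text{low}} + \bar{X}_{\text{placebo}}\right) \\
     &= 6\bar{X}_{\text{high}} - \bar{X}_{\text{high}} - \bar{X}_{\text{low}} + 2\bar{X}_{\text{placebo}} - 2\bar{X}_{\text{high}} - 2\bar{X}_{\text{low}} - 2\bar{X}_{\text{placebo}} \\
     &= 3\bar{X}_{\text{high}} - 3\bar{X}_{\text{low}}
\end{aligned}$$

Difference between experimental groups:

$$b_2 = \frac{1}{2}\left(\bar{X}_{\text{high}} - \bar{X}_{\text{low}}\right) \tag{1}$$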
Orthogonal contrasts for the Viagra data: output

F-statistic unchanged
The intercept is the “grand mean”
The regression coefficient for contrast1 is one-third of the difference
between the average of the experimental conditions and the control
condition
The regression coefficient for contrast2 is half of the difference
between the experimental groups
The experimental groups were significantly different from the control
(p < .05), but the experimental groups were not significantly
different from each other (p > .05)
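The coefficient interpretations above can be checked by fitting the model with user-defined contrasts. This sketch uses the libido scores from the chapter's data table and assumes the weights -2, 1, 1 and 0, -1, 1; the level labels and the name contrastModel are illustrative.

```r
viagraData <- data.frame(
  dose   = factor(rep(c("Placebo", "Low", "High"), each = 5),
                  levels = c("Placebo", "Low", "High")),
  libido = c(3, 2, 1, 1, 4,   5, 2, 4, 2, 3,   7, 4, 5, 3, 6)
)

# Replace the default dummy coding with the two orthogonal contrasts
contrasts(viagraData$dose) <- cbind(contrast1 = c(-2, 1, 1),
                                    contrast2 = c(0, -1, 1))

contrastModel <- lm(libido ~ dose, data = viagraData)
b <- coef(contrastModel)
round(b, 3)
```

The intercept is the grand mean, the first coefficient is one-third of 1.9, and the second is half of the difference between the high-dose and low-dose means.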

Non-Orthogonal contrasts for the Viagra data

Standard contrasts


One-Way ANOVA using R


When the Test assumptions are met

Using lm():
viagraModel <- lm(libido ~ dose, data = viagraData)

Using aov():
viagraModel <- aov(libido ~ dose, data = viagraData)

summary(viagraModel)

Output from aov()


Plot of the model

When variances are not equal across groups

If Levene’s test is significant then it is reasonable to assume that
population variances are different across groups.
We can get the output for Welch’s F for the current data by
executing:
oneway.test(libido ~ dose, data = viagraData)
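For comparison, oneway.test() can also produce the ordinary F-ratio by setting var.equal = TRUE, which makes the effect of Welch's correction visible. The sketch rebuilds the data from the chapter's table; the level labels and object names are assumptions.

```r
viagraData <- data.frame(
  dose   = factor(rep(c("Placebo", "Low", "High"), each = 5),
                  levels = c("Placebo", "Low", "High")),
  libido = c(3, 2, 1, 1, 4,   5, 2, 4, 2, 3,   7, 4, 5, 3, 6)
)

# Welch's F: does not assume equal variances (the default)
welch <- oneway.test(libido ~ dose, data = viagraData)

# Classical F: assumes equal variances, as in the hand calculation
classical <- oneway.test(libido ~ dose, data = viagraData, var.equal = TRUE)

round(classical$statistic, 2)
welch$parameter       # Welch adjusts the denominator degrees of freedom
classical$parameter
```

The classical version reproduces the F-ratio computed by hand, while Welch's version uses fewer residual degrees of freedom.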


Output

Robust ANOVA

The robust functions require the data to be in wide format rather than
long format
We can reformat the data using unstack():
    viagraWide <- unstack(viagraData, libido ~ dose)

This command creates a new dataframe called viagraWide, which is
our Viagra data in wide format, so each column represents a
different group
An interactive example is available at:
https://rdrr.io/r/utils/stack.html


viagraWide

Robust ANOVA

For an ANOVA of the Viagra data based on 20% trimmed means:
    t1way(viagraWide)

To compare medians rather than means:
    med1way(viagraWide)

To add a bootstrap to the trimmed mean method:
    t1waybt(viagraWide)


Robust outputs

Planned Contrasts using R

To do planned comparisons in R we have to set the contrast attribute
of our grouping variable using the contrasts() function and then
recreate our ANOVA model using aov(). By default, dummy coding is
used.
We can see this if we summarise our existing viagraModel using the
summary.lm() function rather than summary():
    summary.lm(viagraModel)
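The orthogonal contrasts discussed earlier can be set up as in the sketch below. The data values, the level labels and the names contrast1, contrast2 and plannedModel are assumptions made for this example; the weights (-2, 1, 1) and (0, -1, 1) are one standard way to code "both doses vs. placebo" and "high vs. low dose".

```r
# Illustrative data (assumed values, not taken from the slides)
viagraData <- data.frame(
  dose = factor(rep(c("Placebo", "Low Dose", "High Dose"), each = 5),
                levels = c("Placebo", "Low Dose", "High Dose")),
  libido = c(3, 2, 1, 1, 4,  5, 2, 4, 2, 3,  7, 4, 5, 3, 6)
)

# Contrast 1: experimental groups vs. placebo
# Contrast 2: high dose vs. low dose
contrast1 <- c(-2, 1, 1)
contrast2 <- c(0, -1, 1)
contrasts(viagraData$dose) <- cbind(contrast1, contrast2)

# Refit the model so that the new coding is used
plannedModel <- aov(libido ~ dose, data = viagraData)
print(summary.lm(plannedModel))

# Compare the coefficients with the group means
groupMeans <- as.numeric(tapply(viagraData$libido, viagraData$dose, mean))
b <- unname(coef(plannedModel))
print(b)
```

With equal group sizes, the intercept should equal the grand mean, the first coefficient one-third of the difference between the average of the two dose groups and the placebo, and the second coefficient half of the difference between the two dose groups.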


Output

Polynomial Contrasts: Trend Analysis


Trend Analysis

Follow the general procedure of setting the contrast attribute of the
predictor variable:
    contrasts(viagraData$dose) <- contr.poly(3)

We then create a new model using aov():
    viagraTrend <- aov(libido ~ dose, data = viagraData)

To access the contrasts:
    summary.lm(viagraTrend)
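A self-contained sketch of the trend analysis follows. The data values and level labels are assumed for illustration; the only requirement is that the factor levels are in their natural order (placebo, low, high), because contr.poly() assigns the linear and quadratic weights by level position.

```r
# Illustrative data (assumed values, not taken from the slides)
viagraData <- data.frame(
  dose = factor(rep(c("Placebo", "Low Dose", "High Dose"), each = 5),
                levels = c("Placebo", "Low Dose", "High Dose")),
  libido = c(3, 2, 1, 1, 4,  5, 2, 4, 2, 3,  7, 4, 5, 3, 6)
)

# Polynomial contrasts for 3 ordered groups: linear (.L) and quadratic (.Q)
print(contr.poly(3))
contrasts(viagraData$dose) <- contr.poly(3)

viagraTrend <- aov(libido ~ dose, data = viagraData)

# One row per trend: estimate, standard error, t and p
trendCoefs <- summary.lm(viagraTrend)$coefficients
print(round(trendCoefs, 3))
```

The row dose.L tests the linear trend across the three doses and dose.Q tests the quadratic trend.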

Trend Analysis: Output


Trend Analysis: Output

Post Hoc Tests

Compare each mean against all others (pairwise comparisons)
Used when there are no specific “a priori” predictions about the data
and the interest is in exploring any differences between group means
that exist
In general terms they use a stricter criterion to accept an effect as
significant
Hence, pairwise comparisons control the familywise Type I error by
correcting the level of significance for each test such that the overall
Type I error rate (α) across all comparisons remains at .05
Simplest example is the Bonferroni method:

$$ \alpha_{\text{Bonferroni}} = \frac{\alpha}{\text{number of tests}} $$
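The Bonferroni rule can be checked with a few lines of base R. In this sketch the three raw p-values are hypothetical numbers chosen for illustration; p.adjust() is the base R function that applies the equivalent correction to the p-values instead of to α.

```r
alpha   <- 0.05
nGroups <- 3
nTests  <- choose(nGroups, 2)        # 3 pairwise comparisons among 3 groups

# Per-test criterion: alpha divided by the number of tests
alphaBonferroni <- alpha / nTests
print(alphaBonferroni)

# Equivalent view: multiply each p-value by the number of tests (capped at 1)
rawP <- c(0.010, 0.020, 0.300)       # hypothetical unadjusted p-values
adjP <- p.adjust(rawP, method = "bonferroni")
print(adjP)

# Both routes lead to the same decisions
print(rawP < alphaBonferroni)
print(adjP < alpha)
```

Comparing raw p-values with the reduced criterion and comparing adjusted p-values with the original α are two views of the same decision rule.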


Post-hoc tests (Superhero dataset)

Outcome: severity of injury (0 – 100)

Post Hoc Tests Recommendations

How you conduct post hoc tests in R depends on which test you’d
like to do.
Bonferroni and related methods such as the Holm and
Benjamini–Hochberg (BH) variants are done using the
pairwise.t.test() function, which is part of the R base system.
Tukey’s and Dunnett’s tests can be done using the glht()
function in the multcomp package.
Finally, Wilcox (2005) has some robust methods implemented in his
functions lincon() and mcppb20().


Bonferroni and BH post hoc tests

    pairwise.t.test(viagraData$libido, viagraData$dose, p.adjust.method = "bonferroni")

    pairwise.t.test(viagraData$libido, viagraData$dose, p.adjust.method = "BH")

Tukey

For the Viagra data, we can obtain Tukey post hoc tests by executing:
    postHocs <- glht(viagraModel, linfct = mcp(dose = "Tukey"))
    summary(postHocs)
    confint(postHocs)


Tukey post hoc test output

Robust post hoc tests

    lincon(viagraWide)
    mcppb20(viagraWide)


Effect size of ANOVA


A simple measure of effect size is:

$$ r = \sqrt{R^2} = \sqrt{\frac{SS_M}{SS_T}} $$

r is slightly biased because it is based purely on sums of squares from
the sample and no adjustment is made for the fact that we’re trying
to estimate the effect size in the population. Therefore,
omega-squared is often used instead:

$$ \omega^2 = \frac{SS_M - (df_M)\,MS_R}{SS_T + MS_R} $$

For ω², values of .01, .06 and .14 represent small, medium and large
effects respectively
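Both effect sizes can be computed directly from the ANOVA table. The sketch below uses illustrative data (assumed values, not taken from the slides) and hypothetical variable names; the formulas are the two given above.

```r
# Illustrative data (assumed values, not taken from the slides)
viagraData <- data.frame(
  dose = factor(rep(c("Placebo", "Low Dose", "High Dose"), each = 5),
                levels = c("Placebo", "Low Dose", "High Dose")),
  libido = c(3, 2, 1, 1, 4,  5, 2, 4, 2, 3,  7, 4, 5, 3, 6)
)
viagraModel <- aov(libido ~ dose, data = viagraData)

# Row 1 of the ANOVA table is the model (dose), row 2 the residuals
aovTable <- summary(viagraModel)[[1]]
SSM <- aovTable[["Sum Sq"]][1]
SSR <- aovTable[["Sum Sq"]][2]
SST <- SSM + SSR
dfM <- aovTable[["Df"]][1]
MSR <- aovTable[["Mean Sq"]][2]

rSquared     <- SSM / SST
r            <- sqrt(rSquared)
omegaSquared <- (SSM - dfM * MSR) / (SST + MSR)

print(round(c(R2 = rSquared, r = r, omega2 = omegaSquared), 3))
```

Omega-squared comes out smaller than R², which reflects the adjustment for estimating the population effect.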

Chapter 15: Non-parametric Tests


Aims

When and why do we use non-parametric tests?


Wilcoxon rank-sum test
Wilcoxon signed-rank test
Kruskal–Wallis test
Jonckheere–Terpstra test
Friedman’s ANOVA
Ranking data
Interpretation of results
Reporting results
Calculating an effect size

When to use Non-parametric Tests

Non-parametric tests are used when assumptions of parametric tests
are not met.
It is not always possible to correct for problems with the distribution
of a data set
In these cases we have to use non-parametric tests
They make fewer assumptions about the type of data on which they
can be used


The Wilcoxon rank-sum test

The non-parametric equivalent of the independent t-test.


Used to test differences between two conditions in which different
participants have been used

Ranking Data

The test works on the principle of ranking the data for each group:
Lowest score = a rank of 1
Next highest score = a rank of 2, and so on.
Tied scores are given the same rank: the average of the potential ranks
For unequal group sizes:
The test statistic (Ws) = the sum of ranks in the group that contains
the fewest people
For equal group sizes:
Ws = the value of the smaller summed rank
Add up the ranks for the two groups and take the lower of these
sums to be our test statistic
The analysis is carried out on the ranks rather than the actual data
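The ranking procedure above can be sketched in base R as follows. The scores and the names groupA and groupB are hypothetical; rank() assigns average ranks to ties by default. The last lines relate Ws to the W reported by wilcox.test(), which is the rank sum of the first group minus n1(n1 + 1)/2.

```r
# Hypothetical scores for two groups of unequal size
groupA <- c(12, 15, 15, 20)
groupB <- c(9, 15, 22, 25, 30)

scores <- c(groupA, groupB)
group  <- rep(c("A", "B"), times = c(length(groupA), length(groupB)))

# Rank all scores together, ignoring group; ties get the average rank
ranks <- rank(scores)
print(data.frame(group, scores, ranks))

sumA <- sum(ranks[group == "A"])
sumB <- sum(ranks[group == "B"])

# Unequal sizes: Ws is the rank sum of the smaller group;
# equal sizes: Ws is the smaller of the two rank sums
Ws <- if (length(groupA) != length(groupB)) {
  if (length(groupA) < length(groupB)) sumA else sumB
} else {
  min(sumA, sumB)
}
print(Ws)

# W from wilcox.test() = rank sum of the first group - n1 * (n1 + 1) / 2
n1 <- length(groupA)
W  <- unname(wilcox.test(groupA, groupB, exact = FALSE)$statistic)
print(c(W = W, fromRanks = sumA - n1 * (n1 + 1) / 2))
```

The three scores of 15 occupy ranks 3, 4 and 5, so each receives the average rank of 4.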


An Example

A neurologist investigated the depressant effects of certain
recreational drugs.
Tested 20 clubbers.
10 were given an ecstasy tablet to take on a Saturday night.
10 were allowed to drink only alcohol.
Levels of depression were measured using the Beck Depression
Inventory (BDI) the day after and midweek.
Rank the data ignoring the group to which a person belonged.
A similar number of high and low ranks in each group suggests
depression levels do not differ between the groups.
A greater number of high ranks in the ecstasy group than the alcohol
group suggests the ecstasy group is more depressed than the alcohol
group.

Ranking the Depression Scores for Wednesday and Sunday


Provisional Analysis

Running the Analysis Using R Commander

The Nonparametric Tests menu in R Commander and the dialog box for the Wilcoxon
test for independent samples

Running the Analysis Using R

If you have the data for different groups stored in a single column:
    newModel <- wilcox.test(outcome ~ predictor, data = dataFrame, paired = FALSE/TRUE)

However, if you have the data for different groups stored in two
columns:
    newModel <- wilcox.test(scoresGroup1, scoresGroup2, paired = FALSE/TRUE)

Here outcome, predictor, dataFrame, scoresGroup1 and scoresGroup2 are
placeholders, and paired is set to either FALSE or TRUE

Running the Analysis Using R

To compute a basic Wilcoxon test for our Sunday data we could
execute:
    sunModel <- wilcox.test(sundayBDI ~ drug, data = drugData)
    sunModel

For the Wednesday data:
    wedModel <- wilcox.test(wedsBDI ~ drug, data = drugData)
    wedModel
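A runnable sketch of these two calls is given below. The BDI scores are illustrative values assumed for this example (the slides show the data only as a figure); they were chosen to be consistent with the group medians and W statistics reported in this chapter. The argument exact = FALSE is added here only to request the normal approximation explicitly, since the scores contain ties.

```r
# Illustrative data (assumed values): 10 clubbers per drug
drugData <- data.frame(
  drug = factor(rep(c("Ecstasy", "Alcohol"), each = 10)),
  sundayBDI = c(15, 35, 16, 18, 19, 17, 27, 16, 13, 20,   # Ecstasy
                16, 15, 20, 15, 16, 13, 14, 19, 18, 18),  # Alcohol
  wedsBDI   = c(28, 35, 35, 24, 39, 32, 27, 29, 36, 35,   # Ecstasy
                 5,  6, 30,  8,  9,  7,  6, 17,  3, 10)   # Alcohol
)

# Group medians for each day
print(aggregate(cbind(sundayBDI, wedsBDI) ~ drug, data = drugData, FUN = median))

# Rank-sum tests; W refers to the first factor level (Alcohol)
sunModel <- wilcox.test(sundayBDI ~ drug, data = drugData, exact = FALSE)
wedModel <- wilcox.test(wedsBDI ~ drug, data = drugData, exact = FALSE)
print(sunModel)
print(wedModel)
```

Because factor levels are sorted alphabetically by default, the reported W is computed for the Alcohol group.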


Output from the Wilcoxon Rank-Sum Test

Reporting the results

Depression levels in ecstasy users (Mdn = 17.50) did not differ
significantly from alcohol users (Mdn = 16.00) the day after the
drugs were taken, W = 35.5, p = .286.
However, by Wednesday, ecstasy users (Mdn = 33.50) were
significantly more depressed than alcohol users (Mdn = 7.50),
W = 4, p < .001.


Comparing Two Related Conditions: the Wilcoxon Signed-Rank Test

Uses:
To compare two sets of scores, when these scores come from the same
participants.
Imagine the experimenter in the previous example was interested in
the change in depression levels for each of the two drugs.
We still have to use a non-parametric test because the distributions of
scores for both drugs were non-normal on one of the two days

Ranking Data in the Wilcoxon


Running the Analysis Using R Commander

Running the Analysis Using R Commander


Running the Analysis Using R

We want to run our analysis on the alcohol and ecstasy groups
separately; therefore, our first job is to split the dataframe into two
using the subset() function:
    alcoholData <- subset(drugData, drug == "Alcohol")
    ecstasyData <- subset(drugData, drug == "Ecstasy")

Running the Analysis Using R

To run the analysis for the alcohol group execute:
    alcoholModel <- wilcox.test(alcoholData$wedsBDI, alcoholData$sundayBDI,
                                paired = TRUE, correct = FALSE)
    alcoholModel

and for the ecstasy group:
    ecstasyModel <- wilcox.test(ecstasyData$wedsBDI, ecstasyData$sundayBDI,
                                paired = TRUE, correct = FALSE)
    ecstasyModel
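To show what the paired test computes, the sketch below works through the signed-rank statistic by hand on a small hypothetical before/after data set (the vectors before and after are invented for illustration and are unrelated to the depression data). Zero differences are dropped, the absolute differences are ranked, and V is the sum of the ranks belonging to positive differences.

```r
# Hypothetical paired scores for 8 participants
before <- c(20, 18, 25, 30, 22, 27, 19, 24)
after  <- c(15, 19, 18, 22, 22, 20, 21, 14)

# Step 1: differences, with zero differences removed
d <- after - before
d <- d[d != 0]

# Step 2: rank the absolute differences (ties get the average rank)
absRanks <- rank(abs(d))
print(data.frame(d, absRanks))

# Step 3: V = sum of ranks attached to positive differences
vByHand <- sum(absRanks[d > 0])

# The same statistic from wilcox.test(); exact = FALSE requests the
# normal approximation, which is needed anyway with ties and zeros
pairedModel <- wilcox.test(after, before, paired = TRUE,
                           correct = FALSE, exact = FALSE)
print(pairedModel)
print(c(byHand = vByHand, fromTest = unname(pairedModel$statistic)))
```

The order of the first two arguments matters: the differences are computed as first minus second, so swapping them changes V from the sum of positive ranks to the sum of negative ranks.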


Output

Reporting the results

For ecstasy users, depression levels were significantly higher on
Wednesday (Mdn = 33.50) than on Sunday (Mdn = 17.50), p = .047.
However, for alcohol users the opposite was true: depression levels
were significantly lower on Wednesday (Mdn = 7.50) than on Sunday
(Mdn = 16.0), p = .012.


Differences between Several Independent Groups: the Kruskal–Wallis test

The Kruskal–Wallis test (Kruskal & Wallis, 1952) is the
non-parametric counterpart of the one-way independent ANOVA
If you have data that have violated an assumption then this test can be
a useful way around the problem
The theory for the Kruskal–Wallis test is very similar to that of the
Wilcoxon rank-sum test:
The Kruskal–Wallis test is based on ranked data
The sum of ranks for each group is denoted by Ri (where i is used to
denote the particular group)

Kruskal–Wallis Theory

Once the sum of ranks has been calculated for each group, the test
statistic, H, is calculated as:

$$ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) $$

R_i is the sum of ranks for each group
N is the total sample size (in this case 80)
n_i is the sample size of a particular group (in this case we have equal
sample sizes and they are all 20).
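The formula for H can be verified by computing it by hand and comparing it with kruskal.test(). The scores below are hypothetical (three groups of four, with no tied values, so that no tie correction is involved) and the variable names are chosen for this example only.

```r
# Hypothetical scores: 3 groups of 4, no ties
scores <- c(27,  2,  4, 18,    # group g1
             7,  9, 22, 31,    # group g2
            14, 35, 40, 45)    # group g3
group <- factor(rep(c("g1", "g2", "g3"), each = 4))

# Rank all scores together, then sum the ranks within each group
ranks <- rank(scores)
Ri <- tapply(ranks, group, sum)      # sum of ranks per group
ni <- tapply(ranks, group, length)   # group sizes
N  <- length(scores)

# H = 12 / (N (N + 1)) * sum(Ri^2 / ni) - 3 (N + 1)
H <- 12 / (N * (N + 1)) * sum(Ri^2 / ni) - 3 * (N + 1)

kw <- kruskal.test(scores ~ group)
print(Ri / ni)                       # mean rank per group
print(c(byHand = H, fromTest = unname(kw$statistic)))
```

When the data contain ties, kruskal.test() applies a tie correction, so its statistic can be slightly larger than the uncorrected H from the formula.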


Real data example

Does eating soya cause fertility problems? Does that affect the sperm
count?
Variables:
Outcome: sperm (millions)
IV: Number of soya meals per week:
No soya meals
1 soya meal
4 soya meals
7 soya meals
Participants: 80 males (20 in each group)

Data for the Soya Example with Ranks


Provisional Analysis

Run some exploratory analyses on the data

Doing the Kruskal–Wallis Test using R Commander


Doing the Kruskal–Wallis Test using R

For the current data:
    kruskal.test(Sperm ~ Soya, data = soyaData)

To interpret the Kruskal–Wallis test, it is useful to obtain the mean
rank for each group:
    soyaData$Ranks <- rank(soyaData$Sperm)

This command creates a variable Ranks in the soyaData dataframe that
contains the ranks for the variable Sperm. We can then obtain the mean
rank for each group:
    by(soyaData$Ranks, soyaData$Soya, mean)

Output from the Kruskal–Wallis test


Boxplot for the Sperm Counts of Individuals Eating Different Numbers of Soya Meals per Week

Post Hoc Tests for the Kruskal–Wallis Test

    kruskalmc(Sperm ~ Soya, data = soyaData)


Post Hoc Tests for the Kruskal–Wallis Test

One of the problems with comparing every group against all others is
that we have to be quite strict about accepting a difference as
significant, otherwise we will inflate the Type I error rate. To reduce
this problem we could use more focused comparisons.
In this example, we have a control group that had no soya meals. As
such, a nice succinct set of comparisons would be to compare each
group against the control:
Test 1: one soya meal per week compared to no soya meals
Test 2: four soya meals per week compared to no soya meals
Test 3: seven soya meals per week compared to no soya meals
To compare each group to the no-soya group (using a two-tailed test)
we simply execute:
    kruskalmc(Sperm ~ Soya, data = soyaData, cont = "two-tailed")

Output


Testing for Trends: the Jonckheere–Terpstra Test

This statistic tests for an ordered pattern to the medians of the
groups you’re comparing.
Essentially it does the same thing as the Kruskal–Wallis test but it
incorporates information about whether the order of the groups is
meaningful.
Use this test when you expect the groups you’re comparing to
produce a meaningful order of medians.
In the current example we expect that the more soya a person eats,
the more their sperm count will go down.

Jonckheere–Terpstra Test Using R

We can conduct a Jonckheere test by executing:
    jonckheere.test(soyaData$Sperm, as.numeric(soyaData$Soya))


Differences between Several Related Groups: Friedman’s ANOVA

Used for testing differences between conditions when:


there are more than two conditions
the same participants have been used in all conditions (each case
contributes several scores to the data)
If you have violated some assumption of parametric tests then this
test can be a useful way around the problem

Theory of Friedman’s ANOVA

The theory for Friedman’s ANOVA is much the same as the other
tests: it is based on ranked data.
Once the sum of ranks has been calculated for each group, the test
statistic, F_r, is calculated as:

$$ F_r = \left[ \frac{12}{Nk(k+1)} \sum_{i=1}^{k} R_i^2 \right] - 3N(k+1) $$
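The statistic can be reproduced by hand and compared with friedman.test(). In this sketch the weights are hypothetical (5 participants, 3 time points, no ties within a participant) and the column names are assumptions; N is the number of participants, k the number of conditions, and the scores are ranked within each participant before the ranks are summed per condition.

```r
# Hypothetical weights (kg): rows = participants, columns = time points
dietData <- matrix(c(70, 68, 66,
                     82, 83, 80,
                     65, 64, 66,
                     90, 87, 85,
                     77, 75, 74),
                   ncol = 3, byrow = TRUE,
                   dimnames = list(NULL, c("Start", "Month1", "Month2")))

N <- nrow(dietData)   # number of participants
k <- ncol(dietData)   # number of conditions

# Rank the scores within each participant (row), then sum per condition
rankMatrix <- t(apply(dietData, 1, rank))
Ri <- colSums(rankMatrix)

# Fr = 12 / (N k (k + 1)) * sum(Ri^2) - 3 N (k + 1)
Fr <- 12 / (N * k * (k + 1)) * sum(Ri^2) - 3 * N * (k + 1)

fr <- friedman.test(dietData)
print(Ri)
print(c(byHand = Fr, fromTest = unname(fr$statistic)))
```

friedman.test() accepts a matrix directly, which is why a dataframe is wrapped in as.matrix() before being passed to it.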


Example

Does the Andikins diet work (low carb diet)?


Variables:
Outcome: weight (kg)
IV: time since beginning the diet:
Baseline
1 month
2 months
Participants: 10 women

Diet Data


Friedman’s ANOVA Using R Commander

Friedman’s ANOVA Using R

To run the Friedman test we simply input the name of our dataframe,
but within the as.matrix() function, which converts it to a matrix.
In this example, we would execute:
    friedman.test(as.matrix(dietData))


Output from Friedman’s ANOVA

Post Hoc Tests for Friedman’s ANOVA

For the current data we would execute:
    friedmanmc(as.matrix(dietData))


To Sum Up...
When data violate the assumptions of parametric tests we can
sometimes find a non-parametric equivalent
Usually based on analysing the ranked data

Wilcoxon rank-sum test


Compares two independent groups of scores

Wilcoxon signed-rank test


Compares two dependent groups of scores

Kruskal–Wallis test
Compares more than two independent groups of scores

Friedman’s test
Compares more than two dependent groups of scores

Chapter 17: Principal Components Analysis and Reliability


Aims

What are factors?


Representing factors
Graphs and equations
Extracting factors
Methods and criteria
Interpreting factor structures
Factor rotation
Reliability
Cronbach’s alpha

When and Why?

To test for clusters of variables or measures
To see whether different measures are tapping aspects of a common
dimension. For example:
anal-retentiveness (a person who pays such attention to detail that it
becomes an obsession and may be an annoyance to others)
number of friends
social skills
All of those might be aspects of the common dimension of “statistical
ability”


R-Matrix

In factor analysis we look to reduce the R-matrix into a smaller set of
uncorrelated dimensions

What is a Factor?

If several variables correlate highly, they might measure aspects of a
common underlying dimension.
These dimensions are called factors.
Factors are classification axes along which the measures can be
plotted.
The greater the loading of variables on a factor, the more that factor
explains relationships between those variables.


Graphical Representation

Mathematical Representation

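$$ Y = b_1 X_1 + b_2 X_2 + \cdots + b_n X_n $$
$$ \text{Factor}_i = b_1 \text{Variable}_1 + b_2 \text{Variable}_2 + \cdots + b_n \text{Variable}_n $$

$$ Y = b_1 X_1 + b_2 X_2 + \cdots + b_n X_n $$
$$ \text{Sociability} = b_1 \text{Talk}_1 + b_2 \text{Social Skills} + b_3 \text{Interest} + b_4 \text{Talk}_2 + b_5 \text{Selfish} + b_6 \text{Liar} $$
$$ \text{Consideration} = b_1 \text{Talk}_1 + b_2 \text{Social Skills} + b_3 \text{Interest} + b_4 \text{Talk}_2 + b_5 \text{Selfish} + b_6 \text{Liar} $$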


Factor Loadings

The b-values in the equation represent the weights of a variable on a
factor
These values are the same as the coordinates on a factor plot
They are called factor loadings
These values are stored in a factor pattern matrix (A)

$$ A = \begin{pmatrix}
0.87 & 0.01 \\
0.96 & -0.03 \\
0.92 & 0.04 \\
0.00 & 0.82 \\
-0.10 & 0.75 \\
0.09 & 0.70
\end{pmatrix} $$

The R anxiety questionnaire (RAQ)


Initial Considerations

The quality of analysis depends upon the quality of the data
(garbage in → garbage out)
Test variables should correlate quite well
r > .3
Avoid multicollinearity:
several variables highly correlated, r > .80
Avoid singularity:
some variables perfectly correlated, r = 1
Screen the correlation matrix, eliminate any variables that obviously
cause concern

Further Considerations

Determinant:
indicator of multicollinearity
should be greater than 0.00001
Kaiser–Meyer–Olkin (KMO):
measures sampling adequacy
should be greater than .5
Bartlett’s test of sphericity:
tests whether the R-matrix is an identity matrix
should be significant at p < .05
Anti-image matrix:
measures of sampling adequacy on diagonal,
off-diagonal elements should be small
Reproduced:
correlation matrix after rotation
most residuals should have an absolute value < 0.05
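Two of these checks, the determinant and Bartlett's test, need only base R. The sketch below is an illustration under stated assumptions: the 3 × 3 correlation matrix and the sample size n = 100 are hypothetical, and Bartlett's statistic is computed from its usual chi-square approximation, χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, rather than with a packaged function.

```r
# Hypothetical correlation matrix for 3 variables
R <- matrix(c(1.0, 0.5, 0.4,
              0.5, 1.0, 0.3,
              0.4, 0.3, 1.0), nrow = 3, byrow = TRUE)
n <- 100          # hypothetical sample size
p <- ncol(R)      # number of variables

# Determinant: values near 0 signal multicollinearity
detR <- det(R)
print(detR)
print(detR > 0.00001)

# Bartlett's test of sphericity (H0: R is an identity matrix)
chiSq  <- -(n - 1 - (2 * p + 5) / 6) * log(detR)
df     <- p * (p - 1) / 2
pValue <- pchisq(chiSq, df, lower.tail = FALSE)
print(c(chiSq = chiSq, df = df, p = pValue))
```

A significant result indicates that the correlations are, overall, different from zero, which is what a factor analysis needs.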


Finding Factors: Communality

Common variance:
Variance that a variable shares with other variables
Unique variance:
Variance that is unique to a particular variable
The proportion of common variance in a variable is called the
communality
communality = 1, all variance shared
communality = 0, no variance shared
0 < communality < 1 = some variance shared

Graphical Representation


Graphical Representation

Graphical Representation


Graphical Representation

Graphical Representation


Finding Factors

We find factors by calculating the amount of common variance


Circularity
Principal components analysis:
Assume all variance is shared
All communalities = 1
Factor analysis
Estimate communality
Use squared multiple correlation (SMC)

Initial Preparation and Analysis

We want to include all of the variables in our data set in our factor
analysis.
We can calculate the correlation matrix:
    raqMatrix <- cor(raqData)
    round(raqMatrix, 2)


The R-matrix (or correlation matrix)

Factors Extraction

Kaiser’s criterion
Kaiser (1960): retain factors with eigenvalues > 1
Scree plot
Cattell (1966): use “point of inflexion” of the scree plot
Which rule?
Use Kaiser’s criterion when either:
there are fewer than 30 variables and communalities after extraction
are > .7
the sample size is > 250 and the mean communality is ≥ .6
The scree plot is good if the sample size is > 200.
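Kaiser's criterion can be applied directly to the eigenvalues of a correlation matrix. The sketch below uses a hypothetical 3 × 3 correlation matrix and base R's eigen(); it is an illustration of the rule, not the output of the principal() function used in the following slides.

```r
# Hypothetical correlation matrix for 3 variables
R <- matrix(c(1.0, 0.5, 0.4,
              0.5, 1.0, 0.3,
              0.4, 0.3, 1.0), nrow = 3, byrow = TRUE)

# Eigenvalues of the R-matrix, largest first
eigenvalues <- eigen(R)$values
print(round(eigenvalues, 3))

# Kaiser's criterion: retain components with eigenvalue > 1
nRetained <- sum(eigenvalues > 1)
print(nRetained)

# Proportion of total variance carried by each component
propVariance <- eigenvalues / sum(eigenvalues)
print(round(propVariance, 3))
```

The eigenvalues of a correlation matrix sum to the number of variables, so an eigenvalue above 1 means the component accounts for more variance than a single standardised variable. Plotting the eigenvalues against their index gives the scree plot.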


Factor extraction using R

By extracting as many factors as there are variables we can inspect
their eigenvalues and make decisions about which factors to extract.
To create this model we execute one of these commands:
    # from the raw data
    pc1 <- principal(raqData, nfactors = 23, rotate = "none")
    pc1

    # or from the correlation matrix
    pc1 <- principal(raqMatrix, nfactors = 23, rotate = "none")
    pc1

Principal Components Model


Principal Components Model

Examples of scree plots for data that probably have two underlying
factors

The Scree Plot for the RAQ Data

Scree plot from principal components analysis of RAQ data


The second plot shows the point of inflexion at the fourth component.

Principal Components Model

Now that we know how many components we want to extract, we can
rerun the analysis, specifying that number:
    pc2 <- principal(raqData, nfactors = 4, rotate = "none")
    pc2 <- principal(raqMatrix, nfactors = 4, rotate = "none")

Principal Components Model: Output


Residuals

Check the residuals and make sure that fewer than 50% have absolute
values greater than 0.05, and that the model fit is greater than 0.90.
Execute the function below:
    residual.stats <- function(matrix) {
      # keep each residual once: the upper triangle of the residual matrix
      residuals <- as.matrix(matrix[upper.tri(matrix)])
      large.resid <- abs(residuals) > 0.05
      numberLargeResids <- sum(large.resid)
      propLargeResid <- numberLargeResids / nrow(residuals)
      rmsr <- sqrt(mean(residuals^2))

      cat("Root mean squared residual = ", rmsr, "\n")
      cat("Number of absolute residuals > 0.05 = ", numberLargeResids, "\n")
      cat("Proportion of absolute residuals > 0.05 = ", propLargeResid, "\n")
      hist(residuals)
    }

Residuals

Having executed the function, we could use it on our residual matrix:
    resids <- factor.residuals(raqMatrix, pc2$loadings)
    residual.stats(resids)

Or:
    residual.stats(factor.residuals(raqMatrix, pc2$loadings))


Residuals

Rotation

To aid interpretation it is possible to maximize the loading of a
variable on one factor while minimizing its loading on all other factors
This is known as factor rotation.
There are two types:
orthogonal (factors are uncorrelated)
oblique (factors intercorrelate)
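What an orthogonal rotation does to a loading matrix can be seen with base R alone. This sketch is an illustration under stated assumptions: the 4 × 4 correlation matrix is hypothetical, the unrotated loadings are built from the first two eigenvectors scaled by the square roots of their eigenvalues, and stats::varimax() is used in place of the principal() function from the slides.

```r
# Hypothetical correlation matrix: variables 1-2 and 3-4 form two clusters
R <- matrix(c(1.0, 0.6, 0.1, 0.1,
              0.6, 1.0, 0.1, 0.1,
              0.1, 0.1, 1.0, 0.6,
              0.1, 0.1, 0.6, 1.0), nrow = 4, byrow = TRUE)

# Unrotated loadings of the first two principal components
e <- eigen(R)
unrotated <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))

# Orthogonal (varimax) rotation from base R's stats package
rotated <- varimax(unrotated)
rotatedLoadings <- unclass(rotated$loadings)

print(round(unrotated, 2))
print(round(rotatedLoadings, 2))

# Communalities (row sums of squared loadings) before and after rotation
commBefore <- rowSums(unrotated^2)
commAfter  <- rowSums(rotatedLoadings^2)
print(round(cbind(commBefore, commAfter), 3))
```

Before rotation every variable loads on both components; after rotation each variable loads mainly on one. The communalities are unchanged, because an orthogonal rotation only redistributes variance between the factors.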


Rotation

Orthogonal Rotation (varimax)

To carry out a varimax rotation, we change the rotate option in the
principal() function from “none” to “varimax” (we could also exclude
it altogether because varimax is the default if the option is not
specified):
    pc3 <- principal(raqData, nfactors = 4, rotate = "varimax")
    pc3 <- principal(raqMatrix, nfactors = 4, rotate = "varimax")


Orthogonal Rotation (varimax)

Interpreting the factor loading matrix is a little complex; we can make
it easier by using the print.psych() function.
Generally you should be very careful with the cut-off value – if you
think that a loading of .4 will be interesting, you should use a lower
cut-off (say, .3), because you don’t want to miss a loading that was
.39:
    print.psych(pc3, cut = 0.3, sort = TRUE)

Orthogonal Rotation (varimax)


Oblique Rotation (oblimin)

The command for an oblique rotation is very similar to that for an
orthogonal rotation – we just change the rotate option from
“varimax” to “oblimin”.
    pc4 <- principal(raqData, nfactors = 4, rotate = "oblimin")
    pc4 <- principal(raqMatrix, nfactors = 4, rotate = "oblimin")

As with the previous model, we can look at the factor loadings from
this model in a nice easy-to-digest format by executing:
    print.psych(pc4, cut = 0.3, sort = TRUE)

Oblique Rotation (oblimin)


Important!

We assume that algebraic factors represent psychological constructs.


The nature of these psychological dimensions is “guessed at” by
looking at the loadings for a factor.
This assumption is controvertible.
Many argue that factors are statistical truths only – and psychological
fictions

Reliability

Test–retest method
What about practice effects/mood states?
Alternate form method
Expensive and impractical
Split-half method
Splits the questionnaire into two random halves, calculates scores and
correlates them
Cronbach’s alpha
Splits the questionnaire into all possible halves, calculates the scores,
correlates them and averages the correlation for all splits (well, sort of
...)
Ranges from 0 (no reliability) to 1 (complete reliability)


Cronbach’s Alpha

 
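\[
\text{variance–covariance matrix} =
\begin{pmatrix}
\mathrm{var}_1 & \mathrm{cov}_{12} & \mathrm{cov}_{13} \\
\mathrm{cov}_{12} & \mathrm{var}_2 & \mathrm{cov}_{23} \\
\mathrm{cov}_{13} & \mathrm{cov}_{23} & \mathrm{var}_3
\end{pmatrix}
\]

\[
\alpha = \frac{N^2 \, \overline{\mathrm{covariance}}}
{\sum_{\mathrm{item}=1}^{n} s^2_{\mathrm{item}} + \sum_{\mathrm{item}=1}^{n} \mathrm{cov}_{\mathrm{item}}}
\]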
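
As a rough check of the formula, alpha can be computed directly from a variance–covariance matrix in base R. The sketch below uses simulated data and assumes that N is the number of items, that the numerator uses the average covariance between distinct items, and that the denominator (item variances plus item covariances) is the sum of all entries of the matrix. The object names are hypothetical.

```r
# Cronbach's alpha from the variance-covariance matrix (simulated data).
set.seed(42)
n_resp <- 200
latent <- rnorm(n_resp)
items  <- cbind(Q1 = latent + rnorm(n_resp),
                Q2 = latent + rnorm(n_resp),
                Q3 = latent + rnorm(n_resp))

C <- cov(items)   # variance-covariance matrix of the items
N <- ncol(items)  # number of items

# Average covariance between distinct items (off-diagonal entries).
mean_cov <- mean(C[upper.tri(C)])

# Numerator: N^2 times the average covariance.
# Denominator: sum of item variances plus sum of item covariances,
# read here as the sum of every entry of C (an assumption).
alpha_from_cov <- N^2 * mean_cov / sum(C)

# Cross-check against the equivalent total-score form of alpha.
total       <- rowSums(items)
alpha_check <- (N / (N - 1)) * (1 - sum(diag(C)) / var(total))

all.equal(alpha_from_cov, alpha_check)
```

The two expressions agree because the variance of the total score equals the sum of all entries of the variance–covariance matrix.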

Interpreting Cronbach’s Alpha

Kline (1999)
Reliable if α > .7
Depends on the number of items
More questions = bigger α
Treat subscales separately
Remember to reverse-score reverse-phrased items!
If not, α is reduced and can even be negative
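
The effect of forgetting to reverse-score an item can be illustrated with a small simulation. This is a sketch under assumptions: the data are simulated 1–5 responses, `cronbach()` is a hypothetical helper written for this example (not the `alpha()` function from the psych package), and reverse scoring is done as (maximum + minimum) minus the raw score.

```r
# Effect of a reverse-phrased item on Cronbach's alpha (simulated data).
set.seed(7)
n_resp <- 400
latent <- rnorm(n_resp)

# Five items answered on a 1-5 scale, all driven by the same latent trait.
to_likert <- function(x) pmin(5, pmax(1, round(3 + x)))
items <- sapply(1:5, function(j) to_likert(latent + rnorm(n_resp, sd = 0.8)))
colnames(items) <- paste0("Q", 1:5)

# Pretend Q2 is reverse-phrased: high raw scores indicate a low trait level.
items[, "Q2"] <- 6 - items[, "Q2"]

# Hypothetical helper: alpha from the variance-covariance matrix.
cronbach <- function(x) {
  C <- cov(x)
  N <- ncol(x)
  (N / (N - 1)) * (1 - sum(diag(C)) / sum(C))
}

alpha_raw <- cronbach(items)        # Q2 left as it was recorded

fixed <- items
fixed[, "Q2"] <- 6 - fixed[, "Q2"]  # reverse-score: (max + min) - score
alpha_fixed <- cronbach(fixed)

c(raw = alpha_raw, reverse_scored = alpha_fixed)
```

With the reverse-phrased item left uncorrected, its covariances with the other items are negative, which lowers the total covariance and therefore alpha.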


Reliability Analysis Using R

Subscale 1 (Fear of computers): items 6, 7, 10, 13, 14, 15, 18
Subscale 2 (Fear of statistics): items 1, 3, 4, 5, 12, 16, 20, 21
Subscale 3 (Fear of mathematics): items 8, 11, 17
Subscale 4 (Peer evaluation): items 2, 9, 19, 22, 23
First, we’ll create four new data sets, one for each subscale, containing
its items:
computerFear   <- raqData[, c(6, 7, 10, 13, 14, 15, 18)]
statisticsFear <- raqData[, c(1, 3, 4, 5, 12, 16, 20, 21)]
mathFear       <- raqData[, c(8, 11, 17)]
peerEvaluation <- raqData[, c(2, 9, 19, 22, 23)]


To use the alpha() function we simply input the name of the
dataframe for each subscale, and, where necessary, include the keys
option:
alpha(computerFear)
# keys: -1 marks the second column (item 3) as reverse-scored
alpha(statisticsFear, keys = c(1, -1, 1, 1, 1, 1, 1, 1))
alpha(mathFear)
alpha(peerEvaluation)


Reliability Analysis Using R: Output


The End?

Describe factor structure/reliability
  What items should be retained?
  What items did you eliminate and why?
Application
  Where will your questionnaire be used?
  How does it fit in with psychological theory?
