2022 Lecture4 Part1

Logistic regression + comparing
groups
Pre-Master course Business Research Methods

Dr. Kristin Kronenberg
Agenda
• Comparing two means
• Chi-square
• Logistic regression
Why compare two means?
• Often used for experimental data
• Looking at differences
• Different entities
- Participants who received actual medication vs. those who
received a placebo
• Same or related entities
- Students' knowledge before and after this lecture
How to compare two means?
• Different entities
- Independent t-test (independent-measures t-test;
independent-means t-test)
• Same or related entities
- Paired-samples t-test (dependent t-test)
• Comparing differences between the means of two
groups means predicting an outcome based on
membership of two groups
• We can use the linear model with a dichotomous
predictor (also known as dummy variable)
- Yes or no
- Treatment or no treatment
- Lecture or not
- Cloak or no cloak
- 0 or 1
• The t-test tells us whether the difference between
means is different from zero (= something is going on!)
• Best predicted value of the outcome is the group
mean (summary statistic with the least squared error)
• You're asking yourself: do rabbits eat more carrots than
other animals?
Categorical predictors in the linear model
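$Y_i = b_0 + b_1 X_{1i} + \varepsilon_i$
$\text{Carrots}_i = b_0 + b_1 \text{Rabbit}_i + \varepsilon_i$
• Group variable = 0 (no rabbit)
• $b_0$ = mean of baseline (no rabbit) group = intercept
$\text{Carrots}_i = b_0 + b_1 \text{Rabbit}_i$
$\bar{X}_{\text{No Rabbit}} = b_0 + b_1 \times 0$
$b_0 = \bar{X}_{\text{No Rabbit}}$
• Group variable = 1 (rabbit)
• $b_1$ = difference between group means
$\text{Carrots}_i = b_0 + b_1 \text{Rabbit}_i$
$\bar{X}_{\text{Rabbit}} = b_0 + b_1 \times 1$
$\bar{X}_{\text{Rabbit}} = b_0 + b_1$
$\bar{X}_{\text{Rabbit}} = \bar{X}_{\text{No Rabbit}} + b_1$
$b_1 = \bar{X}_{\text{Rabbit}} - \bar{X}_{\text{No Rabbit}}$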
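The result that b0 is the baseline mean and b1 is the difference between group means can be checked numerically. The following is a minimal sketch in Python. The carrot counts are invented for illustration (they are not data from the lecture), and all variable names are hypothetical. It fits the linear model by ordinary least squares and prints the estimates next to the group means.

```python
from statistics import mean

# Hypothetical carrot counts, invented purely for illustration
no_rabbit = [2, 3, 4, 3]   # group coded 0
rabbit = [6, 7, 5, 6]      # group coded 1

x = [0] * len(no_rabbit) + [1] * len(rabbit)   # dummy predictor
y = no_rabbit + rabbit                          # outcome

# Ordinary least squares for a model with one predictor
x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(b0, mean(no_rabbit))                  # intercept vs. baseline mean
print(b1, mean(rabbit) - mean(no_rabbit))   # slope vs. difference in means
```

With a 0/1 predictor, the least-squares line has to pass through both group means, which is why the two numbers on each printed line agree.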
The logic behind the t-test
• If samples come from the same population, we expect large differences
between sample means to occur very infrequently
• Under H0, we expect means from two random samples to be very similar
• We compare the difference between the sample means that we collected to
the difference between the sample means that we would expect to obtain
(in the long run) if there was no effect
• If the difference between the samples we have collected is larger than we
would expect (based on the standard error), then one of two things has
happened
- There is no effect, but sample means from our population fluctuate a lot and we
happen to have collected two samples that produce very different means
- The two samples come from different populations, which is why they have
different means. The difference then reflects a genuine difference between the
populations, and H0 is unlikely
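The claim that large differences between sample means are infrequent when both samples come from the same population can be illustrated with a small simulation. This is only a sketch under assumed values: the population mean and standard deviation, the sample size of 12, and the threshold for a "large" difference are all hypothetical choices, not figures from the lecture.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 4.0, 1.8   # hypothetical population values
N_PER_SAMPLE = 12
N_EXPERIMENTS = 10_000
THRESHOLD = 1.25              # size of difference treated as "large"

def sample_mean():
    """Mean of one random sample drawn from the single population."""
    return mean([random.gauss(POP_MEAN, POP_SD) for _ in range(N_PER_SAMPLE)])

# Repeat the "experiment" many times with no true effect (H0 is true)
diffs = [sample_mean() - sample_mean() for _ in range(N_EXPERIMENTS)]

share_large = sum(abs(d) >= THRESHOLD for d in diffs) / N_EXPERIMENTS
print(f"Average difference: {mean(diffs):.3f}")
print(f"Share of |difference| >= {THRESHOLD}: {share_large:.3f}")
```

The differences cluster around zero, and only a small minority of the simulated experiments produce a difference as large as the threshold.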
• Two samples with two means, which differ by a little or a lot
• Compare difference between sample means we obtained to
expected sample means if there was no effect ( = other animals
eat as many carrots as rabbits)
• Signal-to-noise ratio: (systematic) variance explained by the
model divided by (unsystematic) variance the model cannot
explain
• How large is the observed difference between the sample
means (relative to the standard error)?
• The larger it is (relative to the standard error), the more likely it
is that the two means differ due to different conditions
Signal-to-noise ratio = Model (signal) / Error (noise)
Independent t-test: Example
• Are invisible people mischievous?
• Experiment
- Participants placed in enclosed community full of hidden
cameras
- 12 participants with invisibility cloak
- 12 participants without invisibility cloak
• How many mischievous acts did participants perform
in a week?
What does a suitable dataset look like?
The independent t-test in SPSS
Mean difference = 3.75 - 5.00 = -1.25

Standard error of the sampling distribution of differences = 0.730
t-statistic = -1.25 / 0.730 = -1.713
The probability of obtaining a t-value at least this extreme if H0 were true is 0.101 (10.1%)
We do not reject H0: there is not enough evidence to conclude that the cloak
affects the amount of mischief
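The arithmetic behind the independent t-test can be retraced by hand. The sketch below uses only the Python standard library. The raw scores are an assumption: the data file is not shown here, so the values were chosen to reproduce the group means (3.75 and 5.00) and the standard error of the difference (0.730). The unpooled (Welch) standard error and degrees of freedom are also an assumption about which SPSS row is being read.

```python
from math import sqrt
from statistics import mean, variance

# Illustrative scores (an assumption), chosen to match the summary values
no_cloak = [3, 1, 5, 4, 6, 4, 6, 2, 0, 5, 4, 5]
cloak = [4, 3, 6, 6, 8, 5, 5, 4, 2, 5, 7, 5]

n1, n2 = len(no_cloak), len(cloak)
mean_diff = mean(no_cloak) - mean(cloak)      # signal

# Standard error of the difference, variances not pooled (Welch)
v1 = variance(no_cloak) / n1
v2 = variance(cloak) / n2
se_diff = sqrt(v1 + v2)                       # noise

t_stat = mean_diff / se_diff                  # signal-to-noise ratio

# Welch-Satterthwaite degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(round(mean_diff, 2), round(se_diff, 3), round(t_stat, 3), round(df, 2))
```

The standard library has no t-distribution, so the p-value is not computed here. If SciPy is available, `scipy.stats.ttest_ind(no_cloak, cloak, equal_var=False)` should return both the statistic and the two-tailed p-value.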
Paired-samples t-test: Example
• Are invisible people mischievous?
• Experiment with 12 participants
- Participants placed in enclosed community full of hidden
cameras
- No cloak in week 1
- Invisibility cloak in week 2
• How many mischievous acts did participants perform
in weeks 1 and 2?
The paired-samples t-test in SPSS
Mean difference = 3.75 - 5.00 = -1.25

Standard error of difference scores = 0.329
t-statistic = -1.25 / 0.329 = -3.804
The probability of obtaining a t-value at least this extreme if H0 were true is 0.003 (0.3%)
We reject H0 and conclude that the cloak does affect the amount of mischief
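For the paired design, the test works on each participant's difference score rather than on the two groups separately. The sketch below retraces that calculation with the Python standard library. The scores and their pairing by position are assumptions, chosen to reproduce the mean difference (-1.25) and the standard error (0.329). They are not taken from the lecture's data file.

```python
from math import sqrt
from statistics import mean, stdev

# Illustrative scores (an assumption); position i is the same participant
week1_no_cloak = [3, 1, 5, 4, 6, 4, 6, 2, 0, 5, 4, 5]
week2_cloak = [4, 3, 6, 6, 8, 5, 5, 4, 2, 5, 7, 5]

# One difference score per participant
diffs = [a - b for a, b in zip(week1_no_cloak, week2_cloak)]
n = len(diffs)

mean_diff = mean(diffs)
se_diff = stdev(diffs) / sqrt(n)   # standard error of the difference scores
t_stat = mean_diff / se_diff
df = n - 1

print(round(mean_diff, 2), round(se_diff, 3), round(t_stat, 3), df)
```

The mean difference is the same as in the independent design, but the standard error is much smaller because stable differences between participants cancel out in the difference scores. That is why the same -1.25 gives a larger t-statistic here.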
Assumptions
• The t-test is a special case of the linear model, so the previously
discussed assumptions apply
• Both t-tests are parametric tests based on the normal distribution.
Therefore, they assume that
- Data are measured at least at the interval level
- The sampling distribution is normally distributed. In the dependent t-test
this means that the sampling distribution of the differences between
scores should be normal, not the scores themselves
• The independent t-test, as it is used to test different groups of
entities, also assumes that
- Variances in these populations are equal (homogeneity of variance)
- Scores in different treatment conditions are independent (since they
come from different entities)
Reporting
• Independent samples
- On average, participants given a cloak of invisibility engaged
in more acts of mischief (M = 5, SE = 0.48) than those without
a cloak (M = 3.75, SE = 0.55). This difference (-1.25) was not
significant, t(21.54) = -1.71, p = 0.101
• Dependent samples
- On average, participants given a cloak of invisibility engaged
in more acts of mischief (M = 5, SE = 0.48) than those without
a cloak (M = 3.75, SE = 0.55). This difference (-1.25) was
significant, t(11) = -3.80, p = 0.003
