o Median: midpoint
Order them from lowest to highest and find the midpoint
o Mode: most frequent
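These two measures can be sketched quickly with Python's standard statistics module (the scores here are hypothetical):

```python
import statistics

scores = [7, 3, 9, 3, 5]  # hypothetical scores

# Median: order from lowest to highest and find the midpoint
# sorted: [3, 3, 5, 7, 9] -> the midpoint is 5
median = statistics.median(scores)

# Mode: the most frequently occurring score (3 appears twice)
mode = statistics.mode(scores)
```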
Variance:
- (Root word is ‘vary’)
- Reflects how scores differ or vary from the mean
- Also called “spread” or “dispersion”
- We may know the average or mean score of a series of values. But what of the spread or
dispersion, or variability of those scores?
Measures of Variance:
1. Range
a. Formula for Range:
Range = highest score – the lowest score
2. Variance
a. Is an indication of how far individual scores are from the mean ON AVERAGE.
b. Or, another way of describing variance is the average distance a set of
scores is from the mean
S² = ∑(X – X̄)² / (N – 1)
S² = variance
∑ (sigma) = sum of
X = individual score
X̄ (X bar) = mean
N = sample size
3. Standard Deviation (SD)
a. We square the differences to get rid of the negative values that arise from
subtraction
b. We then take the square root to return the values to the original
units we started with
SD = √[ ∑(Ri – R avg)² / (N – 1) ]
∑ (sigma) = sum of
N = sample size
SD = standard deviation
Ri = the score (return) observed in one period
R avg = the mean (average) of the observed scores
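The three measures of variance above can be sketched in Python (the scores are hypothetical; statistics.variance and statistics.stdev use the sample formulas, dividing by N – 1):

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5

# Range = highest score - lowest score
score_range = max(scores) - min(scores)  # 9 - 2 = 7

# Variance: how far individual scores are from the mean, on average
# (summed squared deviations from the mean, divided by N - 1)
variance = statistics.variance(scores)

# Standard deviation: the square root of the variance, which returns
# the value to the original units of the scores
sd = statistics.stdev(scores)
```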
Visualizing Data:
1. Frequency distribution
Is a method of tallying and representing the number of times a certain score occurs
Usually groups scores into class intervals/ranges
2. Class intervals
This is a range of numbers.
E.g., age ranges of 10-19; 20-29; 30-39, etc.
So the ages of 14, 18, 19 would go into the first interval range of 10-19 years. The
total number (frequency) of ages in that interval would be 3.
This is the first step in the creation of frequency distribution.
3. Histograms
You can create a histogram for income, or for any other variable
Its shape often approximates a curve (e.g., the bell curve)
4. Distribution patterns
o Distribution can be different in three ways
o Variability
o Skewness
Frequency distribution:
Step 1: Determine the range
Step 2: Decide on the number of class intervals (usually 10-20)
Step 3: Decide on the size of class intervals (related to the # of class intervals)
Step 4: Decide on the starting point for the first class
Step 5: Create the class intervals
Step 6: Put the data into the class intervals
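The six steps can be sketched as a short Python tally (the ages and the interval width of 10 are hypothetical, matching the 10-19 / 20-29 example above):

```python
# hypothetical ages
ages = [14, 18, 19, 23, 27, 31, 35, 38, 12]

# Steps 5-6: create class intervals of width 10 starting at 10,
# then tally how many scores fall into each interval
freq = {}
for age in ages:
    lower = (age // 10) * 10           # e.g. 14 -> 10
    interval = f"{lower}-{lower + 9}"  # e.g. "10-19"
    freq[interval] = freq.get(interval, 0) + 1

# 14, 18, 19, 12 fall in "10-19", so its frequency is 4
```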
Reliability:
Concepts and Measures
- Social science research is often concerned with important concepts, such as “freedom”
and “equality”; but these concepts can be vague before we define them clearly.
- There is a need to “conceptualize” and “operationalize” a concept to measure it and use
the measured data (variable) in empirical studies
- Concept –> Conceptual Definition –> Operational Definition –> Variable
1. Clarify the empirical properties and form an unambiguous definition
2. Devise a strategy for measurements (this has implications for the reliability and validity
of the measure itself…)
3. Record the measurement
4. Makes sense of and infers from the measures (statistics)
Measurement in the context of statistics
- Confidence in findings (we desire this!)
- The tools we use to measure
o Surveys
o Interviews
o Psychometric tests
o Physiological assessments
o And even the people who acquire the data
- It goes: Research Question(s) -> Methods and Measures -> statistics -> Results ->
Confidence in the resultant findings and conclusions
- NOTE: measures are only part of the methods
1. The measures we use must:
a. Retain some consistency in measuring what they are supposed to measure
(Reliability)
b. Actually measure what they are supposed to measure (Validity)
Reliability – some theory
- Reliability of a test refers to the stability, precision, consistency, or repeatability of
measurement
- Definition:
o Consistency of the measure
- Observed score – an obtained score that comprises a person’s true score and an error
score.
- True score – the part of the observed score that represents the individual’s real score
and does not contain measurement error.
- Error score – the part of the observed score that is attributed to measurement error.
Measurement error
- Systematic errors – predictable errors of measurement
o Occur in one direction – under or over-estimating
o Can be corrected by re-calibrating the instrument or adding/subtracting a
constant
- Random Errors – due to chance and unpredictably affect a subject’s score from trial to
trial
1. The participant – many factors: mood, motivation, fatigue, health, fluctuations in
memory and performance, previous practice, specific knowledge, familiarity with
test items
2. The testing – lack of clarity or completeness in directions, how rigidly the directions
are followed, motivation applied, supplementary directions, etc.
3. The scoring – competence, experience, the dedication of the scorers and nature of
scoring itself, the familiarity that the scorer has with the behavior being tested
4. Instrumentation – inaccuracy, lack of calibration
Estimates of reliability measures
- We can quantify reliability by using a type of coefficient that measures the degree of
association or strength of the relationship between two sets of data
Types of reliability
1. Test-retest reliability
2. Interrater reliability
3. Parallel (alternate) form
4. Internal consistency
Validity:
- Validity is a trait (characteristic or property) of the test or instrument
- Types of instruments
o Lab. instruments (HR monitors, VO2 max)
o Survey (questionnaires)
o Psychological instruments (motivations) – scales
o The research itself (qualitative research)
- Similar to reliability, r values of 0.70 or higher are considered the threshold for
acceptable validity
- Definition:
o The measure is measuring what it is intended to
Common types of validity
1. Content validity
a. When you want to know whether a sample of items truly reflects an entire
universe of items on a certain topic.
b. Degree to which the content or subject matter of a test represents the
content or subject matter of the course to which it applies.
c. How to establish it:
i. Content expert
ii. Do items represent all possible items?
iii. How well does the number of items reflect what was taught?
2. Criterion validity
a. Criterion: measures of criterion-based validity are established when results
of one test are compared to results obtained with an accepted standard or
CRITERION
b. Predictive criterion validity
i. How consistent a test outcome is with a criterion that occurs in the
future
ii. The use of a measurement instrument for prediction or estimation of
some future event using the present test.
iii. In this situation, the prediction is the criterion, and the present test is
the one to be validated
c. Concurrent criterion validity
i. How well a test outcome is consistent with an established or other
criteria (present, not future).
ii. Two measures of the same variable are obtained by two different
instruments in a close period
iii. One measure is a criterion measure, and the other might be a new
test or instrument to be validated
3. Construct validity
a. Construct validation is used when the variable of interest has no definitive
criterion, is too difficult to measure, or cannot be directly observed;
constructed to measure an abstract trait.
b. Constructs are often multidimensional (e.g., health, personality, intelligence,
happiness, depression, sportsmanship).
Summary:
- Measures that are valid increase confidence in statistical conclusions
- Validity means that the instrument measures what it says it measures (range: 0.00-1.00)
- Three aspects of validity that concern the researcher are:
o Content
o Criterion (concurrent and predictive)
o Construct (hardest one to ascertain)
- Measures should be valid and reliable – this is the definition of ‘accuracy’
½ = 50% so there is a 50% chance that the coin will land heads up
- Discrete random variable: a countable set of distinct possible values (e.g., dice, city of
birth, etc.).
- Continuous random variable: any value (to any number of decimal places) within some
interval is a possible value (e.g., height, weight, age, etc.).
- Sampling continuous variables results in a ‘spread of scores’ that resembles a curve,
instead of clear, discrete scores.
- In other words, if a score on the normal curve lies 1.5 SD above the mean, the z-score
would be 1.5.
- If a score on the normal curve lies -2.0 SDs below the mean, the z-score would be -2.0.
- Note that z-scores can be positive or negative depending on whether or not they are
above the mean (pos +) or below the mean (neg -).
EXAMPLE:
Calculate the z-score if you had a raw score of 100, a mean of 120, and an SD of 10.
ANSWER
z = (100 - 120) / 10
z = -20 / 10
z = -2
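The calculation above generalizes to a one-line function; a quick sketch:

```python
def z_score(raw, mean, sd):
    """z = (raw score - mean) / SD: distance from the mean in SD units."""
    return (raw - mean) / sd

# The worked example above: raw = 100, mean = 120, SD = 10
result = z_score(100, 120, 10)  # -2.0
```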
Statistical significance:
- Researchers assess more than one variable
- Researchers assess either difference between variables or associations between
variables.
- Researchers are interested in whether or not these differences or relationships are
STATISTICALLY SIGNIFICANT.
o What then, does statistically significant mean?
- Statistical significance: Any difference between groups that is not due to chance but
rather due to a systematic influence.
o In other words, we consider a difference (or association) between two variables
to be statistically significant if the difference is due to some systematic influence
(e.g., intervention).
o The degree of risk you are willing to take that you will reject that things
happened by chance - when they did.
- Researchers set parameters to reduce the chance of making a mistake BEFORE
conducting the research (a priori).
*This standard, or level of risk, is called the level of significance (α, or
alpha).
o A level of significance can be viewed as the probability of rejecting that chance
was the cause (re: null hypothesis) of the result when it was.
- Conventional levels of significance are set between .01 and .05.
o If we set the level of significance at .05, this means:
There is a 5% probability that the difference/association in measured
values between the two populations studied is due to chance and not
due to intervention or treatment.
- Journal articles express it this way: “the findings were statistically significant at the 0.05
or 5% level.”
o Statistical significance is defined by a probability or p-value.
This probability is denoted as p, expressed as p <0.05, p<0.01, or p<0.001
in journal articles.
- Where does this calculated p-value come from?
o SPSS and other statistical computer packages complete this calculation for the
researcher.
o E.g., A t-test, which measures differences between two means may present a
calculated p-value of 0.009.
If we had set our level of significance at 0.05, we see that 0.009 < 0.05.
What this means is that the probability that the difference between these
two means happened by chance is 0.9%, much less than our threshold of
5%.
Statistical significance and the null hypothesis:
- When the calculated p-value (e.g., p=0.024) is less than our set threshold for the level of
significance (e.g., 0.05), we say that there is a statistically significant difference
between the two means.
o If there is a statistically significant difference, we are also saying that we reject
the Null Hypothesis.
In so doing, we are accepting some other explanation other than chance
that explains the differences we see.
- Conversely when the calculated p-value (e.g., p=0.32) is greater than our set threshold
for the level of significance (e.g., 0.05), we say that there is no statistically significant
difference between the two means.
- If there is no statistically significant difference, we are also saying that it is likely that the
differences observed happened by chance, and that the Null Hypothesis prevails.
o In so doing, we are accepting that chance explains the differences we see.
1. Our calculated probability level is <0.05
a. REJECT the null hypothesis – believe that there are important differences
between the two populations. “The findings are statistically significant.”
2. Our calculated probability level is >0.05
a. FAIL TO REJECT the null hypothesis – still don’t know whether there are
important differences between the two populations or not. Our probability level
is >0.05. “There is no statistical significance with the findings.”
In A, we see that the mean of the second patient group (with disease) falls on the left
side of the 5% level of significance we have established. What this means is that there is a
greater than 5% chance that the mean of the disease patients happened by chance and not due
to the disease. This is not an acceptable probability for us and so we would fail to reject the null
hypothesis and state that there is no statistical difference between the means of the patients
with and without the disease.
In B however, what we see is the mean from the patients without the disease group to the right
of the 5% level of significance we have established previously. This means that there is less than
a 5% chance that the mean of this variable occurred by chance and the difference between the
means is most likely due to the disease process. In this case, we would reject the null
hypothesis and state that there is a significant difference between the means.
We can take steps to decrease the probability of committing one or the other, or both:
1. Robust sampling (including sample size considerations).
2. Methods used in the study (internal validity) – e.g., validity and reliability of the
instruments, personnel.
3. Levels of probability selected a priori (or before the experiment is run).
Summary:
- Statistics work to disprove the null hypothesis; this leaves the research hypothesis as a
reasonable alternative.
- We have to set a level of risk to reject the null hypothesis (conventionally this is 5%, or
0.05).
- We see this expressed in journals as (p<0.05).
- p<0.05 means that there is less than a 5% chance that the results we see happened by
chance; most likely they happened because of something else.
- When we see this, we state that there is a ‘statistically significant’ difference or
association between the variables being investigated.
- A Type 1 error is when the researcher rejects the null hypothesis when it was true.
o In other words, the researcher concluded that there was a difference/association
between the two means, but in reality, there was not.
- A Type 2 error is when the researcher fails to reject (or accept) the null hypothesis when
it was false.
o In other words, the researcher concluded that there was no
difference/association between the two means, but in reality, there was.
(The comparison between the population normal curve and a sample normal curve)
(the normal curve is visible in terms of its proportions, but the sample mean has been replaced
by the population mean, and the standard deviations (SD) have been replaced by the standard
error [SE])
- The mean is referred to as the population mean
Why SE is important:
- We don’t want our sample mean to fall too far from the population mean.
- The Standard Error (SE) lets us know how our sample means are likely to deviate from
the population mean.
o Therefore, the smaller the SE, the closer the sample mean is to the population
mean.
- The formula for the standard error of the mean is:
SE = SD / √n
SD = standard deviation
√n (denominator) = square root of the sample size
- If the SD increases, there will be an increase in the SE.
- If the SD decreases, this will decrease the SE.
Example:
Given a sample of 100 (n=100) and an SD of 20
SE of the mean:
20/√100 = 20/10 = 2
Say we had another sample from the same population but with an n=100 and SD of 40.
SE of the mean:
40/√100 = 40/10 = 4
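Both examples can be reproduced with a small helper, which also shows how a larger sample size pushes the SE back down:

```python
import math

def standard_error(sd, n):
    """SE of the mean = SD / sqrt(n)."""
    return sd / math.sqrt(n)

se_a = standard_error(20, 100)  # 2.0 (first sample above)
se_b = standard_error(40, 100)  # 4.0 (doubling the SD doubles the SE)
se_c = standard_error(20, 400)  # 1.0 (quadrupling n halves the SE)
```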
- If there is more variability in the sample, the SE increases. This makes it harder to draw a
sample that is representative of the population. Given wide variability, we will need a
larger sample size.
- If the sample size is small, this will cause an increase in the SE.
- Conversely if the sample size is very large, then the SE will decrease.
- With a larger sample size, we have less error. If there is less error, we can estimate more
precisely the parameters of the population.
- Amount of variance in population has implications for sample size… (i.e., a small amount
of variance in population, can have a smaller sample and still have small SE)
What does this tell us about the relationship between sample size, sample SD, and the SE?
- If we want our sample to represent the population, researchers have to consider the
sample size but also the SD.
o Generally, if the SD of a sample is large, the researcher may have to compensate
with a larger sample size to reduce the SE.
o This has implications down the road for statistical significance.
How confident can we be that the population mean falls close to our
sample mean?
- An estimate of a population parameter given as a single number is called a point
estimate.
o The sample mean is a point estimate.
- A confidence interval (CI) is a range or interval of values that surround the point
estimate.
o Point estimates (sample means) and CI estimates are types of statistical
estimates that allow researchers to infer the true value of an unknown
population parameter using information from a random sample from that
population.
In other words… a confidence interval includes upper and lower limits,
calculated using the sample mean (point estimate), wherein, after
repeated sampling, 95% or 99% of the time, the population mean would
fall into this interval.
EXAMPLE:
You draw a sample (n=50) from a population. The mean is 40 and the SD = 15.
1. Calculate the 95% CI for your sample.
2. Explain what you have calculated.
95% CI = 40 ± (1.96)(2.12)
= 40 ± 4.15
= 35.85 – 44.15
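The interval above can be reproduced in a few lines. (The notes round the SE to 2.12 before multiplying, giving 35.85-44.15; computing without intermediate rounding gives 35.84-44.16.)

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    """CI = mean ± z * SE, where SE = SD / sqrt(n); z = 1.96 for 95%."""
    se = sd / math.sqrt(n)  # 15 / sqrt(50) ≈ 2.12
    margin = z * se         # ≈ 4.16
    return mean - margin, mean + margin

low, high = confidence_interval(40, 15, 50)  # ≈ (35.84, 44.16)
```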
EXAMPLE EXPLAINED:
- You draw a sample (n=50) from a population. The mean is 40 and the SD = 15. Calculate
the CI for your sample. Explain what you have calculated.
o 95% CI = 35.85 – 44.15
Can also be written as:
95% CI = 35.85 ≤ μ ≤ 44.15
Explanation: We have constructed a confidence interval wherein we are confident that the
true population mean will fall within such an interval 95 out of 100 times under repeated
sampling.
- Remember: μ = population mean
o The tighter the range, the greater confidence we can have that the population
mean is close to the sample mean
95% CI = 35.85 – 44.15
Another way of interpreting this is the following (long version):
• One sample (n=50) we pull from the population will have a mean of 40 with 95%
confidence limits between 35.85 and 44.15 (holding constant sampling error…)
• Another sample (n=50) we draw could have a mean of 42 with 95% CI between 37.85
and 46.15.
• The 95% CI indicates that if we were to draw 100 random samples, each with n=50, we
could construct 100 CIs around the sample means, 95 of which could be expected to
contain the true population mean.
SUMMARY:
• We want our sample to reflect the population as closely as possible.
• The standard error of the mean (SE) lets us know how close our sample mean is to the
population mean: the smaller the SE, the closer our sample mean is to the
population mean.
• Both the sample size and the standard deviation of that sample will influence the SE.
• We can also examine confidence intervals to give us some confidence about our sample.
• Confidence intervals are based on the SE and when calculated, will tell us how likely the
population mean will fall within a certain range (either 95 or 99% of the time).
• Taken together, both the SE and confidence intervals should be considered to give us
‘added confidence’ that a sample mean reflects the population.
RUNNING T-TESTS THEORY
https://www.investopedia.com/terms/t/t-test.asp
1. T-tests
a. EXAMPLES
i. Psychology – some form of cognitive-behavioral therapy on levels of
anxiety (test differences in anxiety levels).
ii. Sociology – household income on food security (test differences in food
security).
iii. Human kinetics – creatine phosphate supplementation on maximum
bench press strength (test differences in strength).
b. There are two groups: (their questions)
i. Are the groups different?
ii. How meaningful are the differences that may occur?
c. Independent T-Test
i. Assesses differences between two independent groups.
d. Dependent T-Test
i. Assesses difference between two dependent groups (for example; a test-
retest situation with one group being tested twice)
Example: a researcher wants to measure anxiety levels and start an anxiety help group
Group one: first-year students
Group two: fourth-year students
Independent t-test:
- Testing group one and group two with the same test
Dependent t-test:
- One group, went to an anxiety group, and then was re-tested.
OVERVIEW:
• We can’t use z-scores to assess differences between two sample means because the z-
distribution (or normal curve) assumes we know the population distribution (mean and
variance).
• In the real world this is rarely known, and since we don’t have the resources to assess it,
we have to rely on samples to provide estimates.
• The researcher needs to estimate the sampling distribution and the associated standard
error (i.e., the standard deviation of the sampling distribution) for a given sample size.
FROM Z TO T LOGIC:
SUMMARY OF EXPLANATION:
- The larger our sample, the more faith we have that its mean represents the population
mean and the smaller the correction required.
- These corrections have been calculated:
o Student's t-distribution (Salkind Table B.2 pp. 357-358)
- After running a t-test, we compare the calculated t-value to the critical values in the
Student’s t-distribution table to determine whether the results are statistically
significant.
o To do so, we will need to determine:
Degrees of freedom = (n1 + n2) – 2
p-value (<.05, <.01, <.001)
RUNNING T-TEST IN SPSS & DIRECTIONAL AND NON-DIRECTIONAL T-TEST:
T-test to assess significant differences:
t = (X̄1 – X̄2) / s(X̄1 – X̄2)
where s(X̄1 – X̄2) is the standard error of the difference between the two means.
• If the difference between means is large and the variability is small, this ratio will be
large.
• A large value of this ratio will be a strong indication that there is a real difference
between the means.
• If the difference between means is small and the variability is large, this ratio will be
small and we must conclude that the difference between means is not significant.
• *Therefore the size of this ratio is our measure of the significance of the difference
between the means.
• This t-ratio is the TCALC
Computing the test statistics (independent samples)
• Numerator is the difference
between the means.
• Denominator is the amount of
variation within and between each
of the two groups (called ‘pooled
variance’).
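A sketch of the independent-samples calculation (the anxiety scores are hypothetical; the denominator is built from the pooled variance described above):

```python
import math

def independent_t(group1, group2):
    """t = (mean1 - mean2) / standard error of the difference,
    with the denominator built from the pooled variance."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((x - m1) ** 2 for x in group1)  # sum of squares, group 1
    ss2 = sum((x - m2) ** 2 for x in group2)  # sum of squares, group 2
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se

# hypothetical anxiety scores for two independent groups
first_years = [6, 7, 5, 8, 7]
fourth_years = [4, 5, 3, 5, 4]
t_calc = independent_t(first_years, fourth_years)  # ≈ 3.79
```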
Computing the test statistics (dependent samples)
t = ∑d / √[ (n∑d² – (∑d)²) / (n – 1) ]
• ∑d = sum of all the differences between the paired scores
• ∑d² = sum of the squared differences between the paired scores
• n = number of pairs of observations
t (58) = -.14, p > .05
- t represents the test statistic used
- 58 is the number of degrees of freedom
- -.14 is the obtained or calculated value (from the formula)
- p > .05 indicates the probability
DIRECTIONAL hypothesis: in the case of a directional hypothesis, the researcher will select a
‘one-tailed’ t-test when using t-tables.
Used when there is a direction to the research hypothesis (i.e., the researcher has a
hunch about which mean score will be larger).
- In a directional hypothesis the researcher hypothesizes that the predicted mean of one
group will be larger than the other. In this case, the sign must be in the predicted
direction (+) for the alternative hypothesis to be accepted.
o In other words, the t-ratio will be positive or negative depending on the direction
of the hypothesis.
o When we compare the calculated value of t to the critical value of t we DO NOT
consider the (+) or (-) sign, but only consider the absolute value of calculated t.
Research hypothesis: Grade 7 boys’ ears (n=10) will be larger than Grade
7 girls’ ears (n=10).
Ho = there will be no difference between the size of Grade 7 boys’
ears and Grade 7 girls’ ears.
In a one-tailed test, the calculated ratio must also be of the
appropriate sign (+) or (-); this is to say that the predicted
difference in the direction must be supported.
In this case, we anticipate that the calculated t-value will be
positive (+) – because we hypothesize that boys’ ears > girl’s ears.
o Critical t value for a level of significance of 0.05 for a one-
tailed test using 18 degrees of freedom:
α = 0.05; t(18) = 1.734 from table
o Calculated t-ratio in our study = 2.319
o 2.319 > 1.734 therefore we reject the Ho
o We can conclude statistically that Grade 7 boys’ ears are
larger than Grade 7 girls’ ears.
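The table lookups in this example can be checked in code (assuming scipy is available); t.ppf returns the critical value for a given cumulative probability and degrees of freedom, matching the 1.734 and 2.101 table values used in these notes:

```python
from scipy.stats import t

# Critical t for alpha = 0.05 with 18 degrees of freedom
one_tailed = t.ppf(1 - 0.05, 18)      # directional hypothesis
two_tailed = t.ppf(1 - 0.05 / 2, 18)  # non-directional hypothesis

# Decision rule from the example: calculated t-ratio = 2.319
reject_ho = abs(2.319) > one_tailed   # compare absolute calculated t
```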
NON-DIRECTIONAL hypothesis: in the case of a non-directional hypothesis, the researcher will
select a ‘two-tailed’ t-test when using t-tables.
- Used when there is no direction with the research hypothesis (i.e., the researcher
doesn’t know which group will have a larger mean).
o Therefore, the sign of t doesn’t matter (can be + or -).
- Research hypothesis: there will be a difference in the size of Grade 7 boys’ ears
compared to Grade 7 girls’ ears (non-directional – we aren’t hypothesizing larger or
smaller, just different).
o Ho = there will be no difference between the size of Grade 7 boys’ ears and
Grade 7 girls’ ears.
- A t-ratio for a level of significance of 0.05 for a two-tailed test using 18 degrees of
freedom:
o α = 0.05; t(18) = 2.101 from table
- Calculated t-ratio in our study = 2.319.
- 2.319 > 2.101 therefore we reject the Ho.
- We can conclude statistically that there is a difference in size between Grade 7 boys’
and Grade 7 girls’ ears, but we don’t know which group is larger (until we look at the
means).
One-way ANOVA:
1. ‘Partition out’ or ‘account for’ the noise (variance) between individual scores and their
group mean.
2. ‘Partition out’ or ‘account for’ the noise (variance) between the group means and what
we could call a ‘grand mean’ (the mean of all the scores from all the participants).
3. ‘Partition out’ or ‘account for’ the noise (variance) between the individual scores and
the ‘grand mean’:
a. Whatever variance is left over, if large enough, will be the signal that says
something else other than random variance caused the differences in the means.
EXAMPLE:
• Let’s say you wanted to develop a drug that elevated the mood of stats students
(because we all know that stats can affect our mood!)
• Mood Enhancer (ME).
• There are 30 students in the class, randomly allocate them into the three different
groups
• Group 1: control (no drugs)
• Group 2: ME1 (low level of ME drug)
• Group 3: ME2 (high level of ME drug)
• You hypothesize that the higher level of ME used, the greater the elevated mood:
• ME2 > ME1 > Control
• Your null hypothesis is that there will be no differences in mood across all three
Groups, and you set your level of significance at 0.05.
• There is a mood scale (1 (lousy)-10 (amazing))
Example detour:
Within-group variances SSWithin
- Textbooks describe this type of variance as naturally inherent or “random variation”
that exists among all individuals (this is why we never expect to get the same mean if we
draw two samples from a population).
- This variation can be caused by random error or perhaps issues related to
measurement.
- But even if our sampling is robust and our measures are valid/reliable, the ‘random’ or
‘inherent’ variation still exists.
The mood of Stat Students (N=3)
F-RATIO:
- If the treatment (ME drug) has no effect, then the added or ‘extra’ variation will be zero
and the ratio should be equal to 1.0. This is called an F-ratio.
- A small F-ratio means that the treatment added little or no extra variance and is
therefore not likely the cause of the differences between the means.
- If however, this ratio is much greater than 1.0, it will be because the treatment
variation, due to treatment alone, is large:
- Just like the t-ratio we saw with t-tests, ANOVA gives an F-ratio that can be described as
the “Mean Square Between Groups” variance divided by the “Mean Square Within
Subjects” variance:
- F-ratio = MS BETWEEN GROUPS / MS WITHIN GROUPS
o *The ‘Mean’ in ‘Mean Square’ means the average of the ‘between groups’ or
‘within groups’ variance
- Sum of squares (SS): variance calculated by subtracting the mean from each score in a
distribution of scores, squaring this difference, then adding all of the differences to
produce a total (or sum) of the squared deviations from the mean (SS).
- Mean square (MS): the mean or average variation. Calculated by dividing the total sum
of squares (SS) by the degrees of freedom.
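The partitioning can be sketched end to end: compute the between- and within-group sums of squares, divide each by its degrees of freedom to get the MS values, then take their ratio. The mood scores for the three groups below are hypothetical:

```python
def one_way_f(groups):
    """F = MS(between groups) / MS(within groups)."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)

    # SS between: squared distance of each group mean from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # SS within: squared distance of each score from its own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)

    ms_between = ss_between / df_between  # Mean Square between groups
    ms_within = ss_within / df_within     # Mean Square within groups
    return ms_between / ms_within

# hypothetical mood scores (1-10) for Control, ME1, ME2
control = [4, 5, 4, 6, 5]
me1 = [6, 7, 6, 5, 7]
me2 = [8, 9, 7, 8, 9]
f_ratio = one_way_f([control, me1, me2])  # well above 1.0
```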
• e.g., comparing gas mileage across different brands of cars and with different
types of tires
• When we are examining the effect of two independent variables, this is called a Two-
Way ANOVA.
• An independent variable is something by association or by influence that can affect
(change) a dependent variable (the variable of interest).
• For example, studying for an exam is an independent variable, as studying can
influence the change in exam scores (i.e., lower levels of studying = lower exam
scores, higher levels of studying = higher exam scores).
• In ANOVA, the independent variables are called ‘FACTORS.’
• In a Two-way ANOVA, the effects of two factors can be investigated simultaneously.
• Two-way ANOVA permits the investigation of the effects of either factor alone (e.g., the
effect of the brand of car on the gas mileage, and the effect of tire type on the gas
mileage)
• BUT Two-way ANOVA also allows the investigation of the two factors together (e.g., the
combined effect of the brand of car AND the type of tire on gas mileage).
• E.g., Is there a difference in gas mileage when we combine both brands of car
AND type of tire used?
• This ability to look at both factors together is the advantage of a Two-Way ANOVA
compared to using two One-Way ANOVA’s (one for each factor) independently.
• The effect on the dependent variable that can be attributed to the levels of either factor
alone is called the MAIN EFFECT.
o Main Effect is what you would detect using two separate one-way ANOVA’s.
• But with Two-Way ANOVA you can detect both main effects separately (car brand and
tire type) as well as a combined effect (car x tire) called an INTERACTION EFFECT.
• This makes Two-Way ANOVA very efficient, but also more complicated to interpret.
Three questions are answered by a Two-way ANOVA with Interactions:
• Is there any effect of Factor A on the outcome?
• (Main Effect of A, or Car Brand on gas mileage)
• Is there any effect of Factor B on the outcome?
• (Main Effect of B, or Tire Type on gas mileage)
• Is there any effect of the interaction of Factor A and Factor B on the outcome?
• (Interaction Effect of Car x Tire TOGETHER on gas mileage)
• This means that we will have three sets of hypotheses, one set for each question.
Effect size and power analysis:
Effect size:
- Effect size is a quantitative measure of the magnitude of the effect.
- The larger the effect size, the stronger the relationship or greater the difference
between variables.
Statistical significance VS. meaningful significance:
- Sometimes one can have statistical significance in the findings that may not be as
meaningful in the real world.
- How large of a difference is important in theory or practice?
- Be wary of reports that state “we see a trend towards significance,” despite the findings
showing no significance.
Estimating effect size:
- Effect size (ES) represents the standardized difference between two means.
o ES = (M1 – M2)/ s
- Where M1 – M2 = difference between the mean scores
- Where s = standard deviation (pooled)
o *Pooled = the standard deviations are averaged together
- Like z-scores, the ES allows for comparison between studies using different dependent
variables because it puts data in standard deviation units.
o What do the effect size values mean?
o An effect size of:
0 = no difference
0.2 = small
0.5 = medium
0.8 = large
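A minimal sketch of the ES formula (the means and SDs are hypothetical; the pooled SD here is the simple average of the two SDs, as described above):

```python
def effect_size(m1, m2, sd1, sd2):
    """ES = (M1 - M2) / pooled SD (the SDs averaged together)."""
    pooled_sd = (sd1 + sd2) / 2
    return (m1 - m2) / pooled_sd

es = effect_size(85, 80, 10, 10)  # 0.5 -> a medium effect
```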
Power analysis:
• Statistical power refers to the probability that a hypothesis test will find an effect if there
is an effect to be found.
o The higher the statistical power, the higher the probability of detecting an effect when
there is one.
• A power analysis can be used to estimate the minimum sample size needed, given the
desired significance level, effect size, and statistical power.
t-test example:
Let’s say you want to compare two independent group means and you…
• Set your significance level at 0.05
• Want at least a medium effect size (i.e., d = 0.50)
• Want minimum statistical power of 0.80
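With those three inputs, a common normal-approximation formula, n = 2((z_α/2 + z_β)/d)², estimates the minimum sample size per group. A hedged sketch (exact t-based software such as G*Power gives a slightly larger answer, commonly reported as 64 per group for these inputs):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha, power, d):
    """Approximate sample size per group for a two-sided, two-sample t-test,
    via the normal approximation: n = 2 * ((z_alpha/2 + z_beta) / d) ** 2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical z for two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # z corresponding to desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(alpha=0.05, power=0.80, d=0.50))  # → 63 per group
```

Notice the trade-off the formula makes explicit: halving the effect size you want to detect (d = 0.25) roughly quadruples the required sample.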
• The sample results may be due to chance, and the relationship in the population may not
be strong, or may be zero.
o The null hypothesis for correlations is that r = 0
• When we interpret a correlation statistic, we first look at the p-value to see whether or
not it is statistically significant EVEN IF THE r-value IS LARGE.
• If the correlation is not statistically significant, we state there is no relationship or
association between the variables.
• Example: let’s say we found that intelligence scores explained 50% of income… what
other factor(s) might account for the ‘left over’ 50% of the variation in income?
2. Several factors at once may influence the outcome of a variable of interest.
• Example: intelligence might be related to income, but income is likely related to
other things (age, gender, geographic location, years of experience, etc.)
• This is where regression comes in…
• Regression involves the development of explanatory or predictive models wherein
one (or more) variable(s) can explain changes in another variable of interest.
• There are different forms of regression: multiple, hierarchical, logistic, ordinal, etc.
We’re going to start by learning about simple linear regression and then build to multiple
regression.
USES OF REGRESSION
• Amount of change in a dependent (Y) variable that results from changes in the
independent variable(s) (X).
• Regression attempts to determine or explain the causes of phenomena.
• Prediction and forecasting (medical and health outcomes, etc.)
• Support or negate theoretical models.
• Modify and improve theoretical models and explanations of phenomena.
REGRESSION MODEL:
• Relation between variables where changes in some variables may “explain” or possibly
“cause” changes in other variables.
• Explanatory variables are termed the independent variables and the variables to be
explained are termed the dependent variable:
o Dependent variable (Y) – ‘depends’ on the independent variable (the ‘X’ variable).
o The Independent variable (X) – influences the dependent variable (Y); controlled by
the researcher.
- Regression model estimates the nature of the relationships between the independent
and dependent variables
X → Y
o Change in the dependent variable that results from changes in the independent
variable(s) (size of the relationship)
EXAMPLES:
Example 1:
Simple Linear Regression (one independent variable, one dependent variable)
- Let’s say I want to know what the price of gasoline is going to be when the price of
crude oil goes up…
o Dependent variable: retail price of gasoline in Vancouver
o Independent variable: the price of crude oil
If crude oil prices (independent) go up (a lot or a little), what does that do
to gas prices (dependent variable)?
Example 2:
Multiple Regression (multiple independent variables, one dependent variable)
- Let’s say I want to know the employment income given certain factors (e.g., hours of
work, education, occupation… etc.)
o Dependent variable: employment income
o Independent variables: hours of work, education, occupation, sex, age, region,
years of experience, unionization status, etc.
Bivariate and multivariate models
LINEAR REGRESSION:
o To estimate y
EXAMPLE:
• “It is most important to know whether the value we have for the slope is statistically
significant; if significant, then the value of the slope in the whole population
(remember we are testing a sample here) can be considered different from zero.”
• In other words, we want to know: is the slope that we have (of our model, β1)
different enough from zero to the degree that we are confident that the
explained/predicted relationship between x and y is not due to chance?
When there is a single independent variable (as is the case in simple linear regression), the
standardized beta (β) coefficient is equal to the Pearson correlation coefficient.
Y = βx + a (the same form as y = mx + b from algebra)
Where β = the Pearson correlation coefficient (when the variables are standardized)
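As an illustrative sketch of how the slope and intercept of a least-squares line are estimated from data (the function name `fit_line` is mine, not from the notes):

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = b*x + a for one predictor.
    Slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar  # the fitted line always passes through (x_bar, y_bar)
    return b, a

# Perfectly linear data, y = 2x + 1, recovers slope 2 and intercept 1
b, a = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

If you first convert both variables to z-scores and refit, the slope you get back is the Pearson r, which is the connection the notes describe.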
HYPOTHESIS TESTING
• Error of Estimate: how much each data point differs from the predicted data point
(regression line).
• Standard error of estimate (SEE): the measure of how much each data point (on
average) differs from the predicted data point (the regression line) – in other words, a
standard deviation of the prediction errors.
*The sum of squared differences can never be negative. (Hypothetically it could be
zero, but there is always some sort of error in the real world.)*
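A minimal sketch of the SEE computation, assuming the predicted values have already been obtained from the regression line (n − 2 degrees of freedom, because two parameters — slope and intercept — were estimated):

```python
from math import sqrt

def see(ys, y_hats):
    """Standard error of estimate: the typical distance of observed y values
    from the regression line, expressed in the units of y."""
    n = len(ys)
    # Squared residuals: always >= 0, so the sum can never be negative
    ss_error = sum((y, yh) and (y - yh) ** 2 for y, yh in zip(ys, y_hats))
    return sqrt(ss_error / (n - 2))

# Three points fall exactly on the line, one misses by 1 unit
error = see([3, 5, 7, 10], [3, 5, 7, 9])
```

A smaller SEE means the data points hug the regression line more tightly, which is why it appears alongside R² when judging model fit.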
The Influence of Outliers:
- Outliers pull the regression line towards themselves, changing the slope
- The r value becomes smaller (the data are less linear) – an outlier changes the whole equation
Example: Aerobic Capacity and Running Performance
- How fast (time) a person can run a 10km race is dependent in part on their aerobic
capacity (VO2 max).
o Can we predict 10 km running times using VO2 max values?
o Variables: Time (Y) and VO2 (x).
How long will it take a person to run 10 km if their VO2 max is 60 ml/kg/min?
Goal: to predict the length of time it takes to run 10 km for a given VO2
max.
Y = -31.09 + 92.464
Y = 61.8 min
- Based on the model we can estimate that a person with a VO2 max of 42 ml/kg/min will
be able to run a 10 km race in 61.8 minutes.
SUMMARY:
• Regression models allow for the explanation and prediction of outcome variables
(dependent variables of interest) using either one, or more than one independent
variable.
• Regression models can be qualitatively described as:
DATA = MODEL + ERROR
• The regression line superimposed onto a scatterplot of two variables (simple linear
regression) IS the MODEL (Y= βx +a).
• The standardized slope of this regression line in SIMPLE LINEAR REGRESSION is the Pearson
correlation coefficient – this is where we see connections between Chapter 15
Correlation and Chapter 16 Regression.
• SPSS and other statistical programs will select the ‘best fitting model’ based on the
smallest amount of total variance that exists between individual data points and the
regression line itself.
• This best fitting model is based on OLS (ordinary least squares) mathematics.
• In general, regression models are statistically significant if the total OLS variance with
the fitted model is less than the total variance from the null hypothesis model.
REGRESSION… CONTINUED
MULTIPLE REGRESSION:
- What do we do when we have more than one predictor/independent variable?
o E.g., predicting scores on a memory task from measures of short-term memory,
reading comprehension, and processing speed.
o We use MULTIPLE REGRESSION ANALYSIS
Y = b1X1 + b2X2 + a
o Where:
X1 is the value of the first independent variable
X2 is the value of the second independent variable
b1 and b2 are the regression weights for each variable
o Press the ‘Statistics’ button, select ‘Estimates,’ ‘Confidence Intervals,’ ‘Model fit,’ ‘R-
Squared change,’ and ‘Descriptives,’ then click ‘Continue’
o Click ‘OK’
o A model with a small R2 and a larger SEE is indicative of a model that doesn’t fit the
data as well.
3. Take a look at the ‘Coefficients’ table:
• Provides information about the contribution of each independent variable to the model;
in other words, provides information about how strongly each independent variable
influenced the dependent variable.
• First, take a look at the ‘Sig.’ column:
o For each of the variables, check their significance level – this tells you which
variable(s) make a statistically significant contribution to the model (if <.05)
• Next, take a look at the ‘Beta’ column under ‘Standardized Coefficients:’
o For each of the variables, take a look at the beta value – this tells you how much of a
unique contribution each variable makes to the model.
o Larger values = greater unique contribution to the model.
o +/- indicates the direction of the relationship between the independent variable and
dependent variable (i.e., positive beta value = direct relationship).
B - These are the values for the regression equation for predicting the dependent variable from
the independent variable.
β are Beta-standardized coefficients. Standardizing the variables before running the regression
puts all of the variables on the same scale (kind of like what we did with z-scores comparing
math and verbal scores – we took the math and verbal scores and standardized them to z-
scores to compare).
“Findings show that hours of study and classes attended positively influenced final exam scores.
For every one point increase in hours studied, final exam scores would increase by 2.403 points.
Similarly, for every one point increase in number of classes attended, final exam scores would
improve by 2.546 points. These findings indicate that weekly studying and regular attendance
of classes helps improve final exam scores for students in PSYC 207.”
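The reported findings translate directly into a prediction equation. A sketch using the two reported B weights (2.403 and 2.546); note that the intercept below is hypothetical, since the notes do not report it:

```python
def predict_exam_score(hours_studied, classes_attended, intercept=20.0):
    """Predicted final exam score from the reported unstandardized (B) weights.
    The intercept value is a made-up placeholder -- only the slopes are
    given in the notes."""
    return intercept + 2.403 * hours_studied + 2.546 * classes_attended

# Each extra hour of study adds 2.403 points; each extra class adds 2.546
score = predict_exam_score(hours_studied=10, classes_attended=12)
```

Whatever the true intercept is, the slope interpretation in the quoted paragraph holds: the difference between studying 10 and 11 hours (all else equal) is exactly 2.403 predicted points.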
NON-PARAMETRIC TESTS:
Introduction to non-parametric tests:
- The term “non-parametric” refers to the fact that these tests do not require
assumptions about population parameters
o Nor do they test hypotheses about population parameters
Previous examples of hypothesis tests, such as the t-test and ANOVA, are
parametric tests, and they do include assumptions about parameters and
hypotheses about parameters
- Parametric tests have different assumptions from nonparametric tests:
o Variances of each group are similar
o Sample is large enough to represent the population
- Nonparametric statistics don’t require the same assumptions
1. They are distribution-free
2. Allow data expressed as frequencies to be analyzed
Examples of non-parametric tests
Mann-Whitney U
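The Mann-Whitney U test is the non-parametric counterpart of the independent-samples t-test: it compares two groups using ranks rather than raw scores, so no assumption about the shape of the distribution is needed. A minimal sketch of the U statistic (function name is illustrative):

```python
def mann_whitney_u(a, b):
    """Mann-Whitney U: rank all scores together, sum the ranks of one group,
    and convert the rank sum to U. Ties receive the average of the tied ranks."""
    combined = sorted(a + b)
    # Average rank for each distinct value (handles ties)
    ranks = {}
    for v in set(combined):
        positions = [i + 1 for i, x in enumerate(combined) if x == v]
        ranks[v] = sum(positions) / len(positions)
    r1 = sum(ranks[x] for x in a)            # rank sum of the first group
    n1, n2 = len(a), len(b)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return min(u1, n1 * n2 - u1)             # report the smaller of U1, U2

# Completely separated groups give U = 0; interleaved groups give a larger U
u = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

The smaller U is, the more completely one group's ranks sit below the other's; the computed U is then compared against a critical value (or converted to a p-value) to judge significance.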
Kruskal-Wallis
- Non-parametric ‘version’ of One-Way ANOVA
- Use when you have different levels/groups (looking for differences between more than
2 groups)
- Determine if there are statistically significant differences between two or more groups
of an independent variable on a continuous or ordinal dependent variable.
- It is considered the nonparametric alternative to the one-way ANOVA, and an extension
of the Mann-Whitney U test, to allow the comparison of more than two independent
groups.
- Does exam performance (continuous scale from 0-100) differ based on test anxiety
levels?
o Dependent variable: Exam score
o Independent variables: Anxiety levels (low, medium, high)
- Do salaries (ranked, ordinal) differ based on an independent variable (e.g., job type) ?
o Dependent variable: <$20,000, $20,000-49,999, >$50,000
o Independent variable: salaried, hourly, contract
Chi-Square test
- Sometimes researchers are interested in evaluating whether a number of cases in
specific categories are different based on what would be expected by some basis of
chance or some other form of known information (e.g., census).
- Chi-square provides a statistical test of the significance of the discrepancy between the
observed and expected results.
- The difference between the Chi-square tests and the other hypothesis tests we have
considered (t and ANOVA) is the nature of the data.
o For t-tests and ANOVA, data are continuous (i.e., interval/ratio variables).
o For chi-square tests, data are frequencies (i.e., how frequent does a score
occur?).
- Chi-square allows you to determine if what you observe in a distribution of frequencies
is what you would expect to occur by chance.
- One-sample chi-square is also called the “Goodness of Fit test.”
o Note: how this is in line with what we’ve been learning throughout this course
(e.g., is the result due to chance, or due to a particular intervention?)
- The chi-square test for goodness-of-fit uses frequency data from a sample to test
hypotheses about the shape or proportions of a population.
- Each individual in the sample is classified into one category on the scale of
measurement.
- The data, called observed frequencies, simply count how many individuals from the
sample are in each category.
- The null hypothesis specifies the proportion of the population that should be in each
category.
- The proportions (%) from the null hypothesis are used to compute expected
frequencies that describe how the sample would appear if it were in perfect agreement
with the null hypothesis.
- Null hypothesis:
o H0: P1 = P2 = P3
- Research hypothesis:
o H1: P1 ≠ P2 ≠ P3
- *Where P = proportion (%)
COMPUTING CHI-SQUARE:
Χ2 = Chi-square statistic
∑ = sum of
O = observed frequencies (our actual data)
E = expected frequencies – what should be
there if the null hypothesis holds true…
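The formula Χ² = Σ (O − E)² / E translates directly into code. A minimal sketch:

```python
def chi_square(observed, expected):
    """Chi-square statistic: the sum of (O - E)^2 / E across all categories.
    Large values mean the observed frequencies stray far from what the
    null hypothesis predicted."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 100 coin flips: observed 45 heads / 55 tails vs. an expected 50/50 split
stat = chi_square([45, 55], [50, 50])  # → 1.0
```

Because each term is a squared difference divided by a positive expected count, Χ² can never be negative, and it equals zero only when every observed count matches its expected count exactly.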
CHI-SQUARE EXAMPLE:
Before we work through examples of Chi-square tests, we have to address the question
of calculating the ‘expected’ frequencies.
This is actually part of the null hypothesis:
We will see how the calculation of expected frequencies plays out for one-sample and
two-sample chi-square tests.
One-Sample Chi-Square Example
From Salkind (p. 303)
Research question: Are respondents (n=90) equally distributed in their preference for a
food voucher?
For (n=23)
Maybe (n=17)
Against (n=50)
If they were equally distributed, then by chance we would expect frequencies different
from the ones presented in the data.
For one-sample Chi-square tests, we calculate the expected frequencies by dividing the
total (90 responses) by the number of groups (3). In this case the expected frequency
for each group would be 30.
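The worked example above can be checked in a few lines, with an expected frequency of 30 per group under the null hypothesis of equal preference:

```python
observed = [23, 17, 50]      # For, Maybe, Against
expected = [90 / 3] * 3      # equal split: 90 responses over 3 groups = 30 each

# Chi-square: sum of (O - E)^2 / E across the three categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 1))  # → 20.6, matching the reported Χ2(2) = 20.6
```

With 3 categories there are 3 − 1 = 2 degrees of freedom, which is the "(2)" in the reported result below.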
How to interpret:
X2(2) = 20.6, p < .05