
Psych notes: Data analysis


Statistics in CONTEXT
Statistics and research (tea for two and two for tea…)
- Many fields address human behavior at some level
- BEHAVIOR IS COMPLEX
- We try to:
o Understand behavior
o Predict behavior
o Encourage/facilitate behavior change

1. Research question(s)
2. Methods
3. Statistics
4. Results from research
5. Confidence in the resultant findings/conclusions
6. Finding applied to, or informing, practice
o Theoretical models help to construct research questions
Statistics:
- Do not ‘prove’ things
- Statistics work best when they are used to disprove things
- If we can disprove something, then whatever is left over is probably the true state
of things
Measurements and Statistics (garbage in, garbage out)
Types of statistics
Sampling and statistics
- Bias can occur
Central tendency scores – a single score that represents all the scores
- How can we calculate one score that best represents all the scores combined? What are
the options?
o Mean: average
 X̄ = ΣX / n
 X̄ (X with a bar) is the mean
 Σ (sigma) means "sum of"
 X (with no bar) is the individual score
 n is the sample size

o Median: midpoint
 Order them from lowest to highest and find the midpoint
o Mode: most frequent
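As a quick sketch, the three options can be computed with Python's statistics module (the scores below are hypothetical, purely for illustration):

```python
from statistics import mean, median, mode

# Hypothetical sample of scores (not from the notes)
scores = [4, 7, 7, 8, 9, 10, 11]

print(mean(scores))    # average: sum of all scores divided by n
print(median(scores))  # midpoint after ordering lowest to highest
print(mode(scores))    # most frequent score
```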

SUPER IMPORTANT TOPIC***************


- There are pillars (measures of central tendency, and variability)
o Understanding these pillars gives us a greater insight into how statistics operate

Variance:
- (Root word is varied)
- Reflects how scores differ or vary from the mean
- Also called “spread” or “dispersion”
- We may know the average or mean score of a series of values. But what of the spread or
dispersion, or variability of those scores?
Measures of Variance:
1. Range
a. Formula for Range:
Range = highest score – the lowest score
2. Variance
a. Is an indication of how far individual scores are from the mean ON AVERAGE.
b. Or, another way of describing variance: the average distance a set of scores
is from the mean

S² = Σ(X − X̄)² / (n − 1)
S² = variance
Σ (sigma) = sum of
X = individual score
X̄ = mean
n = sample size
3. Standard Deviation (SD)

a. We square the differences to get rid of the negative values that arise from
subtraction
b. We then take the square root of that value to return to the original units we
started with

SD = √(Σ(X − X̄)² / (n − 1))
Σ (sigma) = sum of
n = sample size
SD = standard deviation
X = individual score
X̄ = mean
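A minimal sketch of the three measures of variance, computed by hand on a small hypothetical sample (Python's statistics.variance and statistics.stdev apply the same n − 1 formula):

```python
# Hypothetical sample of scores (illustrative only)
scores = [2, 4, 4, 6, 8, 12]

rng = max(scores) - min(scores)                  # Range = highest - lowest score
x_bar = sum(scores) / len(scores)                # the mean
# Sample variance: squared deviations from the mean, averaged over n - 1
s2 = sum((x - x_bar) ** 2 for x in scores) / (len(scores) - 1)
sd = s2 ** 0.5                                   # square root restores original units
print(rng, s2, round(sd, 2))
```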

Visualizing Data:

A visual representation is an effective way to examine the characteristics of a dataset. Is there a
way to organize the data so we can make some sense of it? (Data reduction). YES, WE CAN. We
can create a frequency distribution using class intervals.

1. Frequency distribution
 Is a method of tallying and representing the number of times a certain score occurs
 Usually group scores into interval classes/ranges
2. Class intervals
 This is a range of numbers. 
 E.g., age ranges of 10-19; 20-29; 30-39, etc.
 So the ages of 14, 18, 19 would go into the first interval range of 10-19 years. The
total number (frequency) of ages in that interval would be 3.
 This is the first step in the creation of frequency distribution.
3. Histograms
 You can create a histogram for income, or for almost any variable
 The bars often trace out a curve (e.g., the bell curve)
4. Distribution patterns
o Distributions can differ in three ways:
o Variability
o Skewness

 If the “tail” is towards the low end, it’s negatively skewed.


 If the “tail” is towards the high end, it’s positively skewed
 If it is a bell curve, there is no “tail” end, so there is no skewness
o Kurtosis

Frequency distribution:
Step 1: Determine the range
Step 2: Decide on the number of class intervals (usually 10-20)
Step 3: Decide on the size of class intervals (related to the # of class intervals)
Step 4: Decide the starting point for the first class
Step 5: Create the class intervals
Step 6: Put the data into the class intervals
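The six steps can be sketched in Python (the ages and the interval width of 10 are hypothetical, echoing the 10-19, 20-29 example above):

```python
# Hypothetical ages to tally into class intervals
ages = [14, 18, 19, 22, 25, 25, 31, 38, 44, 47]

low, high = min(ages), max(ages)        # Step 1: determine the range
width = 10                              # Steps 2-3: number and size of class intervals
start = (low // width) * width          # Step 4: starting point for the first class
# Steps 5-6: create the intervals and tally each score into its class
freq = {}
for lo in range(start, high + 1, width):
    label = f"{lo}-{lo + width - 1}"
    freq[label] = sum(lo <= a <= lo + width - 1 for a in ages)
print(freq)
```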

Reliability:
Concepts and Measures
- Social science research is often concerned with important concepts, such as “freedom”
and “equality”; but these concepts can be vague before we define them clearly.
- There is a need to “conceptualize” and “operationalize” a concept to measure it and use
the measured data (variable) in empirical studies
- Concept –> Conceptual Definition –> Operational Definition –> Variable
1. Clarify the empirical properties and form an unambiguous definition
2. Devise a strategy for measurements (this has implications for the reliability and validity
of the measure itself…)
3. Record the measurement
4. Make sense of and infer from the measures (statistics)
Measurement in the context of statistics
- Confidence in findings (we desire this!)
- The tools we use to measure
o Surveys
o Interviews
o Psychometric tests
o Physiological assessments
o And even the people who acquire the data
- It goes: Research Question(s) -> Methods and Measures -> statistics -> Results ->
Confidence in the resultant findings and conclusions
- NOTE: measures are only part of the methods
1. The measures we use must:
a. Retain some consistency in measuring what they are supposed to measure
(Reliability)
b. Actually measure what they are supposed to measure (Validity)
Reliability – some theory
- Reliability of a test refers to the stability, precision, consistency, or repeatability of
measurement

- Definition:
o Consistency of the measure
- Observed score – an obtained score that comprises a person’s true score and an error
score.
- True score – the part of the observed score that represents the individual’s real score
and does not contain measurement error.
- Error score – the part of the observed score that is attributed to measurement error.
Measurement error
- Systematic errors – predictable errors of measurement
o Occur in one direction – under or over-estimating
o Can be corrected by re-calibrating the instrument or by adding/subtracting a
constant
- Random Errors – due to chance and unpredictably affect a subject’s score from trial to
trial
1. The participant – many factors: mood, motivation, fatigue, health, fluctuations in
memory and performance, previous practice, specific knowledge, familiarity with
test items
2. The testing – lack of clarity or completeness in directions, how rigidly the directions
are followed, motivation applied, supplementary directions, etc.
3. The scoring – competence, experience, the dedication of the scorers and nature of
scoring itself, the familiarity that the scorer has with the behavior being tested
4. Instrumentation – inaccuracy, lack of calibration
Estimates of reliability measures
- We can quantify reliability by using a type of coefficient that measures the degree of
association or strength of the relationship between two sets of data
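Since a reliability coefficient is a measure of association between two sets of data, a test-retest coefficient can be sketched as a Pearson correlation computed by hand (the two trials below are hypothetical scores for the same five people tested twice):

```python
# Hypothetical test-retest data: same five people, two trials
trial_1 = [10, 12, 14, 16, 18]
trial_2 = [11, 12, 15, 15, 19]

n = len(trial_1)
m1 = sum(trial_1) / n
m2 = sum(trial_2) / n
# Pearson r: covariation of the two trials over the product of their spreads
cov = sum((x - m1) * (y - m2) for x, y in zip(trial_1, trial_2))
sx = sum((x - m1) ** 2 for x in trial_1) ** 0.5
sy = sum((y - m2) ** 2 for y in trial_2) ** 0.5
r = cov / (sx * sy)
print(round(r, 2))
```

Here r ≈ 0.96, well above the conventional 0.70 threshold, so the measure would be considered reliable. Python 3.10+ also provides statistics.correlation for the same calculation.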
Types of reliability
1. Test-retest reliability
2. Interrater reliability
3. Parallel (alternate) form
4. Internal consistency
Validity:
- Validity is a trait (characteristic or property) of the test or instrument
- Types of instruments
o Lab. instruments (HR monitors, VO2 max)
o Survey (questionnaires)
o Psychological instruments (motivations) – scales
o The research itself (qualitative research)
- Similar to reliability, r values of 0.70 or higher are considered the threshold for
acceptable validity
- Definition:
o The measure is measuring what it is intended to
Common types of validity
1. Content validity

a. When you want to know whether a sample of items truly reflects an entire
universe of items on a certain topic.
b. Degree to which the content or subject matter of a test represents the
content or subject matter of the course to which it applies.
c. How to establish it:
i. Content expert
ii. Do items represent all possible items?
iii. How well does the number of items reflect what was taught?
2. Criterion validity
a. Criterion: measures of criterion-based validity are established when results
of one test are compared to results obtained with an accepted standard or
CRITERION
b. Predictive criterion validity
i. How consistent a test outcome is with a criterion that occurs in the
future
ii. The use of a measurement instrument for prediction or estimation of
some future event using the present test.
iii. In this situation, the prediction is the criterion, and the present test is
the one to be validated
c. Concurrent criterion validity
i. How well a test outcome is consistent with an established or other
criteria (present, not future).
ii. Two measures of the same variable are obtained by two different
instruments in a close period
iii. One measure is a criterion measure, and the other might be a new
test or instrument to be validated
3. Construct validity
a. Construct validation is used when the variable of interest has no definitive
criterion, is too difficult to measure, or cannot be directly observed;
constructed to measure an abstract trait.
b. Constructs are often multidimensional (e.g., health, personality, intelligence,
happiness, depression, sportsmanship).
Summary:
- Measures that are valid increase confidence in statistical conclusions
- Validity means that the instrument measures what it says it measures (range: 0.00-1.00)
- Three aspects of validity that concern the researcher are:
o Content
o Criterion (concurrent and predictive)
o Construct (hardest one to ascertain)
- Measures should be valid and reliable – this is the definition of ‘accuracy’

Data analysis: probability primer


1. Probability and why it matters

a. Basis for determining the degree of confidence that an outcome is the case
b. The normal curve provides us with a basis for understanding the probability
associated with any possible outcome
c. Complex, but the essential concept for understanding inferential statistics…
i. (Chance, likelihood, probably, etc.)
2. Probability and the basic concepts
a. “Probability: the likelihood that any one event will occur, given all the possible
outcomes; it gives us some indication of an outcome or future event”
b. No guarantees but rather the likelihood of an event that happens at random
3. Measuring probability
a. Probability can be quantified as a number between 0 and 1, where
i. A number close to 0 means “not likely” (0% chance)
ii. A number close to 1 means “quite likely” (100% chance)
iii. The range is 0 ≤ P ≤ 1
4. Probability outcomes
a. Discrete and continuous variables
b. We call this “boundedness” the sample space of a random variable
c. The sample space can be defined as the set of all possible outcomes for an
experiment (or sample)
d. Within the sample space lie the numerical characteristic or description of each
event called the “random variable”
Very simple probability involves the chances that a particular event will happen.
P = probability
E = event
P(E) = the probability of that event occurring

We express probability in terms of fractions:

P(E) = (number of favorable outcomes) / (total number of possible outcomes)
Example coin toss: If a fair coin is tossed into the air, what is the probability that it will
land heads up?
P(E) = heads (1) / [heads (1) + tails (1)] = 1/2

½ = 50%, so there is a 50% chance that the coin will land heads up
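The P(E) fraction can be sketched directly with Python's fractions module (the helper name `probability` is just illustrative):

```python
from fractions import Fraction

# P(E) = number of favorable outcomes / total number of possible outcomes
def probability(favorable, total):
    return Fraction(favorable, total)

# Fair coin toss: one favorable outcome (heads) out of two possible outcomes
p_heads = probability(1, 2)
print(p_heads, float(p_heads))  # 1/2 -> 0.5, i.e. a 50% chance
```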

- Events can happen by chance



- Discrete random variable: a countable set of distinct possible values (e.g., dice, city of
birth, etc.).
- Continuous random variable: any value (to any number of decimal places) within some
interval is a possible value (e.g., height, weight, age, etc.).
- Sampling continuous variables results in a ‘spread of scores’ that resembles a curve,
instead of clear, discrete scores.

Z – Scores, also known as standardized scores


- If we measure the length of an object in meters, the measurement obtained is
meaningful.
o Example:
 What if I wanted to measure something more complex, like intelligence or
physical fitness? Physical fitness, for example, comprises more than
one ‘domain’
 These domains include things like:
o Aerobic capacity
o Power
o Flexibility
o Body composition

o Taken TOGETHER, these domains can give us a sense of a person’s physical
fitness, but they all have DIFFERENT UNITS of measurement (comparing apples
to oranges)
- The z-score or standardized score makes it possible, under some circumstances, to
compare scores that have different units of measurement.
o Meters and pounds
- We convert these different measurements into the SAME measurement variable: THE Z-
SCORE.
- Z-scores are useful statistics because they:
1. Enable us to compare two scores that are from different normal distributions.
2. Allow us to calculate the probability of a score occurring within a normal
distribution.
- Z-scores extend what we already know of the normal curve.
- With z-scores the mean is represented by a zero (0).
- A z-score represents the number of standard deviations (SDs) the raw score is from the
mean.
o A z-score lets us know where in the normal curve the score falls, in SD units.
- With z-scores, the normal curve stays the same – all proportions, everything. The only
things that change are that now the SDs are z-scores and the mean has been
standardized from its raw score to zero.

- In other words, if a score on the normal curve lies 1.5 SD above the mean, the z-score
would be 1.5.
- If a score on the normal curve lies 2.0 SDs below the mean, the z-score would be -2.0.
- Note that z-scores can be positive or negative depending on whether or not they are
above the mean (pos +) or below the mean (neg -).

How to calculate Z – scores


- We calculate a standard score by dividing the amount that a raw score differs from the
mean of the distribution by the standard deviation.
- The correct definition is as follows: The z-score is a standard score that is calculated as
the number of standard deviations the raw score is from the mean.

EXAMPLE:
Calculate the z-score if you had a raw score of 100, a mean of 120, and an SD of 10.

ANSWER

z = (X - X̄) / SD
z = (100 - 120) / 10
z = -20 / 10
z = -2

Using the Z-Table.pdf – THIS IS AN IMPORTANT DOCUMENT TO UNDERSTAND THE Z – TABLE
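A minimal sketch of the calculation, plus what the z-table then gives you, using Python's statistics.NormalDist (its cdf returns the proportion of the normal curve below a given z):

```python
from statistics import NormalDist

def z_score(raw, mean, sd):
    # Number of SDs the raw score lies from the mean
    return (raw - mean) / sd

z = z_score(100, 120, 10)   # the worked example from the notes
print(z)                    # -2.0

# What a z-table tabulates: the proportion of the normal curve below z
p_below = NormalDist().cdf(z)
print(round(p_below, 4))    # ~0.0228, i.e. about 2.28% of scores fall below
```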

Statistical significance:
- Researchers assess more than one variable
- Researchers assess either difference between variables or associations between
variables.
- Researchers are interested in whether or not these differences or relationships are
STATISTICALLY SIGNIFICANT.
o What then, does statistically significant mean?
- Statistical significance: Any difference between groups that is not due to chance but
rather due to a systematic influence.
o In other words, we consider a difference (or association) between two variables
to be statistically significant if the difference is due to some systematic influence
(e.g., intervention).
o The degree of risk you are willing to take of rejecting chance as the
explanation when, in fact, chance was the explanation.
- Researchers set parameters to reduce the chance of making a mistake BEFORE
conducting the research (a priori).
 *This standard or level of risk is called the level of significance (α, or
alpha).
o A level of significance can be viewed as the probability of rejecting chance
(re: the null hypothesis) as the cause of the result when chance was, in fact, the cause.
- Conventional levels of significance are set between .01 and .05.
o If we set the level of significance at .05, this means:
 There is a 5% probability that the difference/association in measured
values between the two populations studied is due to chance and not
due to intervention or treatment.
- Journal articles express it this way: “the findings were statistically significant at the 0.05
or 5% level.”
o Statistical significance is defined by a probability or p-value.
 This probability is denoted as p, expressed as p <0.05, p<0.01, or p<0.001
in journal articles.
- Where does this calculated p-value come from?
o SPSS and other statistical computer packages complete this calculation for the
researcher.
o E.g., A t-test, which measures differences between two means may present a
calculated p-value of 0.009.
 If we had set our level of significance at 0.05, we see that 0.009 < 0.05.
What this means is that the probability that the difference between these
two means happened by chance is 0.9%, much less than our threshold of
5%.
Statistical significance and the null hypothesis:
- When the calculated p-value (e.g., p=0.024) is less than our set threshold for the level of
significance (e.g., 0.05), we say that there is a statistically significant difference
between the two means.
o If there is a statistically significant difference, we are also saying that we reject
the Null Hypothesis.
 In so doing, we are accepting some other explanation other than chance
that explains the differences we see.
- Conversely when the calculated p-value (e.g., p=0.32) is greater than our set threshold
for the level of significance (e.g., 0.05), we say that there is no statistically significant
difference between the two means.
- If there is no statistically significant difference, we are also saying that it is likely that the
differences observed happened by chance, and that the Null Hypothesis prevails.
o In so doing, we are accepting that chance explains the differences we see.
1. Our calculated probability level is <0.05
a. REJECT the null hypothesis – believe that there are important differences
between the two populations. “The findings are statistically significant.”
2. Our calculated probability level is >0.05
a. FAIL TO REJECT the null hypothesis – still don’t know whether there are
important differences between the two populations or not. Our probability level
is >0.05. “There is no statistical significance with the findings.”
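The two-branch decision rule above can be sketched as a small helper (the function name `decide` is purely illustrative):

```python
# Level of significance, set a priori (before the research is conducted)
ALPHA = 0.05

def decide(p_value, alpha=ALPHA):
    # Calculated p below alpha -> reject the null hypothesis
    if p_value < alpha:
        return "REJECT the null hypothesis (statistically significant)"
    return "FAIL TO REJECT the null hypothesis (not statistically significant)"

print(decide(0.009))  # the t-test example from the notes: 0.009 < 0.05
print(decide(0.32))   # greater than 0.05
```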

Example of the tale of two patient groups:

In A, we see that the mean of the second patient group (with disease) falls on the left
side of the 5% level of significance we have established. What this means is that there is a
greater than 5% chance that the mean of the disease patients happened by chance and not due
to the disease. This is not an acceptable probability for us and so we would fail to reject the null
hypothesis and state that there is no statistical difference between the means of the patients
with and without the disease.
In B however, what we see is the mean from the patients with the disease group to the right
of the 5% level of significance we have established previously. This means that there is less than
a 5% chance that the mean of this variable occurred by chance and the difference between the
means is most likely due to the disease process. In this case, we would reject the null
hypothesis and state that there is a significant difference between the means.

Type 1 and Type 2 Errors:


- Isn’t there a chance that we are wrong?
o For example, could we reject the null hypothesis when it is true?
 Yes
- Can we also fail to reject the null hypothesis when we should have rejected it?
o Also…yes
1. Type I Error: The rejection of a null hypothesis that is TRUE
- You have concluded based on the findings that the intervention was responsible for the
difference in the scores observed.
- REALITY: Unknown to you, the intervention had nothing to do with the differences in
scores.
2. Type II Error: Failure to reject a FALSE null hypothesis
- You have concluded based on the findings that there is no difference in the scores
despite the intervention and have ‘failed to reject’ the null hypothesis.
- REALITY – the difference in scores was due to the intervention.

(this is the real situation, which is unknown to the investigator)


- We never know for sure if we are committing a type 1 or 2 error
o But we can take steps to decrease the probability of committing one or the
other, or both

We can take steps to decrease the probability of committing one or the other, or both:
1. Robust sampling (including sample size considerations).
2. Methods used in the study (internal validity) – e.g., validity and reliability of the
instruments, personnel.
3. Levels of probability selected a priori (or before the experiment is run).
Summary:
- Statistics work to disprove the null hypothesis; this leaves the research hypothesis as a
reasonable alternative.
- We have to set a level of risk to reject the null hypothesis (conventionally this is 5%, or
0.05).
- We see this expressed in journals as (p<0.05).
- p<0.05 means that there is less than a 5% chance that the results we see happened by
chance; most likely they happened because of something else.
- When we see this, we state that there is a ‘statistically significant’ difference or
association between the variables being investigated.
- A Type 1 error is when the researcher rejects the null hypothesis when it was true.
o In other words, the researcher concluded that there was a difference/association
between the two means, but in reality, there was not.
- A Type 2 error is when the researcher fails to reject (i.e., accepts) the null hypothesis when
it was false.
o In other words, the researcher concluded that there was no
difference/association between the two means, but in reality, there was.

Standard error of the mean:


- When we draw a sample from a population and calculate its mean, how close are we to
knowing the actual population mean?
o We know that in part, the ‘closeness’ to the population mean is dependent on
the sample size.

(The comparison between the population normal curve and a sample normal curve)

How reliable is a sample mean?


- There are two additional things we can assess that will give us a clearer perspective on
how close the sample mean is to the population mean:
o The standard error of the mean (SE)
o Confidence intervals (sometimes called confidence limits)
Standard error of the mean:
- If the standard deviation (SD) describes the individual variance about a mean or average
score from a group of individuals, then the Standard Error of the Mean (SE) estimates
how sample means vary about the population mean.

(the normal curve is visible in terms of its proportions, but the sample mean has been replaced
by the population mean, and the standard deviations (SD) have been replaced by the standard
error [SE])
- The mean is referred to as the population mean

Why SE is important:
- We don’t want our sample mean to fall too far from the population mean.
- The Standard Error (SE) lets us know how our sample means are likely to deviate from
the population mean.
o Therefore, the smaller the SE, the closer the sample mean is to the population
mean.
- The formula for the standard error of the mean is:
SE = SD / √n
SD = standard deviation
√n = square root of the sample size
- If the SD increases, there will be an increase in the SE.
- If the SD decreases, this will decrease the SE.
Example:
Given a sample of 100 (n=100) and an SD of 20
SE of the mean:
20/√100 = 20/10 = 2

Say we had another sample from the same population but with an n=100 and SD of 40.
SE of the mean:
40/√100 = 40/10 = 4
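The two examples above can be reproduced with a small helper (the function name is illustrative):

```python
from math import sqrt

def standard_error(sd, n):
    # SE = SD / sqrt(n)
    return sd / sqrt(n)

print(standard_error(20, 100))  # 20/10 = 2.0 (first example)
print(standard_error(40, 100))  # 40/10 = 4.0 (more variability -> larger SE)
print(standard_error(20, 400))  # 20/20 = 1.0 (larger sample -> smaller SE)
```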

- If there is more variability in the sample, the SE increases. This makes it harder to draw a
sample that is representative of the population. Given wide variability, we will need a
larger sample size.
- If the sample size is small, this will cause an increase in the SE.

- Conversely if the sample size is very large, then the SE will decrease.

- With a larger sample size, we have less error. If there is less error, we can estimate more
precisely the parameters of the population.
- The amount of variance in the population has implications for sample size (i.e., with a small
amount of variance in the population, we can use a smaller sample and still have a small SE)
What does this tell us about the relationship between sample size, sample SD, and the SE?
- If we want our sample to represent the population, researchers have to consider the
sample size but also the SD.
o Generally, if the SD of a sample is large, the researcher may have to compensate
with a larger sample size to reduce the SE.
o This has implications down the road for statistical significance.

PART TWO: CONFIDENCE INTERVALS


(Reminder)
Recall that the population parameters (i.e., population mean (μ) and population standard
deviation (σX)) will not be known in almost all instances.
So we have a bit of a dilemma:
• We use samples to infer populations.
• But the population parameters (mean and standard deviation) are not known…
• So we’re using an estimate (sample statistic) to estimate the population parameters
(which is unknown).
• AND! We want our sample statistics (mean ± SD) to be similar to the population
parameters!
CONFIDENCE INTERVALS (CI)
- It is unreasonable to expect that the mean obtained from a single sample will be
identical to the (unknown) population mean.
- If we are estimating a population mean, we need to state (along with our sample mean)
some measure of our confidence.
o But to understand confidence intervals we have to think of things a bit ‘in
reverse’:

 How confident can we be that the population mean falls close to our
sample mean?
- An estimate of a population parameter given as a single number is called a point
estimate.
o The sample mean is a point estimate.
- A confidence interval (CI) is a range or interval of values that surround the point
estimate.
o Point estimates (sample means) and CI estimates are types of statistical
estimates that allow researchers to infer the true value of an unknown
population parameter using information from a random sample from that
population.
 In other words… a confidence interval includes upper and lower limits,
calculated using the sample mean (point estimate), wherein, after
repeated sampling, 95% or 99% of the time, the population mean would
fall into this interval.

Calculating a confidence interval is fairly straightforward:


• 95% CI = ± 1.96 SEs
• 99% CI = ± 2.58 SEs

EXAMPLE:
You draw a sample (n=50) from a population. The mean is 40 and the SD = 15.
1. Calculate the 95% CI for your sample.
2. Explain what you have calculated.

Step 1: Calculate SE of the mean

SE = SD/√n = 15/√50 = 15/7.07 = 2.12

Step 2: Plug into CI equation

CI = mean ± 1.96 (SE)

95% CI = 40 ± (1.96)(2.12)

= 40 ± 4.15

= 35.85 – 44.15

Therefore the 95% CI in this case would be 35.85 – 44.15
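The worked example can be sketched as a helper function (the name is illustrative). Note that computing without first rounding the SE to 2.12 gives limits of 35.84 and 44.16, a hundredth off the hand-rounded values above:

```python
from math import sqrt

def confidence_interval(mean, sd, n, z=1.96):
    # 95% CI = mean +/- 1.96 SEs (pass z=2.58 for a 99% CI)
    se = sd / sqrt(n)
    margin = z * se
    return (round(mean - margin, 2), round(mean + margin, 2))

# The worked example: n=50, mean=40, SD=15
print(confidence_interval(40, 15, 50))  # (35.84, 44.16)
```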



EXAMPLE EXPLAINED:
- You draw a sample (n=50) from a population. The mean is 40 and the SD = 15. Calculate
the CI for your sample. Explain what you have calculated.
o 95% CI = 35.85 – 44.15
 Can also be written as:
 95% CI = 35.85 ≤ μ ≤ 44.15
Explanation: We have constructed a confidence interval wherein we are confident that the
true population mean will fall within this interval 95/100 times of repeated sampling.
- Remember: μ = population mean
o The tighter the range, the greater confidence we can have that the population
mean is close to the sample mean
95% CI = 35.85 – 44.15
Another way of interpreting this is the following (long version):
• One sample (n=50) we pull from the population will have a mean of 40 with 95%
confidence limits between 35.85 and 44.15 (holding constant sampling error…)
• Another sample (n=50) we draw could have a mean of 42 with 95% CI between 37.85
and 46.15.
• The 95% CI indicates that if we were to draw 100 random samples, each with n=50, we
could construct 100 CIs around the sample means, 95 of which could be expected to
contain the true population mean.
SUMMARY:
• We want our sample to reflect the population as closely as possible.
• The standard error of the mean (SE) lets us know how close our sample mean is to the
population mean: the smaller the SE, the closer our sample mean is to the
population mean.
• Both the sample size and the standard deviation of that sample will influence the SE.
• We can also examine confidence intervals to give us some confidence about our sample.
• Confidence intervals are based on the SE and when calculated, will tell us how likely the
population mean will fall within a certain range (either 95 or 99% of the time).
• Taken together, both the SE and confidence intervals should be considered to give us
‘added confidence’ that a sample mean reflects the population.
RUNNING T-TESTS THEORY
https://www.investopedia.com/terms/t/t-test.asp

1. T-tests
a. EXAMPLES
i. Psychology – some form of cognitive-behavioral therapy on levels of
anxiety (test differences in anxiety levels).
ii. Sociology – household income on food security (test differences in food
security).
iii. Human kinetics – creatine phosphate supplementation on maximum
bench press strength (test differences in strength).
b. There are two groups: (their questions)
i. Are the groups different?
ii. How meaningful are the differences that may occur?
c. Independent T-Test
i. Assesses differences between two independent groups.
d. Dependent T-Test
i. Assesses difference between two dependent groups (for example; a test-
retest situation with one group being tested twice)
Example: a researcher wants to measure anxiety levels and wants to start an anxiety help group
Group one: first-year students
Group two: fourth-year students
Independent t-test:
- Testing group one and group two with the same test
Dependent t-test:
- One group is tested, attends the anxiety help group, and then is re-tested.
OVERVIEW:
• We can’t use z-scores to assess differences between two sample means because the z-
distribution (or normal curve) assumes we know the population distribution (mean and
variance).
• In the real world this is rarely known, and we don’t have the resources to assess it, so we
have to rely on samples to provide estimates.
• The researcher needs to estimate the sampling distribution and the associated standard
error (i.e., the standard deviation of the sampling distribution) for a given sample size.
FROM Z TO T LOGIC:

- Because our estimate of σ² (population variance) is based on our sample…


- And from sample to sample, our estimate will change or vary…

- There is variation in our estimate, and more variation in the t-distribution.


- Because of the greater variance that arises from sampling, we have to use a t-
distribution, rather than the z-distribution.
2. T Distribution:
a. When the sample size is infinite (or IS the population) then the t-distribution of
scores is identical to the z-distribution.
b. But the sample size affects the t-distribution; the smaller the sample size, the
more variance is introduced to the t-distribution (it becomes wider or ‘fatter’).
c. The t-distribution is also associated with the degrees of freedom, which can be
calculated from the sample size (the larger the sample size, the larger the
degrees of freedom).
d. Different from the z-distribution, the t-distribution of scores uses the standard
error (SE), not the SD.
 Here we see how sample size affects the variability of the sample means
about the population mean.
 Although the means of the different samples are similar, notice that as
the sample size increases, the variability of the means decreases.

- It looks much like the normal curve but is ‘fatter.’


e. Just how much the t-distribution differs from the normal curve depends on the
degrees of freedom (based on sample size).
- When the sample size is small (small degrees of freedom), the t-distribution is much
wider.
- When the sample size approaches 30, the t-distribution begins to approximate the
normal curve (z-distribution) more closely.
- It takes a slightly more extreme sample mean for statistical significance when using a t-
distribution compared to using a normal curve (z-distribution).
f. KEY: Using ‘fatter’ t-distributions means that the means have to be farther
apart for there to be statistical significance.
i. This gives the researcher ADDED CONFIDENCE that a significant
difference exists for a specific level of significance (e.g., 0.05 or 0.01).

1. For example, using the normal curve, 1.96 is the cut-off for a two-tailed test at the .05
level of significance when the sample size is the population.
2. For the normal curve: a two-tailed t-test allots half of your alpha (p-value) to testing
the statistical significance in one direction and half of your alpha to testing statistical
significance in the other direction… this means that .025 is in each tail of the
distribution of your test (5% in both tails combined).
3. If your estimate is based on a much smaller sample of n=7 (df=6), the cutoff is 2.45 SE.
This means that the difference between the means would have to be much larger for
statistical significance.
4. If your estimate is based on an even smaller sample size of n=4 (3 degrees of freedom),
the cutoff increases to 3.18 SE. This means that the difference between the means
would have to be even larger for statistical significance.

SUMMARY OF EXPLANATION:
- The larger our sample, the more faith we have that its mean represents the population
mean and the smaller the correction required.
- These corrections have been calculated:
o Student's t-distribution (Salkind Table B.2 pp. 357-358)
- After running a t-test, we compare the calculated t-value to the critical values in the
Student’s t-distribution table to determine whether the results are statistically
significant.
o To do so, we will need to determine:
 Degrees of freedom = (n1 + n2) – 2
 The level of significance (<.05, <.01, <.001)
RUNNING T-TEST IN SPSS & DIRECTIONAL AND NON-DIRECTIONAL T-TEST:
T-test to assess significant differences:
- TCRIT = Critical value of t (the minimal threshold for a statistically significant
difference)
- TCALC = Calculated value of t (where the second mean lies, in SE units)
• A calculated t-ratio (TCALC) is compared to a critical value of t (TCRIT).


• These critical values of t have been published as t-tables. (pp. 357-358 in Salkind)
• t-tables give us the fraction of sample means that lie more than a specified number of
SEs away from the mean.
• *Instead of critical z-scores of SDs from the mean that would tell us significance
or not, now we are using SEs which better reflect the population the sample was
drawn from…
• This specified number of SEs is the critical value of t (TCRIT).
• If the calculated t-ratio (TCALC) is larger than this critical value in the table (TCRIT),
then we state that there is a significant difference between the means.
• TCALC > TCRIT → significant difference between means
• If the calculated t-ratio (TCALC) is smaller than the critical value of t (TCRIT), then we say
that there is no significant difference between the means.
• TCALC < TCRIT → no significant difference between means

Calculating degrees of freedom for independent and dependent t-tests:

Independent-samples t-test:
- tests the relationship between 2 independent populations
- sometimes these samples have an unequal number of participants
o df = (n1 + n2 – 2)

Dependent-samples t-test:
- tests the relationship between 2 linked samples (e.g., means obtained in 2 conditions
by a single group of participants)
- often seen in pre-test/post-test studies
o df = (n – 1), where n = the number of pairs of subjects and df = degrees of freedom
Calculating t statistics:
- Independent t-test: assesses the difference between two independent groups
- Dependent t-test: assesses the difference between two dependent groups
Calculating TCALC
E.g., independent samples t-test:
t = (X̄1 – X̄2) / s(X̄1 – X̄2)
• The numerator represents the difference between the independent group means.
• The denominator, s(X̄1 – X̄2), represents the standard error of the difference between
the means, or the variability within the samples:
t = difference between group means / variability of this difference
The calculated t-ratio:
t = (X̄1 – X̄2) / s(X̄1 – X̄2)
A portion of the differences observed between scores can be explained by:


(1) The experimental treatment.
(2) The rest of the variance is unexplained, due to all the other factors influencing the
response.
Note: this is why the methods in terms of controlling for other factors influencing the change in
the dependent variable are so important.
• If the difference between means is large and the variability is small, this ratio will be
large.
• A large value of this ratio will be a strong indication that there is a real difference
between the means.
• If the difference between means is small and the variability is large, this ratio will be
small and we must conclude that the difference between means is not significant.
• *Therefore the size of this ratio is our measure of the significance of the difference
between the means.
• This t-ratio is the TCALC
Computing the test statistics (independent samples)
• Numerator is the difference between the means.
• Denominator is the amount of variation within each of the two groups, combined
(called ‘pooled variance’).
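A minimal sketch of this pooled-variance computation, assuming equal-variance groups (pure Python; the function name and data are illustrative, not from the notes):

```python
import math

def independent_t(sample1, sample2):
    """Pooled-variance independent-samples t-ratio and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sums of squared deviations within each group
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df
    # Standard error of the difference between the means
    se_diff = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df

t, df = independent_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(round(t, 3), df)  # -1.0 8
```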
Computing the test statistics (dependent samples)
• ∑d = sum of all the differences
between the scores
• ∑d2 = sum of the squared differences between the pairs of scores
• n = number of pairs of observations
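Those three quantities are all the formula needs; here is a hedged Python sketch of the direct-difference form t = Σd / √((nΣd² − (Σd)²)/(n − 1)), the standard textbook formula (function name and data are my own):

```python
import math

def dependent_t(pre, post):
    """Dependent-samples t from the sums of differences between paired scores."""
    n = len(pre)                              # number of pairs of observations
    d = [b - a for a, b in zip(pre, post)]    # difference scores
    sum_d = sum(d)                            # Σd
    sum_d2 = sum(x * x for x in d)            # Σd²
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1

t, df = dependent_t([1, 2, 3], [2, 4, 6])
# mean difference 2, SD of differences 1 -> t = 2 / (1/sqrt(3)) ≈ 3.464, df = 2
```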
t (58) = -.14, p > .05
- t represents the test statistic used
- 58 is the number of degrees of freedom
- -.14 is the obtained or calculated value (from the formula)
- p > .05 indicates the probability

One-tailed and two-tailed tests: (or one/two-sided test)


• The use of t-tables (critical values of t) is dependent upon the research question.
• For example, the researcher can make a hypothesis that a certain group will out-
perform another:
DIRECTIONAL hypothesis: in the case of a directional hypothesis, the researcher will select a
‘one-tailed’ t-test when using t-tables.
Used when there is a direction to the research hypothesis (i.e., the researcher has a
hunch which mean score will be larger).
- In a directional hypothesis the researcher hypothesizes that the predicted mean of one
group will be larger than the other. In this case, the sign must be in the predicted
direction (+) for the alternative hypothesis to be accepted.
o In other words, the t-ratio will be positive or negative depending on the direction
of the hypothesis.
o When we compare the calculated value of t to the critical value of t, we compare
absolute values; but in a one-tailed test the sign of the calculated t must also
match the predicted direction.
 Research hypothesis: Grade 7 boys’ ears (n=10) will be larger than Grade
7 girls’ ears (n=10).
 Ho = there will be no difference between the size of Grade 7 boys’
ears and Grade 7 girls’ ears.
 In a one-tailed test, the calculated ratio must also be of the
appropriate sign (+) or (-); this is to say that the predicted
difference in the direction must be supported.
 In this case, we anticipate that the calculated t-value will be
positive (+) – because we hypothesize that boys’ ears > girl’s ears.
o Critical t value for a level of significance of 0.05 for a one-
tailed test using 18 degrees of freedom:
 α = 0.05, t(18) = 1.734 from the table
o Calculated t-ratio in our study = 2.319
o 2.319 > 1.734 therefore we reject the Ho
o We can conclude statistically that Grade 7 boys’ ears are
larger than Grade 7 girls’ ears.
NON-DIRECTIONAL hypothesis: in the case of a non-directional hypothesis, the researcher will
select a ‘two-tailed’ t-test when using t-tables.
- Used when there is no direction with the research hypothesis (i.e., the researcher
doesn’t know which group will have a larger mean).
o Therefore, the sign of t doesn’t matter (can be + or -).
- Research hypothesis: there will be a difference in the size of Grade 7 boys’ ears
compared to Grade 7 girls’ ears (non-directional – we aren’t hypothesizing larger or
smaller, just different).
o Ho = there will be no difference between the size of Grade 7 boys’ ears and
Grade 7 girls’ ears.
- The critical t-value for a level of significance of 0.05 for a two-tailed test using 18
degrees of freedom:
o α = 0.05, t(18) = 2.101 from the table
- Calculated t-ratio in our study = 2.319.
- 2.319 > 2.101 therefore we reject the Ho.
- We can conclude statistically that there is a difference in size between Grade 7 boys’
and Grade 7 girls’ ears, but we don’t know which group is larger (until we look at the
means).
One-way ANOVA:

- Use when there are more than two groups


o Why can’t we just run multiple independent or dependent t-tests in these cases?
o The problem is if we do this, we increase the chance of making a Type 1 error.
 Called ‘inflation of alpha’:
- For example, the probability that NO Type I errors are made in a series of 6 t-tests on the
same data set, when each test is performed at the α = .05 level, can be computed as:
o (1 - .05)^6 = .95^6 ≈ .735 ≈ .74
 if each test is independent of every other test
 Therefore the probability of making AT LEAST one type I error is:
 1 – (.74) = .26 or 26%
- That means that the probability of making a type I error in a series of 6 independent
tests is 5 times higher than the nominal alpha level of any given test (26%).
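The inflation arithmetic above can be checked in a couple of lines (Python; this assumes the six tests are independent, as the notes do):

```python
alpha = 0.05
k = 6  # number of t-tests on the same data set

p_no_type1 = (1 - alpha) ** k   # probability of NO Type I error across all k tests
fwer = 1 - p_no_type1           # probability of AT LEAST one Type I error

print(round(p_no_type1, 3), round(fwer, 2))  # 0.735 0.26
```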
- SOLUTION: we use a simple One-Way Analysis of Variance (ANOVA) to assess
differences between more than 2 independent means.
ONE-WAY ANOVA:
- A nominal or ordinal variable with 3 or more response categories (the grouping
variable).
- One interval/ratio level variable (mean score of the characteristic of interest for
comparison)
Hypothesis testing:
- The statistical test question:
- You have 3 or more means, one for each of the independent (non-overlapping) groups
on a variable.
o The means will probably be different, but do they differ from each other
significantly?
o Is this difference due to chance? Or does it represent true differences in the
population?
 Ho: m1 = m2 = m3 = m4 (null hypothesis)
 Ha: the means are not all equal, i.e., at least one mean differs (alternative/research hypothesis)

- This diagram depicts a series of means that in all likelihood are not different from one
another, because the means are fairly close together and the variance appears to be
equal.

- Here the means are further apart from one another – some farther apart than others.
- It could be that some means are significantly different from one another, while others
are not.

What does ANOVA do?
1. ‘Partition out’ or ‘account for’ the noise (variance) between individual scores and their
group mean.
2. ‘Partition out’ or ‘account for’ the noise (variance) between the group means and what
we could call a ‘grand mean’ (the mean of all the scores from all the participants).
3. ‘Partition out’ or ‘account for’ the noise (variance) between the individual scores and
the ‘grand mean’:
a. Whatever variance is left over, if large enough, will be the signal that says
something else other than random variance caused the differences in the means.
EXAMPLE:
• Let’s say you wanted to develop a drug that elevated the mood of stats students
(because we all know that stats can affect our mood!)
• Mood Enhancer (ME).
• There are 30 students in the class, randomly allocate them into the three different
groups
• Group 1: control (no drugs)
• Group 2: ME1 (low level of ME drug)
• Group 3: ME2 (high level of ME drug)
• You hypothesize that the higher level of ME used, the greater the elevated mood:
• ME2 > ME1 > Control
• Your null hypothesis is that there will be no differences in mood across all three
groups, and you set your level of significance at 0.05.
• There is a mood scale (1 (lousy)-10 (amazing))
Example detour:
Within-group variances SSWithin
- Textbooks describe this type of variance as naturally inherent or “random variation”
that exists among all individuals (this is why we never expect to get the same mean if we
draw two samples from a population).
- This variation can be caused by random error or perhaps issues related to
measurement.
- But even if our sampling is robust and our measures are valid/reliable, the ‘random’ or
‘inherent’ variation still exists.
The mood of stats students (N = 30; three groups of 10)
The ‘grand mean’ of these three groups is 7.09 with an SD of 0.74.
Between-groups variance (SSbetween)
- Textbooks describe this type of variance as the variation between groups that may be
caused by an external systematic influence or the experimental use of different
treatments (the independent variable, i.e., treatment or intervention) in addition to the
‘random’ variance we discussed previously.
- Inherent within these scores is the within-groups, or ‘random variance’… but we also
see that the mean scores seem to be increasing with the increased level of the drug.
- Increases in scores may be due to simple random variance, but they could also be due
to some added or ‘extra’ variance brought about by the addition of the ME drug itself. If
this is the case, then it should be reflected in the differences between the group mean
scores.
- KEY: What ANOVA does is ‘partition’ this added or ‘extra’ variance (or noise) out. If a
large amount of this extra variance can be shown, then it means that the changes in the
mean scores are likely due to the ME drug and not just ‘random variance’.
Total Variance:
- There is a total amount of variance in the whole sample (n=30).
o This total variance is:
o Within Subjects variance (SSWITHIN) + Between Subjects variance (SSBETWEEN)
F-RATIO:
- If the treatment (ME drug) has no effect, then the added or ‘extra’ variation will be zero
and the ratio should be equal to 1.0. This is called an F-ratio.

- A small F-ratio (near 1.0) means that there is little or no added or ‘extra’ variance from
the treatment, so the treatment is not likely the cause of the differences between the means.
- If however, this ratio is much greater than 1.0, it will be because the treatment
variation, due to treatment alone, is large:

- Just like the t-ratio we saw with t-tests, ANOVA gives an F-ratio that can be described as
the “Mean Square Between Groups” variance divided by the “Mean Square Within
Subjects” variance:
- F-ratio = MSBETWEEN GROUPS
MSWITHIN GROUPS
o *The ‘Mean’ in ‘Mean Square’ means the average of the ‘between groups’ or
‘within groups’ variance
- Sum of squares (SS): variance calculated by subtracting the mean from each score in a
distribution of scores, squaring each difference, then adding all of the squared
differences to produce a total (or sum) of the squared deviations from the mean (SS).
- Mean square (MS): the mean or average variation. Calculated by dividing the total sum
of squares (SS) by the degrees of freedom.

- k – 1: k is the number of groups being compared… e.g., if 3 groups, then df = 2.
- N – k: total number in all groups minus the number of groups… e.g., if 30 in all groups, df = 27.
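The SS → MS → F pipeline just described can be sketched in pure Python (my own function name and toy data, not the mood-drug dataset):

```python
def one_way_anova(groups):
    """F-ratio for a one-way ANOVA, partitioning SS between and within groups."""
    scores = [x for g in groups for x in g]
    n_total, k = len(scores), len(groups)
    grand_mean = sum(scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups SS: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups SS: how far each score sits from its own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)        # MS between, df = k - 1
    ms_within = ss_within / (n_total - k)    # MS within, df = N - k
    return ms_between / ms_within, k - 1, n_total - k

f, df1, df2 = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(round(f, 2), df1, df2)  # 3.0 2 6
```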
ANOVA Post-Hoc Tests:


- A one-way ANOVA will tell you if a difference between means exists, but if it exists, it
won’t tell you WHERE that difference exists.
o Group 1 - 2 ?
o Group 2 - 3 ?
o Group 1 - 3 ?
 To discern where the difference exists, one has to run a posthoc test
- SPSS gives a series of different types of posthoc tests.
- These are also called ‘pair-wise comparisons’; in essence, they compare all the different
pairs of tests to see which group means differ from each other.
- Bonferroni and Tukey’s HSD are some of the more popular posthoc tests.
- SPSS offers 20+ different types of posthoc tests.
How would we report this?


Statistical Conclusion:
• “A one-way between-subjects ANOVA was conducted to compare the effect of levels of
a mood-enhancing drug (ME) on mood. There was a significant effect of levels of ME on
mood at the p<.05 level for the three groups [F (2, 27) = 24.7, p <0.001].”
• “Post hoc comparisons using the Tukey HSD test indicated that the mean score for the high
ME drug (M=8.3, SD = 0.60) was significantly different (p=0.004) from the low ME drug
(M=7.6, SD = 0.37), as well as from no ME drug (p<0.001; M=6.81, SD = 0.45). Further, the low
ME drug was significantly higher than the no ME drug condition (p=0.005; M=7.6, SD =
0.37 vs. M=6.81, SD = 0.45, respectively).”
Research Conclusion:
• Taken together, these results suggest that the ME drug appears to increase mood levels.
Specifically, our results suggest that when people take this ME drug, their mood will be
higher than if they do not. Further, there seems to be a dosage level effect: a higher
dosage of the drug gave rise to higher moods than a lower dosage.
Repeated Measures ANOVA:
• Repeated measures ANOVA is used when a variable is measured three (3) or more
times, usually before, during, and/or after an intervention or treatment.
• For example, we might want to follow the differences in treating bipolar disorder over a
series of weeks using therapy or medication in the same group of individuals.
• In repeated measures, sample participants serve as their own controls.
• Differences in means must be due to:
• the treatment
• variations within subjects
• error (unexplained variation)
• Repeated measures designs are more powerful than independent groups designs.
Factorial ANOVA (two-way ANOVA)
• So far, our ANOVA problems had only one dependent variable and one independent
variable (factor).
• e.g., comparing gas mileage across different brands of cars
• What if I want to use two or more independent variables?
• e.g., comparing gas mileage across different brands of cars and in different
provinces
• When we are examining the effect of two independent variables, this is called a Two-
Way ANOVA.
• An independent variable is something by association or by influence that can affect
(change) a dependent variable (the variable of interest).
• For example, studying for an exam is an independent variable, as studying can
influence the change in exam scores (i.e., lower levels of studying = lower exam
scores, higher levels of studying = higher exam scores).
• In ANOVA, the independent variables are called ‘FACTORS.’
• In a Two-way ANOVA, the effects of two factors can be investigated simultaneously.
• Two-way ANOVA permits the investigation of the effects of either factor alone (e.g., the
effect of the brand of car on gas mileage, and the effect of tire type on gas mileage).
• BUT two-way ANOVA also allows the investigation of the two factors together (e.g., the
combined effect of the brand of car AND the type of tire on gas mileage).
• E.g., Is there a difference in gas mileage when we combine both the brand of car
AND the type of tire used?
• This ability to look at both factors together is the advantage of a Two-Way ANOVA
compared to using two One-Way ANOVA’s (one for each factor) independently.
• The effect on the dependent variable that can be attributed to the levels of either factor
alone is called the MAIN EFFECT.
o Main Effect is what you would detect using two separate one-way ANOVA’s.

• But with Two-Way ANOVA you can detect both main effects separately (car brand and
tire type) as well as a combined effect (car x tire) called an INTERACTION EFFECT.
• This makes Two-Way ANOVA very efficient, but also more complicated to interpret.
Three questions are answered by a Two-way ANOVA with Interactions:
• Is there any effect of Factor A on the outcome?
• (Main Effect of A, or Car Brand on gas mileage)
• Is there any effect of Factor B on the outcome?
• (Main Effect of B, or Tire Type on gas mileage)
• Is there any effect of the interaction of Factor A and Factor B on the outcome?
• (Interaction Effect of Car x Tire TOGETHER on gas mileage)
• This means that we will have three sets of hypotheses, one set for each question.
Effect size and power analysis:
Effect size:
- Effect size is a quantitative measure of the magnitude of the effect.
- The larger the effect size, the stronger the relationship or greater the difference
between variables.
Statistical significance VS. meaningful significance:
- Sometimes one can have statistical significance in the findings that may not be as
meaningful in the real world.
- How large of a difference is important in theory or practice?
- Be wary of reports that state “we see a trend towards significance,” despite the findings
showing no significance.
Estimating effect size:
- Effect size (ES) represents the standardized difference between two means.
o ES = (M1 – M2)/ s
- Where M1 – M2 = differences between the mean scores
- Where s = standard deviation (pooled)
o *Pooled = the standard deviations of the two groups averaged together
- Like z-scores, the ES allows for comparison between studies using different dependent
variables because it puts data in standard deviation units.
o What do the effect size values mean?
o An effect size of:
 0 = no difference
 0.2 = small
 0.5 = medium
 0.8 = large
The effect size for t-test and ANOVA:


- For t-tests, the effect size used is called Cohen’s D:
o ES = (M1 – M2)/s
- For ANOVA, the effect size is called η² (Eta squared)
o η² = Treatment Sum of Squares
Total Sum of Squares
 Sum of squares = the sum of squared deviations from the mean
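Both effect sizes can be sketched as one-liners (Python; the equal-weight pooled SD below matches the notes' "averaged together" description, though many texts instead use an n-weighted pooled SD):

```python
import math

def cohens_d(mean1, mean2, sd1, sd2):
    """Effect size for a t-test; SDs pooled by simple averaging (equal-n case)."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd

def eta_squared(ss_treatment, ss_total):
    """Effect size for ANOVA: proportion of total variance due to treatment."""
    return ss_treatment / ss_total

print(cohens_d(8.0, 7.5, 1.0, 1.0))   # 0.5 -> a 'medium' effect
print(eta_squared(6.0, 12.0))         # 0.5
```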

Power analysis:
• Statistical power refers to the probability that a hypothesis test will find an effect if there
is an effect to be found.
o The higher the statistical power, the higher the probability of detecting an effect when
there is one.
• A power analysis can be used to estimate the minimum sample size needed, given the
desired significance level, effect size, and statistical power.
t-test example:
Let’s say that you want to compare two independent group means and you…
• Set your significance level at 0.05
• Want at least a medium effect size (i.e., d = 0.50)
• Want minimum statistical power of 0.80

- Example using G*Power (available to download here:


https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-
arbeitspsychologie/gpower)
- Note: Power of 0.8 is often the ‘default’ - this means that there is an 80% chance of
detecting the effect if it is there (and a 20% chance of making a Type II error)
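A rough sample-size check for the scenario above can be done with the normal-approximation formula n ≈ 2((z₁₋α/₂ + z₁₋β)/d)² per group; dedicated software such as G*Power applies a small correction for the t-distribution, so expect its answer to be a participant or so larger:

```python
import math

# Two-sided alpha = .05 and power = .80 (hard-coded standard-normal quantiles)
z_alpha = 1.959964   # z for 1 - .05/2
z_beta = 0.841621    # z for power of .80
d = 0.50             # medium effect size

n_per_group = 2 * ((z_alpha + z_beta) / d) ** 2
print(math.ceil(n_per_group))  # 63 per group by the normal approximation
```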
RELATIONSHIPS AMONG VARIABLES:
- ‘Correlation’ is a term that is used often, other terms would be ‘relationship’,
‘association’, or ‘covary’.
- For example: a researcher obtains measurements on two variables for a single group of
subjects and is interested in how these variables are related to each other.
o E.g., Household income and education
o E.g., Aerobic exercise and % body fat
o E.g., Level of medication and presenting symptoms
WHAT CORRELATIONS CAN AND CAN NOT DO


- Correlation describes the relationship between two variables, such that the researcher
can conclude the nature of that relationship
- The correlation coefficient (r) presents a numerical value for these relationships
o r = 0.72, p<0.05
- Correlations are fairly easy to understand on the surface.
- But sometimes people get tripped up when they assume that correlation implies
causation.
- Correlation tells us that a relationship exists between two variables and may indeed be
causal (i.e., the increase in one direction causes an increase in another), but this may
not be the case.
- A relationship may exist between hospitals and death, but other factors might be
contributing to the actual cause of death.
SCATTER PLOTS:
- A scatterplot shows the relationship between two quantitative variables measured.
- The values of one variable appear on the horizontal axis (X) and the values of the other
variable appear on the vertical axis (Y).
- Each individual in the data appears as the point in the plot fixed by the values of both
variables for that individual.
o rx,y
 The ‘x’ and ‘y’ refer to the individual variables being considered.
- Conventionally, the dependent variable is on the vertical axis, whereas the independent
variable (explanatory) is on the horizontal axis.
CORRELATION COEFFICIENT:
- The underlying principle with correlation coefficients is the concept of ‘covariance.’
- Covariance might be most intuitively explained as a measure of how much two
variables change together.
o Example: In studying social patterns, you might hypothesize that wealthier
people are likely to be more educated, so you'd try to see how closely measures
of wealth and education stay together (or ‘co-vary’).
o We expect that a large X is associated with a large Y, a small X with a small Y, etc.
 Therefore, X and Y are said to ‘co-vary’
 i.e., they vary in similar patterns
o If X and Y aren't closely related to each other, they don't co-vary, so the
correlation is small.
CORRELATION:
- rXY = correlation between X and Y
- Examines the relationship between variables, with values ranging from:
o -1.0 to +1.0
- How the value of one variable changes in relation to changes in another variable
- Bivariate correlation (2 variables).
Positive Correlation (r is positive)
- Direct correlation
- When one variable increases the other variable also increases
Negative Correlation (r is negative)
- Indirect correlation
- When one variable increases and the other decreases
- The most used correlation coefficient is the Pearson product-moment correlation
coefficient
o The type of variable that Pearson works with is interval/ratio (continuous)
- Spearman correlation coefficient is another common type
o The type of variable that Spearman works with is ordinal
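The covariance idea behind Pearson's r can be sketched directly (pure Python; function name and data are illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation: standardized covariance of X and Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator: do X and Y move together?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))   # 1.0  (perfect positive)
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 6))   # -1.0 (perfect negative)
```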
STRENGTH AND STATISTICAL SIGNIFICANCE (rule of thumb):
• The sample results may be due to chance and the relationship in the population is not
strong or is zero.
o The null hypothesis for correlations is that r = 0

• When we interpret a correlation statistic, we first look at the p-value to see whether or
not it is statistically significant EVEN IF THE r-value IS LARGE.
• If the correlation is not statistically significant, we state there is no relationship or
association between the variables.
COEFFICIENT OF DETERMINATION (r2)
• The ‘true’ indicator of the strength of the association
• Definition: The amount of variance in one measure that is explained by the other
measure.
• i.e., the percentage of the total variance in the Y scores can be explained by the
X scores
• e.g., r = 0.91 between height and foot length scores
r2 = coefficient of determination will be 0.83
• This means that 83% of the variability in foot length scores is due to or can be
explained by the different height scores.
• i.e., we have 83% of the information necessary to make an accurate prediction
SUMMARY:
• Relationships can exist between two variables.
• The correlation coefficient can quantify the strength of the relationship between two
variables.
• Scatterplot graphs visually show the relationships between the variables.
• The Pearson product-moment correlation coefficient will quantify interval/ratio
(continuous) variables.
• The correlation coefficient varies between -1.0 and 1.0; however, it is the absolute
value of r that determines the strength of the relationship.
• The coefficient of determination (r2) is the true indication of the relationship.
REGRESSION:
Intro to REGRESSION:
- Correlation helps to explain or predict the change in one variable based on the change
in another, and part of that change is due to the amount of ‘shared variance’ between
the variables. The more shared variance (i.e., the larger the r2 value), the greater the
explanatory or predictive power of one variable on the other.
We also noted:
1. 100% of the variance is never shared - there is always a certain percentage of variance
that is leftover and ‘unexplained.’
• Example: let’s say we found that intelligence scores explained 50% of the variance in
income… what other factor(s) might account for the ‘leftover’ 50%?
2. Several factors at once may influence the outcome of a variable of interest.
• Example: intelligence might be related to income, but income is likely related to
other things (age, gender, geographic location, years of experience, etc.)
• This is where regression comes in…
• Regression involves the development of explanatory or predictive models wherein
one (or more) variable(s) can explain changes in another variable of interest.
• There are different forms of regression: multiple, hierarchical, logistics, ordinal, etc.
We’re going to start by learning about simple linear regression and then build to multiple
regression.
USES OF REGRESSION
• Amount of change in a dependent (Y) variable that results from changes in the
independent variable(s) (X).
• Regression attempts to determine or explain the causes of phenomena.
• Prediction and forecasting (medical and health outcomes, etc.)
• Support or negate theoretical models.
• Modify and improve theoretical models and explanations of phenomena.
REGRESSION MODEL:
• Relation between variables where changes in some variables may “explain” or possibly
“cause” changes in other variables.
• Explanatory variables are termed the independent variables and the variables to be
explained are termed the dependent variable:
o Dependent variable (Y) – ‘depends’ on the independent variable (the ‘X’ variable).

o The Independent variable (X) – influences the dependent variable (Y); controlled by
the researcher.
- The regression model estimates the nature of the relationship between the independent
and dependent variables:
o X → Y
o Change in the dependent variable that results from changes in the independent
variable(s) (size of the relationship)
o Strength of the relationship
o Statistical significance of the relationship
EXAMPLES:
Example 1:
Simple Linear Regression (one independent variable, one dependent variable)
- Let’s say I want to know what the price of gasoline is going to be when the price of
crude oil goes up…
o Dependent variable: retail price of gasoline in Vancouver
o Independent variable: the price of crude oil
 If crude oil prices (independent) go up (a lot or a little), what does that do
to gas prices (dependent variable)?
Example 2:
Multiple Regression (multiple independent variables, one dependent variable)
- Let’s say I want to know the employment income given certain factors (e.g., hours of
work, education, occupation… etc.)
o Dependent variable: employment income
o Independent variables: hours of work, education, occupation, sex, age, region,
years of experience, unionization status, etc.
Bivariate and multivariate models

LINEAR REGRESSION:
- The regression line models the relationship between X and Y on average


- Y = the dependent or ‘response’ variable
- X = the independent or ‘explanatory’ variable

(THIS IS RISE OVER RUN)


• x is the independent variable
• y is the dependent variable
• β1 is the slope
• a is the y-intercept (where the regression line crosses the y-axis)
The regression model consists of two parameters:
o The slope of the line (β1) and the y-intercept (a)

o To estimate y

EXAMPLE:
• y= βx + a describes an imperfect linear relationship between two variables: the value of


y is not completely determined by x, but is also affected by the “random error” (we’ve
seen this random error before in ANOVA!)
• In regression modeling, we try to fit a line through our scatter plot data, so that the
(sum of squared) errors are minimized.
• The straight line (line of best fit) obtained can then be used for interpretation of the
relationship and explanations/predictions.

LINEAR REGRESSION AND HYPOTHESIS TESTING:
• Equation for linear regression: y= βx + a
• The slope is used for hypothesis testing:
• We test whether the slope in the population is zero (i.e., we test the
hypothesis that there is no relationship between x and y)
• In other words, the null hypothesis can be conceptualized as a slope that
is zero
• We are interested in the slope of the line (β) because it indicates the amount of change
in the dependent variable (Y) that we would expect for a change of one unit in the
independent variable (X).
• “It is most important to know whether the value we have for the slope is statistically
significant; if significant, then the value of the slope in the whole population
(remember we are testing a sample here) can be considered different from zero.”
• In other words, we want to know: is the slope that we have (of our model, β1)
different enough from zero to the degree that we are confident that the
explained/predicted relationship between x and y is not due to chance?

When there is a single independent variable (as is the case in simple linear regression), the
standardized beta (β) coefficient is equal to the Pearson correlation coefficient.

Y = βx + a
(compare the familiar slope-intercept form y = mx + b)
Where the standardized β = Pearson correlation coefficient

HYPOTHESIS TESTING
• Error of Estimate: how much each data point differs from the predicted data point
(regression line).
• Standard error of estimate (SEE): the measure of how much each data point (on
average) differs from the predicted data point (the regression line); in other words, a
standard deviation of all the error scores.
• The higher the correlation between two variables (and the better the prediction), the
lower the error will be.
Method of Ordinary Least Squares (OLS)
- OLS is used to determine the “best” line that is as close as possible to the data points in
the vertical (y) direction (since that is what we are trying to predict)
- Least Squares: Method that finds the line that minimizes the sum of the squares of the
vertical distances between the individual data points and the regression line

*The sum of squared differences can never be negative. (Hypothetically it could be
zero, but in the real world there is always some error.)*
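In the simple (one-predictor) case, the least-squares line has a closed-form solution: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope·x̄. A small pure-Python sketch with made-up data (the function name ols_fit is illustrative):

```python
def ols_fit(xs, ys):
    """Closed-form OLS slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope minimizes the sum of squared vertical distances to the line
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# made-up data roughly following y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]
b, a = ols_fit(xs, ys)
```

Any other line through these points would have a larger sum of squared vertical distances, which is exactly the OLS criterion described above.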
The Influence of Outliers:
- Outliers pull the regression line towards them, changing the slope
- The r value becomes smaller (the relationship appears less linear) – an outlier changes
the whole equation
Example: Aerobic Capacity and Running Performance
- How fast (time) a person can run a 10km race is dependent in part on their aerobic
capacity (VO2 max).
o Can we predict 10 km running times using VO2 max values?
o Variables: Time (Y) and VO2 (x).

How long will it take a person to run 10 km if their VO2 max is 60 ml/kg/min?
 Goal: to predict the length of time it takes to run 10 km for a given VO2
max.
Scatterplot of response variable (y) against explanatory variable (x):
• What is the overall (average)
pattern?
• What is the direction of the pattern?
• How much do data points vary from
the overall (average) pattern?
• Any potential outliers?
Simple Conclusions:
• Time is linearly related to VO2.
• Time decreases as VO2 increases.
• Data points are close to the line.
• No potential outlier.

Predicted value of Y for a given X value


Regression equation: Y = -0.7402X + 92.464
E.g., How long will it take to run a 10 km
race if one’s VO2 max is 42 ml/kg/min?
Where X = VO2 max value

If we know a person’s VO2 max, we can predict their 10 km run time:


E.g., VO2 max of 42 ml/kg/min
Y = -0.7402(42) + 92.464
Y = -31.09 + 92.464
Y = 61.4 min
- Based on the model we can estimate that a person with a VO2 max of 42 ml/kg/min will
be able to run a 10 km race in approximately 61.4 minutes.
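The plug-in arithmetic can be wrapped in a tiny helper (predict_time is an illustrative name; the equation is the one given in the notes):

```python
def predict_time(vo2_max):
    """Predicted 10 km time (minutes) from the notes' model: Y = -0.7402X + 92.464."""
    return -0.7402 * vo2_max + 92.464

time_42 = predict_time(42)  # the worked example: about 61.4 minutes
time_60 = predict_time(60)  # the VO2 max of 60 ml/kg/min asked about earlier
```

Note the negative slope: a higher aerobic capacity predicts a shorter (faster) race time, consistent with the scatterplot conclusions above.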
SUMMARY:
• Regression models allow for the explanation and prediction of outcome variables
(dependent variables of interest) using either one, or more than one independent
variable.
• Regression models can be qualitatively described as:
DATA = MODEL + ERROR
• The regression line superimposed onto a scatterplot of two variables (simple linear
regression) IS the MODEL (Y= βx +a).
• The standardized slope of this regression line in SIMPLE LINEAR REGRESSION is the
Pearson correlation coefficient – this is where we see connections between Chapter 15
Correlation and Chapter 16 Regression.
• SPSS and other statistical programs will select the ‘best fitting model’ based on the
smallest amount of total variance that exists between individual data points and the
regression line itself.
• This best fitting model is based on OLS (ordinary least squares) mathematics.
• In general, regression models are statistically significant if the total OLS variance with
the fitted model is less than the total variance from the null hypothesis model.

REGRESSION…CONTINUED
MULTIPLE REGRESSION:
- What do we do when we have more than one predictor/independent variable?
o E.g., predicting scores on a memory task from measures of short-term memory,
reading comprehension, and processing speed.
o We use MULTIPLE REGRESSION ANALYSIS
 Y = b1X1 + b2X2 + a
o Where:
 X1 is the value of the first independent variable
 X2 is the value of the second independent variable
 b1 and b2 are the regression weights for those variables
 a is the y-intercept of the regression line
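As a sketch of how the weights are found, the example below (hypothetical data, with the two weights written as b1 and b2) builds a design matrix and solves the least-squares problem with NumPy. The outcome is constructed as Y = 3·X1 + 2·X2 + 5 exactly, so the fit should recover those values:

```python
import numpy as np

# hypothetical data: two predictors and an outcome that satisfies
# Y = 3*X1 + 2*X2 + 5 exactly, for illustration
x1 = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
y = 3.0 * x1 + 2.0 * x2 + 5.0

# design matrix: one column per predictor plus a column of ones for the intercept
A = np.column_stack([x1, x2, np.ones_like(x1)])

# least-squares solution gives the regression weights and intercept
b1, b2, a = np.linalg.lstsq(A, y, rcond=None)[0]
```

With real (noisy) data the recovered weights would not be exact; software such as SPSS performs this same least-squares computation and additionally reports standard errors and p-values for each weight.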


Example:
- Let’s say you run an experiment investigating whether hours of studying per week, class
attendance and coffee drinking have an effect on final exam scores in PSYC 207.
o You recruit 20 students and collect:
 Average number of hours studying over the semester – independent
variable
 Total number of classes attended – independent variable
 Whether or not they drank coffee – independent variable
 Final exam scores – dependent variable
In SPSS…
• Click ‘Analyze’
• Select ‘Regression’ and then ‘Linear’ (for linear regression)
• A dialogue box will open:
o Move the dependent/outcome variable into the ‘Dependent’ box
o Move the rest of the variables into the ‘Independent’ box
o Press the ‘Statistics’ button, select ‘Estimates,’ ‘Confidence Intervals,’ ‘Model fit,’ ‘R-
Squared change,’ and ‘Descriptives,’ then click ‘Continue’
o Click ‘OK’

Now, how to interpret them…


1. Take a look at the ‘ANOVA’ table:
o This includes the results from testing our regression model against the null
hypothesis.
o Sig. value - gives us our p-value for the model (if <.05, then statistically
significant).
2. Take a look at ‘Model Summary’ table:
• Provides information about the model fit.
• Sig. F Change value - gives us our p-value for the model (if <.05, then statistically
significant).
• R Square value - this value indicates how much of the variance in the dependent
variable is explained by the model.

R Square (R2) - the proportion of variance in the dependent variable which can be explained by
the independent variables. However, this is an overall measure of the strength of association
and does not reflect the extent to which any particular independent variable is associated with
the dependent variable.
The Adjusted R2 - a more conservative assessment of explained variance and the one that is
usually reported in research reports.
Std. Error of the Estimate (SEE) - indicates the total amount of error in the model. A higher SEE
means more error and a lower SEE means less. Remember that no model is perfect; there is
always error. Generally, as R2 increases the SEE will decrease, indicating less error in the
model.
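These quantities can be computed directly from the residuals of a fitted model. A pure-Python sketch (made-up data; fit_metrics is an illustrative name, with k = 1 predictor for simple linear regression):

```python
def fit_metrics(x, y):
    """Simple linear fit plus R^2, adjusted R^2, and SEE."""
    n, k = len(x), 1  # k = number of predictors
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    pred = [slope * a + intercept for a in x]
    ss_res = sum((b - p) ** 2 for b, p in zip(y, pred))  # error (residual) variance
    ss_tot = sum((b - my) ** 2 for b in y)               # total variance
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)        # penalizes extra predictors
    see = (ss_res / (n - k - 1)) ** 0.5                  # standard error of estimate
    return r2, adj_r2, see

r2, adj_r2, see = fit_metrics([1, 2, 3, 4], [1, 2, 3, 5])
```

As the text notes, adjusted R2 is always a little smaller (more conservative) than R2, and a larger R2 goes together with a smaller SEE.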
Interpreting the fit of the model:
• Taken together, both the R2 value and the SEE describe the ‘fit’ of the model:
o A model with a larger R2 and a small SEE = better fitting model of the data.
o A model with a small R2 and a larger SEE is indicative of a model that doesn’t fit the
data as well.
3. Take a look at the ‘Coefficients’ table:
• Provides information about the contribution of each independent variable to the model;
in other words, provides information about how strongly each independent variable
influenced the dependent variable.
• First, take a look at the ‘Sig.’ column:
o For each of the variables, check their significance level – this tells you which
variable(s) make a statistically significant contribution to the model (if <.05)
• Next, take a look at the ‘Beta’ column under ‘Standardized Coefficients:’
o For each of the variables, take a look at the beta value – this tells you how much of a
unique contribution each variable makes to the model.
o Larger values = greater unique contribution to the model.

o +/- indicates the direction of the relationship between the independent variable and
dependent variable (i.e., positive beta value = direct relationship).

B - These are the values for the regression equation for predicting the dependent variable from
the independent variable.
β are Beta-standardized coefficients. Standardizing the variables before running the regression
puts all of the variables on the same scale (kind of like what we did with z-scores comparing
math and verbal scores – we took the math and verbal scores and standardized them to z-
scores to compare).

WRITING STATISTICAL AND RESEARCH CONCLUSIONS FOR REGRESSION:

“Findings show that hours of study and classes attended positively influenced final exam scores.
For every one point increase in hours studied, final exam scores would increase by 2.403 points.
Similarly, for every one point increase in number of classes attended, final exam scores would
improve by 2.546 points. These findings indicate that weekly studying and regular attendance
of classes helps improve final exam scores for students in PSYC 207.”

NON-PARAMETRIC TESTS:
Introduction to non-parametric tests:
- The term “non-parametric” refers to the fact that these tests do not require
assumptions about population parameters
o Nor do they test hypotheses about population parameters
 Previous examples of hypothesis tests, such as the t-test and ANOVA, are
parametric tests; they do include assumptions about parameters and
hypotheses about parameters
- Parametric tests have different assumptions than nonparametric tests:
o Variances of each group are similar
o Sample is large enough to represent the population
- Nonparametric statistics don’t require the same assumptions
1. They are distribution-free
2. Allow data expressed as frequencies to be analyzed
Examples of non-parametric tests
Mann-Whitney U
- Non-parametric ‘version’ of the t-test
- Use a t-test when the data is normally distributed (looking for differences between 2
groups)
- The Mann-Whitney U is used to compare differences between two independent groups when the
dependent variable is either ordinal or continuous, and not normally distributed.
- Do attitudes towards pay discrimination (attitudes measured on an ordinal scale) differ
based on gender?
o Dependent variable: ‘attitude’ (Likert scale) ordinal
o Independent variable: gender (male, female)
- Do salaries (continuous scale) differ based on educational level (high school and
university)?
o Dependent variable: salary
o Independent variable: education level (high school only, university)
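For intuition, the U statistic itself can be computed from the pooled ranks. A pure-Python sketch (mann_whitney_u is an illustrative name; midranks handle ties; in practice you would use SPSS or a statistics library, which also reports the p-value):

```python
def mann_whitney_u(a, b):
    """Mann-Whitney U for two independent samples (smaller of U1, U2)."""
    pooled = sorted(a + b)

    def rank(v):
        # midrank: average of the rank positions occupied by tied values
        first = pooled.index(v) + 1
        count = pooled.count(v)
        return first + (count - 1) / 2

    n1, n2 = len(a), len(b)
    r1 = sum(rank(v) for v in a)        # rank sum for group a
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)
```

When the two groups are completely separated (every score in one group below every score in the other), U = 0, its minimum possible value.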

Kruskal-Wallis
- Non-parametric ‘version’ of One-Way ANOVA
- Use when you have different levels/groups (looking for differences between more than
2 groups)
- Determine if there are statistically significant differences between two or more groups
of an independent variable on a continuous or ordinal dependent variable.
- It is considered the nonparametric alternative to the one-way ANOVA, and an extension
of the Mann-Whitney U test, to allow the comparison of more than two independent
groups.
- Does exam performance (continuous scale from 0-100) differ based on test anxiety
levels?
o Dependent variable: Exam score
o Independent variables: Anxiety levels (low, medium, high)
- Do salaries (ranked, ordinal) differ based on an independent variable (e.g., job type) ?
o Dependent variable: <$20,000, $20,000-49,999, >$50,000
o Independent variable: salaried, hourly, contract
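The H statistic behind the Kruskal-Wallis test is also rank-based. A pure-Python sketch without the tie correction (kruskal_h is an illustrative name; statistical software compares H against a chi-square distribution to get the p-value):

```python
def kruskal_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)

    def rank(v):
        # midrank: average of the rank positions occupied by tied values
        first = pooled.index(v) + 1
        count = pooled.count(v)
        return first + (count - 1) / 2

    n = len(pooled)
    # H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
    return (12 / (n * (n + 1))
            * sum(sum(rank(v) for v in g) ** 2 / len(g) for g in groups)
            - 3 * (n + 1))
```

With two groups, H reduces to the (squared, scaled) Mann-Whitney comparison, which is why Kruskal-Wallis is described as its extension to more than two groups.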

Chi-Square test
- Sometimes researchers are interested in evaluating whether the number of cases in
specific categories differs from what would be expected on the basis of chance or some
other form of known information (e.g., census data).
- Chi-square provides a statistical test of the significance of the discrepancy between the
observed and expected results.
- The difference between the Chi-square tests and the other hypothesis tests we have
considered (t and ANOVA) is the nature of the data.
o For t-tests and ANOVA, data are continuous (i.e., interval/ratio variables).
o For chi-square tests, data are frequencies (i.e., how frequently does a score
occur?).
- Chi-square allows you to determine if what you observe in a distribution of frequencies
is what you would expect to occur by chance.
- One-sample chi-square is also called the “Goodness of Fit test.”
o Note: how this is in line with what we’ve been learning throughout this course
(e.g., is the result due to chance, or due to a particular intervention?)
- The chi-square test for goodness-of-fit uses frequency data from a sample to test
hypotheses about the shape or proportions of a population.
- Each individual in the sample is classified into one category on the scale of
measurement.
- The data, called observed frequencies, simply count how many individuals from the
sample are in each category.
- The null hypothesis specifies the proportion of the population that should be in each
category.
- The proportions (%) from the null hypothesis are used to compute expected
frequencies that describe how the sample would appear if it were in perfect agreement
with the null hypothesis.
- Null hypothesis:
o H0: P1 = P2 = P3
- Research hypothesis:
o H1: P1 ≠ P2 ≠ P3 (at least one proportion differs)
- *Where P = proportion (%)
COMPUTING CHI-SQUARE:
Χ2 = ∑ [(O – E)2 / E]
Where:
Χ2 = Chi-square statistic
∑ = sum of
O = observed frequencies (our actual data)
E = expected frequencies – which should be
there if the null hypothesis holds true…
CHI-SQUARE EXAMPLE:
 Before we work through examples of Chi-square tests, we have to address the question
of calculating the ‘expected’ frequencies.
 This is actually part of the research hypothesis:

o The researcher may hypothesize what the expected frequency of a variable is by
drawing upon some standard already available in the literature.

 We will see how the calculation of expected frequencies plays out for one-sample and
two-sample chi-square tests.
One-Sample Chi-Square Example
 From Salkind (p. 303)

 Research question: Are respondents (n=90) equally distributed for their preference for a
food voucher?
 For voucher (n=23)

 Maybe for voucher (n=17)

 Against voucher (n=50)

 If they were equally distributed, then, by chance, we would expect each group to be
chosen by the same number of respondents – not the proportions presented in the data.
 For one-sample Chi-square tests, we calculate the expected frequencies by dividing the
total (90 responses) by the number of groups (3). In this case the expected frequency
for each group would be 30.
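Putting the observed and expected frequencies together, a short sketch of the chi-square computation for this example:

```python
# one-sample chi-square for the food-voucher example (Salkind, p. 303)
observed = [23, 17, 50]  # For / Maybe / Against (n = 90)

# expected frequencies under H0: total divided equally among the groups
expected = [sum(observed) / len(observed)] * len(observed)  # 30 each

# chi-square = sum of (O - E)^2 / E over all categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # degrees of freedom = number of categories - 1
```

This reproduces the result interpreted below: chi-square = 20.6 with 2 degrees of freedom.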

How to interpret:
X2(2) = 20.6, p < .05
• x2 represents the test statistic (calculated value)


• 2 is the number of degrees of freedom
• 20.6 is the obtained value
• p < .05 is the probability
Example of SPSS output using same data we
used for the previous example

• Chi-square is usually applicable for small samples.
• Categories must be independent of one another.
• The expected cell frequency for any cell should not be less than 1.
• For a 2 X 2 contingency table, the total number should be no less than
20 (n=20).
