
Psychology 117 Study Guide

Statistics: A set of methods and rules for organizing, summarizing, and interpreting
information. (also creates a standard for comparison)

Facts and figures that condense large quantities of information into a few simple figures
or statements (because people can understand averages!)
• Methods
• Interpretation

Population: set of all the individuals of interest in a particular study

Sample: individuals selected from the population as a representative sample

Parameter: A numerical measurement describing some characteristic of a population

(greek symbol)

Statistic: A numerical measurement describing some characteristic of a sample

Data – measurements or observations

Data set: collection of measurements or observations

Datum: a single measurement or observation (called score or raw score)

Datum is observation that we collect from individual (called raw score). The complete
set is the Data set and after we get these we run analyses.

Two Categories of Statistical Procedures

Descriptive Statistics: a set of methods and rules for organizing and summarizing
(mean, median, mode, graphs)

Inferential Statistics: a set of methods and rules for interpreting information

Or techniques that allow generalizations from a sample to a population from
which they were selected.

Sampling Error: sample data isn’t always perfect. This discrepancy is called sampling
error – the amount of error that exists between a sample statistic and the corresponding
population parameter.

Scientific method and the Design of Research Studies

Objectivity: studies should be conducted in a way to prevent biases from influencing the
outcome of research (Rosenthal 1963 with rats).
Relationships Between Variables: The GOAL is to find relationships between
variables and put order to the universe.


Variable: a characteristic or condition that changes or has different values for different
people (x,y)

Constant: A characteristic or condition that does not vary, but is the same for every
individual (adding 4 pts to everyone’s exam)

• The two methods for observing and investigating variables are…

o Correlational
o Experimental

Correlational Method: looking at observations/variables in their natural environment,
we aren’t manipulating anything! Studies that observe and measure specific
characteristics without attempting to modify the subjects being studied and instead
observe the variables as they naturally exist (questionnaires, interviews)
PROBLEM is figuring out cause and effect: it’s just a relationship, and confounding
variables may explain it

Experimental Method: Studies that modify a group of subjects in some way in an
attempt to establish a cause and effect relationship (lab treatment vs placebo)
• Two characteristics to show that changes in one variable are caused by changes in
the other variable:
o CONTROL: Manipulate independent variable (whether or not get pill) and
see what it does to the dependent variable (change in condition)
o RANDOM ASSIGNMENT: each subject has an equal chance of being in
either condition
o HOLDING CONSTANT: method of controlling other variables that might
influence results. Each condition must be identical except for the variable
that is being manipulated.

Measuring Data

Qualitative: categorizing events

Quantitative: using numbers to categorize the size of the event

Four Levels:

Nominal: data that are names, labels, or categories and cannot be arranged in an ordered
scheme (low → high) ex: gender, SS#, favorite sports team, etc

Ordinal: set of categories organized in an ordered sequence – differences in data values
cannot be determined or are meaningless (RANK ORDER) (course grades, rankings,
stove settings)

Interval: consists of ordered categories where all of the categories are intervals of the
exact same size, BUT there is no natural ZERO starting point. (temperature, dates, etc)

Ratio: same as the interval scale modified to include the natural zero starting point
(weight and prices)

Quantitative Data (Classifying)

Discrete: separate, indivisible categories (counting something, can’t have 1.5)

Continuous: infinite number of possible values that fall between any 2 observed values –
is divisible into an infinite number of fractional parts (running water, time)

Describing Data: we use descriptive statistics

Frequency Distribution Table

• Lists different measurement categories or X values in a column from highest to
lowest, and beside each X is the number of times each one occurs.

Frequency Distribution Graph

Histogram
• Use when frequency distribution has data from an interval or ratio scale (draw bar
above X value so height = frequency of score)
• Adjacent bars touch each other, continuous figure

Bar graph
• Used when presenting frequency distribution data from ordinal or nominal scales
• Space between bars to emphasize distinct, discrete categories

Stem and Leaf Plot

• Represent data by splitting value into 2 parts
o Stem – leftmost digit
o Leaf – rightmost digit

Describing a Distribution
• Shape
o Symmetrical
   - Kurtosis = peakedness, HUGE peak and tails are extreme
o Skewed
   - scores pile up on one end and taper off at the other
   - positively skewed: tail to the right
   - negatively skewed: tail to the left
• Central Tendency – concept of average. the easiest way to describe scores –
compute an AVERAGE!
o Mean, median, mode
o Goal is to obtain a single value that IDs a single score as representative of
the entire distribution
• Variability

Mean is preferred method if possible to use

Mode used with nominal scales, it’s easy to compute which one shows up most often
Median used when there are a few extreme scores in the distribution (outliers) or ordinal
data that is harder to measure,
• Extreme scores
• Ordinal scale
• Open-ended distributions
• Undetermined values

Relationship between three:

For symmetrical distribution: all three in the middle

For skewed, median is in middle, mode is at bump, and mean is closer to tail

Measures of Variability

Range (Max – min): a crude measure because it’s determined entirely by the two extreme values
Standard deviation/variance – approximates average distance from mean
• Standard deviation is a descriptive measure that describes how variable, how
spread out, the scores are in a distribution

Three characteristics that describe a distribution:

1) Shape (skewed or symmetrical)
2) Central tendency (mean, median, mode)
3) Variability (range, standard deviation)

Steps to Find Standard Deviation

1) Find the mean of the data (add up all the values, divide by number)
2) Determine the deviation for EACH SCORE from the mean (x-Mu)
3) Square these values
4) Then find the average of these squared values – The numerator is SS, the sum of
squares, and then we divide by N. This is the VARIANCE
Variance = Σ(X - Mu)^2 / N

So variance = mean squared deviation = SS/N

5) To get the standard deviation, we just have to take square root of the variance
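
The five steps can be sketched in Python. This is only an illustration: the scores are made up and are treated as a complete population of N = 5.

```python
import math

# Hypothetical population of N = 5 scores (made up for illustration)
scores = [1, 9, 5, 8, 7]
N = len(scores)

mu = sum(scores) / N                      # Step 1: mean
deviations = [x - mu for x in scores]     # Step 2: deviation of each score (X - Mu)
squared = [d ** 2 for d in deviations]    # Step 3: square each deviation
SS = sum(squared)                         # sum of squares
variance = SS / N                         # Step 4: mean squared deviation
sd = math.sqrt(variance)                  # Step 5: square root of the variance

print(mu, SS, variance, round(sd, 2))
```

For these scores the mean is 6, SS is 40, and the variance is 8, so the standard deviation is the square root of 8.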

Two ways of calculating SS, the sum of squares

1) Definitional formula: SS = Σ(X - Mu)^2
a. Subtract the mean from each value, square it, then add them all up = definitional SS
2) Computational formula: SS = ΣX^2 - (ΣX)^2 / N
a. Square each value, add up the squares, then SUBTRACT [the sum of all the
values, squared, then divided by the number of values] = computational SS
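
A quick check that the two formulas give the same SS, using a made-up set of scores (a sketch, not part of the original notes):

```python
# Hypothetical scores (made up for illustration)
scores = [1, 9, 5, 8, 7]
N = len(scores)
mean = sum(scores) / N

# Definitional: subtract the mean from each value, square, add up
ss_definitional = sum((x - mean) ** 2 for x in scores)

# Computational: sum of squared values minus (sum of values)^2 / N
ss_computational = sum(x ** 2 for x in scores) - (sum(scores) ** 2) / N

print(ss_definitional, ss_computational)
```

The computational formula is handy when the mean is not a whole number, because it avoids squaring a long list of decimal deviations.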

The problem with populations vs samples is that samples tend to be less variable than populations
– so we have to adjust for this bias (bias in general can be an overestimation or an
underestimation; here it is an underestimation)

So now we are calculating standard deviation of a SAMPLE instead of a POPULATION

X bar is used instead of Mu (for sample mean over population mean)

We do the same process, but instead of N we use n-1 in the denominator to adjust for the bias

n-1 = degrees of freedom
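
A minimal sketch of the sample version, assuming the same made-up scores are now a sample rather than a population. The only change is dividing SS by n - 1 instead of N; Python's built-in statistics.stdev uses the same n - 1 rule.

```python
import math
import statistics

# Hypothetical sample of n = 5 scores (made up for illustration)
sample = [1, 9, 5, 8, 7]
n = len(sample)

x_bar = sum(sample) / n                          # sample mean (X bar)
SS = sum((x - x_bar) ** 2 for x in sample)       # sum of squares
df = n - 1                                       # degrees of freedom
sample_variance = SS / df                        # divide by n - 1, not N
sample_sd = math.sqrt(sample_variance)

# The standard library applies the same n - 1 correction
print(sample_sd, statistics.stdev(sample))
```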

Measures of Relative Standing

Mean and standard deviation describe an entire distribution

Z scores (standard scores) describe an individual score, using a mean and standard deviation,
changes X into Z score, to standardize scores

Z Scores describe the precise location of a score within a distribution

• Changes each x value into a signed number (+ or -)
o Positive is above the mean
o Negative is below the mean
• Number tells the distance between the score and the mean in terms of the number of
standard deviations

Numerator is deviation score (above or below the mean)

Denominator is standard deviation so Z score is in SD units
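
As a sketch, the z score formula is one line of Python. The exam numbers below are invented to show how z scores let us compare scores from two different distributions.

```python
def z_score(x, mean, sd):
    """Deviation from the mean, expressed in standard deviation units."""
    return (x - mean) / sd

# Hypothetical exam results (made up for illustration)
z_psych = z_score(60, mean=50, sd=10)   # 10 points above the mean, SD = 10
z_bio = z_score(56, mean=48, sd=4)      # 8 points above the mean, SD = 4

print(z_psych, z_bio)
```

The raw psychology score is higher, but the biology score sits further above its own mean in SD units, so it is the better relative standing.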

Purposes of Z Scores

1) Transformation: describe the exact location of a single score (X becomes a Z score)

2) Standardize an entire distribution (takes different distributions and makes them
comparable)

If every X value in a distribution were transformed into a Z score, the Z scores would have
the following properties:

1) Shape – stays the SAME as the original distribution of raw scores

2) Mean – Z score distribution mean is ALWAYS zero
3) Standard deviation – z score distribution will always have a SD = 1

• Standardized distribution: Every Z score distribution has the same mean (ZERO) and
the same SD (ONE)
• When two scores are from different distributions, it is impossible to make any direct
comparison between them
o To calculate Z scores we need the raw score, mean and standard deviation of each
distribution you are comparing
o We can compare Z scores because the z scores are coming from an equivalent
standardized distribution

Two steps involved in standardizing a distribution so that it has a predetermined mean and SD:
• Each of raw scores is transformed into Z score
• Each of the Z scores is transformed into a new X value so that a particular mean and SD
are achieved

Standard score = new mean + Z(SDnew)
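
The two steps can be sketched as a small function. The function name and the example numbers are hypothetical.

```python
def rescale(x, old_mean, old_sd, new_mean, new_sd):
    """Step 1: raw score -> Z score. Step 2: Z score -> new standard score."""
    z = (x - old_mean) / old_sd
    return new_mean + z * new_sd

# Hypothetical: original distribution has mean 57 and SD 14,
# and we want a standardized distribution with mean 50 and SD 10
new_score = rescale(64, old_mean=57, old_sd=14, new_mean=50, new_sd=10)
print(new_score)
```

A raw score of 64 is half an SD above the old mean, so it lands half an SD above the new mean.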


How do I make an inference about a population?

1) Develop probability as a bridge from population to sample

2) Reverse the probability rules

Probability of A = number of outcomes classified as A / total # of possible outcomes

P(specific outcome)
Event never occurs then p = 0
Certain to occur p = 1

• Random sampling:
o Each person in population has equal chance of being selected (1/N)
o If more than one individual is selected, there must be a constant probability for
each and every selection
o Aka, put the card back before drawing again

Sampling with replacement

• Aka putting the card back, which keeps the probability constant for every selection

Probability and standard distribution

1st column: Z score values

2nd column: % mean to Z is percentage of scores between mean and z score
3rd column: % in tail gives the percentage of scores in the tail for that z score
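
The table columns can be reproduced with Python's standard library, which is a useful check when a table is not at hand. This is a sketch; the function name is made up.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0, sigma=1)   # standard normal (z score) distribution

def table_row(z):
    """Return (% between mean and z, % in the tail beyond z)."""
    mean_to_z = unit_normal.cdf(abs(z)) - 0.5
    in_tail = 1 - unit_normal.cdf(abs(z))
    return mean_to_z * 100, in_tail * 100

pct_mean_to_z, pct_in_tail = table_row(1.0)
print(round(pct_mean_to_z, 2), round(pct_in_tail, 2))
```

The two columns always add up to 50%, because together they cover one half of the distribution.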
Probability and Samples: The distribution of sample means

• Every score from a population has a Z score – says WHERE that score is located
in the distribution
o We want to be able to transform sample mean into Z scores too, so n can
be more than just one value
o NOW a Z score represents an entire sample
   - Beyond + or – 2 means an extreme sample

Problems with Samples

• Sampling Error: not all samples from the population will be representative of pop
• Variability: different samples will vary. We need RULES to relate samples to
populations

Distribution of sample means: basis for ability to predict sample characteristics – the
collection of sample means for all the possible random samples of a particular size (n)
that can be obtained from a population

1) Select a random sample and calculate the mean

2) Do this for entire set of all possible random samples – this is the distribution of
sample means

Sampling distribution: a distribution of statistics (such as the means of different
samples) obtained by selecting ALL the possible samples of a specific size from a
population

Sample vs population mean…
• Sample means tend to pile up around population mean
• Distribution of sample mean is normal in shape
• If we have a normal distribution we know that we can calculate Z scores and
answer probability questions

• It isn’t possible to list all samples and calculate every possible mean
• So we use the general characteristics of the distribution of sample means, given by the
central limit theorem

Central Limit Theorem

For any population with a mean (Mu) and standard deviation (SD), the distribution of sample
means for sample size n will have a mean of Mu and a standard deviation of SD / sqrt(n),
and will approach a normal distribution as n approaches infinity

• Describes the distribution of sample means for any population, no matter what
shape, mean, or SD.
• Distribution of sample means “approaches” a normal distribution very rapidly
o by the time n = 30, the distribution is almost perfectly normal

**and remember that any distribution can be defined by shape, variability, and central
tendency

• distribution of sample means tends to be a normal distribution if one of the following is true:
1) population is a normal distribution
2) number of scores in each sample (n) is large, about 30 or more (regardless of population shape)


• expected value of X bar is the mean of the distribution of sample means, and it’s
always = to population mean.
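
A small simulation can illustrate these claims. It is a sketch under stated assumptions: the population is an invented, strongly skewed set of 20,000 scores, and we draw 5,000 random samples of n = 30 with replacement instead of listing every possible sample.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population (exponential shape, not normal)
population = [random.expovariate(1.0) for _ in range(20000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 30
# Draw many random samples (with replacement) and keep each sample mean
sample_means = [
    statistics.fmean(random.choices(population, k=n)) for _ in range(5000)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to mu
sd_of_means = statistics.pstdev(sample_means)    # should be close to sigma / sqrt(n)
predicted_se = sigma / math.sqrt(n)

print(round(mu, 3), round(mean_of_means, 3))
print(round(predicted_se, 3), round(sd_of_means, 3))
```

Even though the population is skewed, the sample means pile up around the population mean, and their spread is close to the population SD divided by the square root of n.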

VARIABILITY → standard deviation

• SD for the distribution of sample means is called the standard error of X bar

o Measures how much difference should be expected on average between
Xbar and mean
o Standard error tells us how good an estimate sample is compared to
population mean

Standard Error:
• Magnitude determined by:
1) size of the sample – Law of Large Numbers → smaller error with larger samples
2) Standard deviation is the starting point for standard error, so when n = 1, standard
deviation = standard error. As sample size increases, standard error decreases
in proportion to sqrt(n): standard error = SD / sqrt(n)

Sampling Error: knowing that samples aren’t completely representative of a population

Standard Error: measuring the error between the sample mean and the population mean
** method by which we measure sampling error

Standard Error vs standard deviation

• Standard deviation
o Use when working with a distribution of scores and it measures the standard
distance between a score and the population mean

• Standard Error
o Use when you have a question concerning a sample, and it measures the standard
distance between the sample mean and the population mean

Hypothesis Testing: a statistical procedure (the most commonly used) that lets us make
inferences about a population using sample data. (we use Z scores, probability, and distribution of
sample means)

Hypothesis Test: a statistical method that uses sample data to evaluate a hypothesis about a
population parameter

• Testing Procedure
• 1) State a hypothesis about a population (usually population parameter)
• 2) Obtain a random sample from the population
• 3) Compare sample data with hypothesis

Example: using one unknown population and one sample

• We know a mean, standard deviation, and sample size, n
• We assume if treatment (what we are measuring) has any effect then it will be adding or
subtracting a constant from the mean, so SD should be the same

• Testing Procedure

1) State a hypothesis
• Null Hypothesis (Ho) = treatment has no effect! There is no change
• Independent variable (treatment) has no effect on dependent
variable for population
• Scientific or Alternate Hypothesis (H1) = treatment does have an effect
• There is SOME TYPE OF CHANGE! The independent variable
(treatment) will have an effect on the dependent variable.
• Non-directional Test: this hypothesis does not specify direction
of change

2) Set the criteria for a decision
• Alpha Level / Level of Significance: probability value used to define
the very unlikely sample outcomes if the null hypothesis is true.
• Determines boundaries of critical region
• Defines “unlikely” outcomes
• Must select level to minimize Type I error
• Largest permissible value is .05 (5%)
• Tradeoff between lowering alpha level and difficulty of finding a
treatment effect (usually use .05, .01, and .001 to maintain
• Critical Region (tails): extreme values that are very unlikely to be
obtained if the null hypothesis is true (so if we see values in the critical
region, it is likely that null hypothesis is NOT true – the treatment DID
have an effect). Boundaries are determined by alpha level.
• Sample data fall in critical region → null hypothesis rejected

3) Compute Z Score for sample mean

Z = (sample mean – hypothesized population mean) / (standard error between X bar and pop mean)

We are computing a z score that describes exactly where the sample mean is located relative to
the hypothesized population mean from Ho

4) Make a decision

- Reject the null hypothesis (treatment did have an effect)
o so there WAS a change (data are in the critical region, or the tails)
- Fail to reject the null hypothesis (no evidence that treatment had an effect)
o We don’t have enough evidence that treatment has an effect, the evidence
wasn’t convincing
o Data are not in the critical region

So if treatment WORKS: data is in the critical region: we REJECT null hypothesis

If treatment DOES NOT WORK: data is NOT in the critical region, we FAIL to reject the NH
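
The four steps can be sketched as one Python function for a two-tailed test. The function name and the numbers in the example are hypothetical, and the population SD is assumed to be known.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu_h0, sigma, n, alpha=0.05):
    """Two-tailed z test of Ho: population mean = mu_h0 (sigma known)."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu_h0) / standard_error      # Step 3: compute z
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # Step 2: critical boundary
    reject = abs(z) > z_crit                        # Step 4: decision
    return z, z_crit, reject

# Hypothetical study: Ho says Mu = 100, SD = 15, sample of n = 25 has mean 106
z, z_crit, reject = z_test(sample_mean=106, mu_h0=100, sigma=15, n=25)
print(round(z, 2), round(z_crit, 2), reject)
```

Here the standard error is 15 / 5 = 3, so the sample mean is 2 standard errors above the hypothesized mean, which is just beyond the boundary for alpha = .05.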

So in terms of Z scores:
• A Z score near 0 means we aren’t in the critical region: the sample is consistent with Ho,
so we fail to reject it
• A Z score that is extremely positive or negative falls in the critical region: we reject Ho

Z Score Formula

• Z = obtained difference / difference due to chance
Is the result of the research study (obtained difference) more than would be expected by chance?

Most hypothesis tests require that the obtained difference be 2 to 3 times bigger than chance
before the researcher will reject the null hypothesis (and say that the treatment did cause a
change)

Uncertainty and Errors in Hypothesis Testing

Inferential Process:
• Always possibility that an incorrect conclusion was made
• Type 1 Error: when researcher rejects the null hypothesis, says that treatment was
effective, and in fact it was not
• Unlikely to obtain sample mean in critical region when null hypothesis is
true. Critical region is determined by alpha level, so alpha level for
hypothesis test is equal to probability that test will lead to Type 1 error.

Bigger alpha level = bigger critical region = more likely have sample means in that area = more
likely to reject null hypothesis and say treatment was effective. So a smaller alpha level is more
fine tuned, if we reject the null hypothesis with a smaller alpha level and say treatment was
effective, this is more reliable.

• Type 2 Error: when a researcher fails to reject a null hypothesis that is really
false. So researcher fails to reject the null hypothesis and says that treatment isn’t
effective, when in fact it is. Not as serious, and just means that research data don’t
show the results that the researcher had hoped to obtain.
• Probability of a Type 2 error is represented by the Beta symbol

Statistical Test Notation

Significance: means that the result is different from what would be expected due to chance
Findings are significant when the null hypothesis is rejected

In papers: “treatment with medication had a significant effect on people’s depression scores, z =
3.85, p < .05”
• Significant means that we rejected the null hypothesis, as the sample mean fell in the
critical region
• Z = 3.85 is the Z score for the sample mean
• P < .05 refers to the alpha used for the test, meaning that there is a less than 5%
probability of getting a result this extreme by chance alone (if the null hypothesis is true)
• So the researcher is 95% confident that the obtained difference is greater than what
one would expect by chance alone
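
As a rough check on the reported result, the probability in both tails beyond z = 3.85 can be computed with the standard library. The two-tailed choice is an assumption; the notes do not say which kind of test the paper used.

```python
from statistics import NormalDist

z = 3.85  # z score reported in the example above

# Probability of a z score at least this extreme in either tail, if Ho is true
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(p_two_tailed < 0.05)
```

The probability is far below .05, which is why the result is reported as significant.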

Assumptions for Hypothesis Test with Z Scores

1) Random sampling
2) Independent Observations (no predictable relationship between 1st/2nd observation)
3) The value of SD is unchanged by the treatment (we make this assumption)
4) Normal sampling distribution

T Statistic

• Shortfall of Z score test is that it usually involves more information than is available
• Population standard deviation or variance
• We can’t calculate standard error without this!

When do we use the T statistic? When the population SD is not known. When the variability
for the population is not known, we use the sample variability in its place

Estimated Standard Error: used to estimate the real standard error in situations when the
population SD is unknown. Gives estimate of the standard distance between a sample mean
(X bar) and the population mean (Mu)


Sample Standard deviation → descriptive statistics

Sample Variance → inferential statistics

T Statistic – result of substituting the estimated standard error in the denominator of the Z score

Only difference between t and z formula is that z score formula uses the actual population
variance, and t formula uses the sample variance (because we don’t know the population one)

How well does sample variance approximate the population variance? Lets us know how
well a t statistic approximates a z score.

• Df = degrees of freedom = n -1
• Sample mean places a restriction on the value of one score
• The GREATER the value of df for a sample, the better the sample variance represents the
population variance → the better the t statistic approximates the Z score
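
A minimal sketch of a one-sample t statistic. The sample scores and the hypothesized mean are made up; the critical value 2.776 is the two-tailed .05 value for df = 4 from a standard t table.

```python
import math

# Hypothetical sample; Ho says the population mean is 10
sample = [12, 15, 11, 14, 13]
mu_h0 = 10

n = len(sample)
df = n - 1
x_bar = sum(sample) / n
SS = sum((x - x_bar) ** 2 for x in sample)
sample_variance = SS / df                        # s^2 stands in for the unknown population variance
estimated_se = math.sqrt(sample_variance / n)    # estimated standard error
t = (x_bar - mu_h0) / estimated_se

t_crit = 2.776   # from a t table: df = 4, two-tailed, alpha = .05
print(round(t, 2), df, abs(t) > t_crit)
```

The formula has the same shape as the z score; only the denominator changes, from the real standard error to the estimated one.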
T Distributions

• Every sample from a population can be used to compute a z score or a t statistic

• T distribution will approximate a normal distribution in the same way that a t statistic
approximates a z score
• How well it approximates is determined by degrees of freedom
• GREATER the sample size (n), then LARGER degrees of freedom (n-1) and the
better the t distribution approximates the normal distribution

• Shape changes with degrees of freedom – there is a different sampling distribution of t
for each possible number of degrees of freedom
• As df gets very large, the t distribution gets closer in shape to a normal z-score distribution

Standard error isn’t constant because it is an ESTIMATE based on sample variance, and this
varies from sample to sample.
• Value of df increases, variability decreases, more closely resembles normal curve
• If your exact df is not listed in the t table, use the next LOWER df that is listed

Hypothesis Test with Z Statistic

And we assume normal population and independent observation

Hypothesis Tests with Two Independent Samples

• Most research uses two samples! (and concerns a mean difference between 2)
• Independent Samples: two sets of data are from completely separate samples
• Related/Dependent Samples: two sets of data come from the same sample

Notation: We use subscripts (1,2) to distinguish between the two samples with n, sum of
squares, population mean, and sample mean.

Hypotheses for Independent Sample T-Tests:

Goal is to evaluate mean difference between the two populations…
• Null Hypothesis: Ho: Mu1 – Mu2 = 0
• There is no change, effect, or difference between the 2 populations
• Alternative Hypothesis: H1: Mu1 – Mu2 IS NOT EQUAL to 0
• There IS a difference, effect, or change between the 2 populations

So if we reject Ho means that the data indicates a significant difference b/w 2 pops
If we fail to reject Ho means that data does not provide sufficient evidence to conclude
that a difference exists

T Statistic Formula for 2 Samples… we are looking at mean differences


Single sample → Sx
Measures how much error is expected between sample mean and population mean
Independent Samples → S(X1 – X2)
Measures how much error is expected when you are using a sample mean difference to represent a
population mean difference
So for when n1 = n2 we can use this formula for standard error:

S(X1 – X2) = sqrt( s1^2 / n1 + s2^2 / n2 )

But because of the Law of Large Numbers, we can’t use it when n1 doesn’t equal n2: we get a
biased statistic because we know that larger samples provide better estimates about a
population… the proper weights are not assigned.

So according to this law, sample variances should not be treated equally if sample sizes
are different because statistics from larger samples are better estimates.

Standard Error when Sample Sizes Different:

Pooled Variance allows the bigger sample to carry more weight

Sample Variance for One Sample: s^2 = SS / df

Pooled Variance: sp^2 = (SS1 + SS2) / (df1 + df2)

Standard error: S(X1 – X2) = sqrt( sp^2 / n1 + sp^2 / n2 )

and now we can go back and calculate the t statistic – the degrees of freedom for the t
statistic is the df1 + df2
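
The pooled variance route can be sketched in Python. The two groups are invented and deliberately have different sample sizes; the helper name is hypothetical.

```python
import math

def sum_of_squares(scores):
    """SS: sum of squared deviations from the group's own mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

# Hypothetical independent samples with unequal n
group1 = [10, 12, 14, 12, 12]   # n1 = 5
group2 = [7, 9, 8, 8]           # n2 = 4

n1, n2 = len(group1), len(group2)
df1, df2 = n1 - 1, n2 - 1
mean1, mean2 = sum(group1) / n1, sum(group2) / n2

pooled_variance = (sum_of_squares(group1) + sum_of_squares(group2)) / (df1 + df2)
standard_error = math.sqrt(pooled_variance / n1 + pooled_variance / n2)

# Under Ho the population mean difference is 0
t = ((mean1 - mean2) - 0) / standard_error
df = df1 + df2

print(round(pooled_variance, 3), round(t, 2), df)
```

Because each SS is divided by the combined df, the group with more scores contributes more to the pooled estimate.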

Assumptions underlying T formula for Independent Samples

1) Independent Observations
2) Normal populations
3) Two populations must have homogeneity of variance – or “equal” variances
If there is a large discrepancy (3-4X) between the sample sizes, we worry

Hypothesis Testing with Independent t: 4 Step Procedure

Step 1: State Ho, State H1, and select alpha

Step 2: locate critical region that would be unlikely if null hypothesis was true
• Calculate TOTAL degrees of freedom
• Use T distribution table
Step 3: Get data, compute test statistic
• Find pooled variance of two samples
• Use pooled variance to compute standard error
• Compute t statistic
Step 4: Make a decision (look at observed vs critical t)

Hypothesis Testing with Related/Dependent samples – when we get the data from the same
individuals, maybe under two conditions


Repeated Measures Study: study in which a single sample of individuals is measured more than
once on the same dependent variable. – so same subjects used in all of treatment conditions
• No risk that subjects in different conditions are different, so its advantageous for
researchers to choose this design
• We can approximate this style of study by matching subjects…

Matched Subjects Study

• Two separate samples, but each individual matched with a subject in other sample
• Goal is to simulate repeated measures design as closely as possible and on variables that
are most relevant
• MATCHING depends on variables used for matching process

T Scores
Based on a difference score (D) = X2 – X1
We use this D score instead of a raw X score – so one D score represents one person’s data

Hypothesis for a Related Sample Test

• Goal is to use the sample of difference scores to answer questions about a general
population
• Null Hypothesis
• Ho: MuD = 0
• Mean difference for the general population is zero… some might show a
positive difference, some negative, but they average to zero
• Alternative Hypothesis
• H1: MuD NOT EQUAL to 0
• There is a treatment effect that causes the scores in one treatment
condition to be systematically higher or lower than the scores in the other condition

T formula
• Sample data for related sample design are difference scores and are identified by the
letter D.

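First compute variance of the D scores: s^2 = SS / (n - 1)

Calculate standard error (one D score for each individual): S(D bar) = sqrt( s^2 / n )
We refer to number of D scores instead of number of X scores

Then t = (D bar - MuD) / S(D bar), with df = n - 1

So 5 people = 5 difference scores = 4 degrees of freedom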
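
A sketch of the related-samples calculation with five made-up people measured twice. Each person contributes one D score, and the rest is a one-sample t test on the D scores.

```python
import math

# Hypothetical scores for the same 5 people under two conditions
before = [10, 12, 9, 14, 11]
after = [13, 14, 12, 15, 15]

# One difference score per person: D = X2 - X1
d_scores = [x2 - x1 for x1, x2 in zip(before, after)]

n = len(d_scores)
df = n - 1
mean_d = sum(d_scores) / n
SS = sum((d - mean_d) ** 2 for d in d_scores)
variance_d = SS / df
standard_error = math.sqrt(variance_d / n)

# Ho says the population mean difference (MuD) is 0
t = (mean_d - 0) / standard_error

print(d_scores, round(mean_d, 2), round(t, 2), df)
```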

Step 1: State hypotheses Ho and H1 and select alpha level

Step 2: locate critical region ( sample data that would be extremely unlikely if null hypothesis
were true)
Step 3: Get the data, compute test statistic (t statistic)
Step 4: make a decision! If t statistic is in critical region, reject null hypothesis, otherwise we say
that the data do not provide sufficient evidence that the two populations are different

Assumptions for Related Samples t Tests

1) Observations within each treatment condition are independent
2) Population distribution of difference scores must be normal

When to use Related Samples

• When researcher would like to study a particular type of subject that is not commonly
• When researcher questions concern changes in responses across time

When to not use Repeated Measures

• When you are comparing two different populations you MUST use separate samples
from each population (men vs women, etc)

Advantages of Related Samples Studies

• Each subject enters research study with his or her own individual characteristics – and
these differences can influence our scores and create problems when interpreting results
• With independent measures design there is always the potential that the individuals in
one sample are substantially different than the individuals in the other sample
• A repeated measure design eliminates this problem because the same individuals are used
in every treatment.

ESTIMATION – still using sample statistics to make inferences about an unknown population

Two types of estimates:

• Point estimates
• Using a single number as estimate for an unknown quantity
• Precise but less confident
• Most sample means pile up around center where z = 0, so we set z = 0 for
a point estimate

• Interval estimates (confidence intervals)

• Using range of values as an estimate of an unknown quantity
• Less precise but more confident
• As interval increases confidence increases, but precision decreases
• Steps
• We select a range of Z values associated with confidence interval
• Commonly used confidence value is 60% and up
• 90% confidence = crit tails are 5% so Z score is +/- 1.65

Confidence Interval: When an interval estimate is accompanied by a specific level of confidence (or
probability) it is called a confidence interval. So a 90% confidence interval means that we are
90% sure that the unknown population value falls somewhere within the interval
• Narrow interval = more precise
• WIDTH is influenced by sample size and level of confidence
• N gets larger, then width gets smaller → more precise
• % confidence increases, width gets larger → less precise

Hypothesis Tests: answer question → did it have an effect? (YES/NO)

Estimation: answers question “how much of an effect, and in what direction?” (value)


• WHEN: we do estimation with a Z score when the SD is KNOWN but Mu is not

Z or t = (sample statistic – unknown population parameter) / standard error

Sample statistic is like X bar or D bar
Unknown population parameter is like Mu
Standard error is like Sx

Our goal is to find the population mean so we solve the equation for Mu:

Unknown population parameter = sample statistic +/- (z or t) * standard error

Mu = Xbar +/- Z * Sx

Unknown population mean (or mean difference) = sample mean (or mean difference) +/- (z or t) * standard error
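
A minimal sketch of an interval estimate with a z score, for the case where the population SD is known. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.90):
    """Mu = X bar +/- z * standard error, with z chosen from the confidence level."""
    standard_error = sigma / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.65 for 90%
    return sample_mean - z * standard_error, sample_mean + z * standard_error

# Hypothetical sample: mean 106, n = 25, known population SD = 15
low_90, high_90 = z_confidence_interval(106, sigma=15, n=25, confidence=0.90)
low_95, high_95 = z_confidence_interval(106, sigma=15, n=25, confidence=0.95)

print(round(low_90, 2), round(high_90, 2))
print(round(low_95, 2), round(high_95, 2))
```

Raising the confidence level from 90% to 95% widens the interval, which is the confidence versus precision tradeoff. The point estimate is the same formula with z = 0, which leaves just the sample mean.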


Procedure for Estimation with T Statistic

1) Use sample data to compute the sample mean (or mean difference) and the standard error
2) Estimate a value, or range of values, for t.
* point estimate t = 0
* interval estimates we translate confidence interval (90%) into t scores
3) calculate sample means and standard error from the sample data and estimated value for T
from T table and PLUG IN to estimation equation

Confidence Intervals and Statistical Significance

• If 0 is in the range it is NOT a significant finding → we need to be able to specify
direction of effect
• If an interval contains zero, then a wider interval (higher % confidence) from the same
data will contain it as well

Know Your Tests

• 1 sample Z
• 1 sample T (for when the population SD is not known)
• Indep/dep for T

• Spelling test at beginning and end of first grade
o Dependent sample T Test

• Compare SAT scores of kids with extracurriculars to average population
o (we know SD and mean)
o 1 sample Z

• New drug for depression – two groups, random assignment
o 2 sample independent T
o Because we have 2 ind. groups, 1 placebo and 1 drug, 2 tests

• Learning in children with autism – match with group of normals
o Dependent T – matched on variable age

• Children behavioral problems, random sample of children of divorced parents and measure
behavioral problems on standardized scale for which normal pop. mean is 100
o 1 sample T → don’t have SD of population, use sample variance

• New hunger drug, three groups of rats…. ANOVA!

Analysis of Variance (ANOVA): A hypothesis testing procedure used to evaluate mean
differences between two or more treatments (or populations)
• We use with 2+ samples and when we can’t calculate a sample mean difference
• Like independent t samples test – uses sample data to draw conclusions about population
means, but t tests are limited to 2 samples, and ANOVA can do two or MORE because
• Can be used with independent measure or repeated variable design

Independent Variable: (control) what is manipulated to make different treatment conditions

Quasi independent variable: a non manipulated variable used to differentiate a group of scores
Both of these are called FACTORS

Single Factor Design: research that involves one factor

Factorial Design: more than one factor

Two Interpretations
1) NO DIFFERENCES (or differences due to sample error) (Null hypothesis)
Mu 1 = Mu 2 = Mu 3
2) REAL DIFFERENCES (alternate hypothesis)
At least one population mean is different from the others

Test Statistic for ANOVA – test statistic is called an F ratio

T = obtained difference between sample means / difference expected by chance (error)

F = variance (differences) between sample means / variance (differences) expected by chance (error)

• For both of these, a large value means that the sample mean difference is more than
chance variance → real difference

Between treatment variance: difference between treatment conditions
Either caused by treatment effects or due to chance:
Individual differences
Experimental error
Within treatment variance: difference that is likely to occur just by chance alone
• Inside a treatment condition, set of individuals are treated the same, so if they were
treated the same, then why are the scores different? Differences within a treatment are
due to chance

F = (variance between treatments) / (variance within treatments), where the variance within
treatments is unsystematic (ERROR)

F Ratio: Total variability = Between treatment variability + within treatment variability

F = (treatment effect + differences due to chance) / (differences due to chance)

So when treatment has no effect, F is close to 1

When treatment has an effect, the F ratio is noticeably greater than 1

Important Notation for ANOVA

K = number of levels of a factor (number of treatment conditions) (50, 70, 90 degrees)

n = number of scores in EACH treatment (count values in each column)
N = total number of scores in the ENTIRE study
When all sample sizes are equal, N = k * n
T = the total for EACH treatment (so sum of all of the Xs)
G (Grand Total) = sum of ALL scores in research study (so add up all the T’s, or all the numbers)

2 separate Analyses

• Compute SS TOTAL and split it into SS between and SS within

• Compute df TOTAL and split it into df between and df within

F = MSbetween / MSwithin

And SStotal = SSbetween + SSwithin

Remember that we need SS and df to find variance: s squared = SS / df


Df total = N -1

Df within = N –k

Df between = k -1
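
The SS, df, and MS steps above can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical, and the function and variable names are made up for this sketch rather than taken from the course.

```python
# Hypothetical scores: k = 3 treatment conditions, n = 5 scores in each.
treatments = [
    [0, 1, 3, 1, 0],
    [4, 3, 6, 3, 4],
    [1, 2, 2, 0, 0],
]


def one_way_anova(groups):
    """Return the source-table values for an independent-measures ANOVA."""
    k = len(groups)                          # number of treatment conditions
    N = sum(len(g) for g in groups)          # total number of scores
    G = sum(sum(g) for g in groups)          # grand total
    sum_x_squared = sum(x * x for g in groups for x in g)

    ss_total = sum_x_squared - G ** 2 / N
    # sum(T squared / n) - G squared / N, where T is each treatment total
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - G ** 2 / N
    ss_within = ss_total - ss_between        # SStotal = SSbetween + SSwithin

    df_between, df_within, df_total = k - 1, N - k, N - 1
    ms_between = ss_between / df_between     # MS = SS / df
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within, "df_total": df_total,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


result = one_way_anova(treatments)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The obtained F still has to be compared with the critical value from an F table for (df between, df within); that lookup is not part of this sketch.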

Check work with this table:

Post hoc tests tell us where the differences are, after we have rejected the null for ANOVA and
decided that not all means are the same

Pairwise comparisons: looking at two treatments at a time for a post hoc test
Familywise Error Rate: as we do more separate tests, the risk of making at least one Type I
error in the family goes UP!

Alpha .26 = 26% chance of at least one Type I error
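
A rough sketch of how the familywise error rate grows, assuming the separate tests are independent and each uses the same alpha. Under that assumption, the .26 figure is what 6 pairwise comparisons at alpha = .05 would give; the notes do not say where their number comes from, so treat that link as a guess. The function names are made up for this sketch.

```python
def pairwise_comparisons(k):
    """Number of distinct pairs among k treatment conditions."""
    return k * (k - 1) // 2


def familywise_error(alpha, comparisons):
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** comparisons


for k in (2, 3, 4, 5):
    c = pairwise_comparisons(k)
    print(f"k = {k}: {c} comparisons, familywise alpha = {familywise_error(0.05, c):.2f}")
```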

Tukey’s Honestly Significant Difference Test (HSD):

• Allows us to compute a single value that determines the minimum difference between
treatment means that is necessary for significance – we compute it and then use it to
compare any two treatment conditions
• If the mean difference exceeds HSD we can conclude that there is a significant
difference; if not, we conclude the treatments aren’t different from one another

Scheffe Test – uses an F ratio to test for a significant difference between two treatments

1) Start with the largest mean difference; list n’s, means, and T’s, and find G and N for those two treatments
2) Compute a new SSbetween = sum(T squared/n) – G squared/N
3) Use the old df between = k – 1
4) New MSbetween = new SSbetween / df between
5) F = new MSbetween / MSwithin, with MSwithin taken from the overall ANOVA

Relationship between ANOVA and T Test

• When you have data from an independent-measures experiment with only two
treatment conditions, you can use either a t test or an independent-measures ANOVA
• It makes no difference because they will always result in the same statistical decision
• The basic relationship can be stated as

F = t squared

Remember F-ratio is based on squared differences while t statistic is based on differences
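
The F = t squared relationship can be checked numerically. The sketch below uses two small hypothetical samples and computes the pooled-variance t statistic and the ANOVA F ratio separately; the helper names are made up for this illustration.

```python
from math import sqrt

# Hypothetical scores for two independent treatment conditions.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]


def mean(scores):
    return sum(scores) / len(scores)


def ss(scores):
    """Sum of squared deviations from the sample mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)


def independent_t(a, b):
    """Independent-measures t statistic using the pooled variance."""
    df = len(a) + len(b) - 2
    pooled_variance = (ss(a) + ss(b)) / df
    standard_error = sqrt(pooled_variance / len(a) + pooled_variance / len(b))
    return (mean(a) - mean(b)) / standard_error


def f_ratio(a, b):
    """Independent-measures ANOVA F ratio for the same two samples."""
    grand_mean = mean(a + b)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in (a, b))
    ms_between = ss_between / 1                      # df between = k - 1 = 1
    ms_within = (ss(a) + ss(b)) / (len(a) + len(b) - 2)
    return ms_between / ms_within


t = independent_t(group_a, group_b)
F = f_ratio(group_a, group_b)
print(f"t = {t:.2f}, t squared = {t ** 2:.2f}, F = {F:.2f}")
```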

Assumptions for Independent-Measures ANOVA

1) The observations within each sample must be independent

2) The populations from which the samples are selected must be normal
3) The populations from which the samples are selected must have equal variances
(homogeneity of variance)

• SPSS: Statistical Package for the Social Sciences

• 1968 – needed to quickly analyze volumes of data
• Can use with Windows point and click approach or syntax

Two views
• Variable view
• Data view

Some menus:
• Data
• Transform
• Analyze
• Graph

Questions we might ask

• Is there a difference in age between short, medium, and tall individuals?
• Is there a difference between before and after scores?
• Is height of males in sample different than national average?

Repeated Measures (RM) ANOVA

(ANOVA for the same people measured under different conditions)

• F ratio in the ANOVA compares all different sample means in a SINGLE test using a
SINGLE alpha level
• Using a t statistic would require multiple t tests to evaluate all the mean
differences which inflates the Type 1 error rates

• Single factor research study – involves one independent variable

• Independent measures study – study uses a separate sample for each of the treatments
• Age and time are the most commonly used factors

Hypotheses for RM ANOVA

Null: mu1=mu2=mu3
• All treatments have exactly the same effect
• No mean differences in the general population
• Any difference between sample means is from chance alone

Alternative: At least one treatment mean is different from the others

• Treatment conditions are responsible for causing mean differences among samples

F-ratio for RM ANOVA

F = (treatment effect + chance/error (excluding individual differences)) / (chance/error (excluding individual differences))

F = (actual mean difference between treatments) / (amount of difference that would be expected just by chance)

F = (variance (differences) between treatments) / (variance (differences) expected by chance/error)

F = (variance/differences between treatments (without individual differences)) / (variance/differences expected by chance (with individual differences removed))

Differences in F Ratios
• RM Design eliminates variance caused by individual differences (stuff that is different
between people, even with matching) in the numerator and denominator – since the same
subjects are used in every treatment condition
• Since SAME individuals used in every treatment condition, we can measure the SIZE of
individual differences – these differences are consistent
• Because there is a consistent difference between subjects across all treatment
conditions, we can be reasonably confident that the 10 point difference is simply
not chance or random error, but rather is a SYSTEMATIC and PREDICTABLE
measure of the individual differences between subjects.

Logic of RM ANOVA

I. Variance between Treatment (NUMERATOR F Ratio)

1) Treatment effect: **what the researcher wants to see

Different treatment conditions cause the individuals scores in one condition to be
higher or lower than in another condition

2) Error or chance
Even if there is no treatment effect, there is still a possibility of differences due to chance
Same individuals measured two different times; still a chance of unsystematic and
unpredictable differences due to chance or error

II. Variance due to chance or error (DENOMINATOR F Ratio)

• Measure the variance due to random sources of error without including individual differences
• Calculate within treatment variance (as we did with independent measures) but then
subtract out the individual differences
• Results in a measure of unsystematic error

Notation and Formulas

1st stage: (identical to independent measures) NUMERATOR

• Total Variability
• Variance between treatments
• Variance within treatments
• Use same notation and formulas as before

2nd stage: (DENOMINATOR)

• Goal is to remove the individual differences from the denominator of the F ratio
• Begin with the variance within treatments and then measure and subtract out the
individual differences
• The remaining variance is often called residual or error variance
• Measures how much variance is reasonable to expect by chance after the
individual differences have been removed.

K = number of treatment conditions

n = number of scores in each treatment condition
N = total number of scores in the study
Sum of ALL scores = G = sum of T’s
Sum of scores in each treatment = T
SS = sum of squares for each treatment
Sum X squared = sum of the squared scores for the entire study
P = total of the scores for each individual in the study (“person total”). P values reflect
individual differences, and we use P values to calculate SSbetween subjects

F = MSbetween treatments / MSerror

s squared = SS / df

Source table for RM ANOVA

Source SS df MS F

Between Treatments
Within Treatments
Between subjects

• When k is greater than 2 we must use post hoc tests to determine where differences lie.
• We can use Scheffe and substitute MSerror in place of MSwithin
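
The two-stage RM ANOVA can be sketched as follows. The data are hypothetical (4 people, 3 treatment conditions), the names are made up for this sketch, and the df formulas for between subjects (n - 1) and error ((N - k) - (n - 1)) are the standard ones rather than something stated in these notes.

```python
# Hypothetical data: each row is one person, each column one treatment condition.
scores = [
    [3, 3, 6],
    [2, 2, 2],
    [1, 1, 4],
    [2, 4, 6],
]


def repeated_measures_anova(rows):
    n = len(rows)                  # number of people (scores per treatment)
    k = len(rows[0])               # number of treatment conditions
    N = n * k                      # total number of scores
    G = sum(sum(r) for r in rows)  # grand total
    sum_x_squared = sum(x * x for r in rows for x in r)
    T = [sum(col) for col in zip(*rows)]   # treatment totals
    P = [sum(r) for r in rows]             # person totals

    # Stage 1: identical to the independent-measures analysis.
    ss_total = sum_x_squared - G ** 2 / N
    ss_between_treatments = sum(t ** 2 / n for t in T) - G ** 2 / N
    ss_within = ss_total - ss_between_treatments

    # Stage 2: measure individual differences and subtract them out.
    ss_between_subjects = sum(p ** 2 / k for p in P) - G ** 2 / N
    ss_error = ss_within - ss_between_subjects

    df_between_treatments = k - 1
    df_error = (N - k) - (n - 1)   # df within minus df between subjects
    ms_between = ss_between_treatments / df_between_treatments
    ms_error = ss_error / df_error
    return {
        "ss_between_treatments": ss_between_treatments,
        "ss_within": ss_within,
        "ss_between_subjects": ss_between_subjects,
        "ss_error": ss_error,
        "df_between_treatments": df_between_treatments,
        "df_error": df_error,
        "F": ms_between / ms_error,
    }


rm = repeated_measures_anova(scores)
for name, value in rm.items():
    print(f"{name}: {value:.2f}")
```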

Advantages and Disadvantages of RM ANOVA

• Desirable if supply of subjects is limited because we are using fewer subjects

• Eliminates role of variability due to individual differences
• More sensitive to detect actual treatment effects where there are large individual differences
• May produce carryover effects or progressive error (changes in behavior caused by earlier
treatments or by repeated testing, like subjects getting tired because they have to take IQ
tests over and over again), and this is a disadvantage of using the same people!

Assumptions of RM ANOVA

• Independent observation
• Normal distribution (only important with small samples)
• Variances of the population distributions for each treatment should be equivalent

Introduction to Factorial Design

Factorial Design: research study with more than one factor

• Will limit to 2 factors
• Independent measures and all n’s equal
• 2 factor ANOVA
• Combines THREE separate hypothesis tests into one analysis
• Each of these tests will be based on its own F ratio computed from the data
• Two independent variables called Factor A and factor B

Example: testing heat vs humidity (independent variables) and seeing how different combinations
affect some sort of performance (dependent) So we match each humidity with each temperature
in a chart.
• 2 factor ANOVA will test for
• Mean difference between two humidity levels
• Mean differences between the three temperature levels
• Any other mean differences that may result from unique combinations of specific
temperature and specific humidity levels
• (high humidity may be especially disruptive when the temperature is also high)

MAIN EFFECTS: mean differences produced by factors independently, mean differences among
levels of one factor
• Do differences in factor A (humidity) lead to differences in performance?
• Evaluate mean difference between the ROWS, between 30% and 70% humidity – this
difference is called the main effect for Factor A.
• For factor B, look at means of COLUMNS; the differences are the main effect for Factor B
• We must evaluate main effects with hypothesis tests for significance
• Evaluation of main effects makes up 2 of the 3 hypothesis tests in a 2 factor ANOVA
• 2 F ratios to be evaluated independently

Main Effect Hypotheses

Null: There is no mean difference between the two levels: Ho: MuA1 = MuA2
Alternative: the two different levels do produce different scores
H1: MuA1 is NOT = to MuA2

Main Effect F Ratio (A)

F = (variance (differences) between means for Factor A) / (variance (differences) expected by chance/error)

F = (variance (differences) between row means) / (variance (differences) expected by chance/error)

Main Effect F Ratio (B)

F = (variance (differences) between column means) / (variance (differences) expected by chance/error)

Interaction Between Factors: any “extra” mean differences that are not explained by the main effects

INTERACTION: mean difference produced by factors acting together. Occurs whenever the
mean differences between individual treatment conditions, or cells, are different from what would
be predicted from the overall main effects for the factors.

Interaction Hypotheses:

Null: there is no interaction between factors A and B. All the mean differences between
treatment conditions are explained by the main effects of the 2 factors

Alternative: there is an interaction between factors. The mean differences are not what would be
predicted from the overall main effects for the 2 factors

F = (variance (mean differences) not explained by main effects) / (variance (differences) expected by chance/error)

How do we know when there will be an interaction?
• When two factors are interdependent, and influence one another
• No interaction: independence
• Does the size of the Factor A effect (top row vs bottom row) depend on factor B?
Is the change in humidity of X points the same for all levels of
temperature? If so, there is NO INTERACTION
• Interaction: interdependency
• If the effect of factor A does depend on the level of factor B
• Find the difference between rows in each column, and see if it is consistent
• If the factors are independent, there is no interaction
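
The "difference between rows in each column" check can be sketched like this. The cell means are hypothetical, and the function names are made up for the sketch. It only describes the pattern in the means; whether an interaction is significant still requires the interaction F ratio.

```python
# Hypothetical cell means: rows = Factor A (two humidity levels),
# columns = Factor B (three temperature levels).
no_interaction = [[85, 80, 75],
                  [75, 70, 65]]
interaction = [[85, 80, 75],
               [75, 70, 50]]


def row_differences(cell_means):
    """Top-row mean minus bottom-row mean, separately for each column."""
    top, bottom = cell_means
    return [t - b for t, b in zip(top, bottom)]


def means_suggest_interaction(cell_means):
    """True when the row difference is not the same in every column."""
    return len(set(row_differences(cell_means))) > 1


for label, table in (("no_interaction", no_interaction), ("interaction", interaction)):
    print(label, row_differences(table), means_suggest_interaction(table))
```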

2 factor ANOVA has three separate, independent tests. We can find…

• Significant Main effect for A but not significant main effect for B and no interaction
• Significant Main Effect for both A and B but no interaction
• Significant main Effect for both A and B and a Significant Interaction
Doing the analysis…
• We need variance values for THREE F Ratios
• 3 between treatment variances
• 1 within treatment variance
• Mean square = MS = SS/df


Correlation: a statistical technique that is used to measure and describe a relationship between two variables.
• Variables are observed as they exist in the natural environment, not controlled or manipulated
• Need Xs and Ys, two scores from each individual = 1 point
Characteristics of a Relationship:
1) Direction of the relationship
2) The Form of the Relationship
3) The Degree of the Relationship

Direction of the Relationship

POSITIVE: X moves in the same direction as Y, X increases, so does Y. X decreases, so does Y

• Beer sold and temperature
NEGATIVE: 2 variables move in opposite directions: As X increases, Y decreases. INVERSE
• Coffee sold and temperature

Form of the Relationship

• Linear/straight line is most common

Degree of the Relationship

• Correlation measures how well the data fit a straight line
• +/- 1 means a perfect fit
• 0 is no fit at all
• Intermediate values represent the degree to which the data points approximate the perfect fit

Where and Why are Correlations Used?

• Prediction: if two variables are known to be related in some systematic way, it is possible
to use one variable to make accurate predictions about the other (SAT scores and college GPA)

• Validity: correlation is commonly used to demonstrate the validity of a test

• If you are measuring what you say you are measuring then scores on your
measure should be related to other measures of the same construct

• Reliability: A reliable measurement procedure will produce the same (or nearly the same)
scores when the same individuals are measured under the same conditions. One way to
evaluate reliability is to use correlations to determine the relationship between two sets of
measurements
The Pearson Correlation

• Most commonly used calculation of correlation
• Measures the degree and direction of linear relationship between 2 variables
• Defined by the letter r
• Conceptually computed by
• r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)

Calculation Pearson Correlation

• Sum of products of deviations (SP)
• Measures the amount of covariability between two variables (degree to which X and Y
vary together)
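
A minimal sketch of the Pearson calculation, assuming the usual computational form r = SP / SQRT(SSx * SSy), which is not written out in these notes. The X and Y scores are hypothetical.

```python
from math import sqrt

# Hypothetical paired scores: one X and one Y from each individual.
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

# SP: sum of products of deviations (how much X and Y vary together).
SP = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
# SSx and SSy: how much X and Y vary separately.
SSx = sum((x - mean_x) ** 2 for x in X)
SSy = sum((y - mean_y) ** 2 for y in Y)

r = SP / sqrt(SSx * SSy)
print(f"SP = {SP}, SSx = {SSx}, SSy = {SSy}, r = {r:.3f}")
```
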
Interpreting Pearson Correlation: 4 Considerations

• Correlation describes a relationship, nothing more – CORRELATION DOESN’T IMPLY
CAUSATION

• Never generalize correlation beyond range of scores provided by data – value of
correlation affected by range of scores
• OUTLIERS are extreme values that can have huge effect on correlation
• Correlation is NOT a proportion; r squared measures accuracy, so r = .5 gives r squared = .25, or 25% accuracy

Other correlation measures

• Pearson correlation is most commonly used with data from an interval or a ratio scale of measurement
• Other correlation measures have been developed for nonlinear relationships and for other
types of data

Spearman Correlation
• Used in two situations
• Measure a relationship between variables measured on an ordinal scale of measurement
• Can be used with ratio and interval scales even when the relationship is not linear

Point Biserial Correlation

• Used to measure the relationship between two variables in situations where one variable
is measured on an interval or ratio scale but the second variable has only two different
values (called dichotomous)

Phi Coefficient

• Used when both variables (X and Y) measured for each individual are dichotomous

Regression: the statistical technique for finding the best fitting straight line for a set of data is
called regression, and the resulting straight line is called the regression line.

Line serves two purposes

• Show center of the relationship
• Can be used for prediction (X related to Y)
• Regression is procedure that identifies and defines the straight line that provides the best
fit for any specific set of data – Line of Best Fit – defined by equation

Linear Equations:
• Y = bX + a
• b is the slope
• a is the y intercept

Least Squares Solution: for each X, the line gives a predicted Y value, called Y hat. The error is the
distance between this line and each data point.
• Distance = Y – Yhat
• Best fitting line has smallest total squared error
• Least squared error solution

Regression Line for Y: Yhat = bX + a
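
A short sketch of finding the least-squares line, assuming the standard solution b = SP / SSx and a = (mean of Y) - b * (mean of X), which these notes do not spell out. The scores are hypothetical.

```python
# Hypothetical paired scores.
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

SP = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
SSx = sum((x - mean_x) ** 2 for x in X)

b = SP / SSx               # slope
a = mean_y - b * mean_x    # Y intercept


def predict(x):
    """Predicted Y (Y hat) for a given X, using Yhat = bX + a."""
    return b * x + a


print(f"Yhat = {b:.4f}X + {a:.4f}")
print(f"Predicted Y for X = 6: {predict(6):.2f}")
```

Note that the line always passes through the point (mean of X, mean of Y), which is a quick check on the arithmetic.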

Cautions about interpreting predicted values:

• The predicted value is not perfect. There will be some error between the predicted Y
values (on the line) and the actual data.
• Although the amount of error will vary from point to point, on the average the
errors will be directly related to the magnitude of the correlation
• A correlation near +/- 1 means the data points will generally be close to the line (small error)
• But as the correlation gets nearer to zero the magnitude of error will increase

• The regression line should not be used to make predictions for X values that fall outside the
range of values covered by the original data. This is because you have no information
about the X-Y relationship outside that range

Standard Error of Estimate: a measurement of the standard distance between a regression line
and the actual data points. It is computed from the sum of squared errors, SSerror

It is possible to find the best fitting regression equation for any set of data by simply using the
formulas already presented

• Accuracy of this prediction depends on how well the points on the line correspond to
actual data points
• So while a regression equation by itself allows you to make predictions, it does not
provide any information about the accuracy of the prediction
• To measure the precision of the regression a standard error of estimate must be computed

Variance = SS/df
df = n – 2

Standard error of estimate = SQRT(SSerror/df)

SSerror = sum of (Y – Yhat) squared

Standard Error vs Correlation

• Standard error is directly related to the magnitude of the correlation between X and Y
• If the correlation is near +/- 1, the data points will be clustered close to the line and the
standard error of estimate will be small
• As the correlation gets near zero, the line will provide less accurate predictions, and the
standard error will grow larger

Standard Error and Correlation

• Earlier we learned that squaring the correlation provides a measure of the accuracy of the
prediction
• r squared is called the coefficient of determination because it determines what proportion
of the variability in Y is predicted by the relationship with X
• Thus we use 1 – r squared to measure the error portion

SSerror = (1-rsquared)SSy
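
The two routes to SSerror can be compared numerically. The sketch below uses hypothetical scores, computes SSerror directly from the Y – Yhat distances and again from (1 – r squared) * SSy, and then finds the standard error of estimate with df = n – 2. The slope and intercept formulas (b = SP / SSx, a = mean of Y – b * mean of X) are the standard ones and are assumed here.

```python
from math import sqrt

# Hypothetical paired scores.
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
SP = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
SSx = sum((x - mean_x) ** 2 for x in X)
SSy = sum((y - mean_y) ** 2 for y in Y)

r = SP / sqrt(SSx * SSy)
b = SP / SSx
a = mean_y - b * mean_x

# Route 1: add up the squared distances between each Y and its predicted Y.
ss_error_direct = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))
# Route 2: the portion of SSy that is NOT predicted by the relationship with X.
ss_error_from_r = (1 - r ** 2) * SSy

df = n - 2
standard_error = sqrt(ss_error_direct / df)

print(f"SSerror (direct) = {ss_error_direct:.2f}")
print(f"SSerror (1 - r squared) * SSy = {ss_error_from_r:.2f}")
print(f"Standard error of estimate = {standard_error:.3f}")
```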

Correlation and Regression

• Because it is possible to have the same regression line for sets of data that have different
correlations, it is important to examine r squared and the standard error of estimate
• The regression equation simply describes the best fitting line and is used for making predictions
• However, r squared and the standard error of estimate indicate how accurate the
predictions will be

Interpretation of the r Value

• Coefficient of correlation (r): the r value indicates the strength of the linear
relationship between the two variables
• Coefficient of Determination (r squared): squaring the r value provides an indication of
the proportion of the variance in one variable that is accounted for by the variance in the
other variable
• Coefficient of Non Determination (1-r squared): subtracting the squared r value from 1
provides an indication of the proportion of the variance in one variable that is not
accounted for by the variation in the second variable.

Multiple regressions have more than one predictor

Chi Square Test: when we have questions about relative frequencies for a distribution
• Uses sample data to test hypotheses about the shape or proportions of a population
• Determines how well the obtained sample proportions fit the population proportions
specified by the null hypothesis.
• Null: no preference vs no difference
• Alternative: there is a preference, or there is a difference

Expected frequency: frequency value that is predicted from the null hypothesis and the sample
size – it is calculated. The observed frequencies are what you see, and are always whole numbers
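
A small sketch of a goodness-of-fit calculation, assuming the usual statistic chi square = sum of (fo – fe) squared / fe and df = (number of categories) – 1. The observed frequencies are hypothetical, and the null hypothesis here is "no preference" (equal proportions in every category).

```python
# Hypothetical observed frequencies for 4 categories (n = 50 people).
observed = [18, 17, 7, 8]
n = sum(observed)

# Null hypothesis of no preference: the same proportion in every category.
null_proportions = [0.25, 0.25, 0.25, 0.25]
expected = [p * n for p in null_proportions]   # fe = p * n, may be a decimal

chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1

print(f"expected = {expected}")
print(f"chi square = {chi_square:.2f}, df = {df}")
```

The result is then compared with the critical value from a chi-square table for that df; every term is squared, so the statistic can never be negative.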

Chi Square Distribution: positively skewed because the formula involves adding squared values

Parametric Tests (ANOVA, T-Test)
• Test hypotheses about specific population parameters
• Requires numerical score for each individual in sample (from interval or ratio scale)
• More sensitive test
Non parametric tests
• Hypotheses not stated in terms of a specific parameter
• Use data from nominal or ordinal scales: can’t calculate means and variances, it’s all
about frequency
• Less sensitive test

Assumptions and Restrictions for Chi Square Test

• To use chi square test for goodness of fit or a test of independence, these conditions must
be satisfied
• Independence of observations: each individual is counted in only one category
• Size of expected frequencies – should not be performed when the expected frequency
of any cell is less than 5


Statistical Power: the POWER of a statistical test is the probability that the test will correctly
reject a false null hypothesis – so the probability of reaching a correct decision

• Purpose of a hypothesis test is to determine whether or not a particular treatment has an
effect – and there is always a risk of making the wrong conclusion, like a Type I error. We
minimize this risk by selecting an alpha level that determines the maximum probability of
committing a Type I error


MORE POWERFUL means that will more readily detect a treatment effect when one exists
Higher power leads to correct detection

Power = 1 – Beta

What Determines the Power of a Study?

• Main Factors:
• How big an effect the research hypothesis predicts
• How many participants are in the study (sample size)
• Significance level chosen

Usually we attempt to design study with 80% power


• Size of treatment effect

• Power depends on size of treatment effect
• When treatment has large effect, will be easy to detect this effect and power will
be high
• When treatment effect is small, it will be difficult to detect, and power will be low
• Significance Level
• Less extreme significance/alpha levels (.1/.2) will result in more power (because the
shaded area/critical region is bigger and it’s easier to reject the null)
• More extreme significance levels = less power

• Sample Size
• If there is a treatment effect in the population, you are more likely to find it with
a large sample than with small samples

• Increasing Power:
• Increase treatment effect
• Reduce “noise” or error
• Hard for researcher to control this
• Increase (weaken) significance level
• Eg: change alpha to .2 instead of .05
• Although this lowers the risk of Type II error, the risk of Type I error increases
• Increase sample size
• Most common method that researchers use to increase the power that
their study will detect a true treatment effect
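
To see how the three factors move power, here is a rough sketch for one specific case that is assumed purely for illustration: a one-tailed, one-sample z test with the effect size expressed in standard deviation units (d). The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def z_test_power(effect_size_d, n, alpha=0.05):
    """Power of a one-tailed one-sample z test (assumed design, for illustration)."""
    z_critical = STANDARD_NORMAL.inv_cdf(1 - alpha)
    # With a true effect, the z statistic is centered on d * sqrt(n) instead of 0.
    return 1 - STANDARD_NORMAL.cdf(z_critical - effect_size_d * sqrt(n))


for n in (4, 16, 25, 64):
    print(f"d = 0.5, n = {n:2d}, alpha = .05: power = {z_test_power(0.5, n):.2f}")

print(f"d = 0.5, n = 16, alpha = .20: power = {z_test_power(0.5, 16, alpha=0.20):.2f}")
print(f"d = 0.8, n = 16, alpha = .05: power = {z_test_power(0.8, 16):.2f}")
```

Power = 1 – Beta, so the Type II error rate for any row is simply one minus the printed value.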

Sample Size Estimation

• The goal is to make sample size “large enough” but not wastefully large
• In order to estimate sample size you need alpha, desired power, and estimated effect size
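
A rough sketch of sample size estimation from alpha, desired power, and estimated effect size. It assumes a one-tailed, one-sample z test and the textbook approximation n = ((z for alpha + z for power) / d) squared; other designs need other formulas, and the function names are made up for this sketch.

```python
from math import ceil, sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def required_n(effect_size_d, alpha=0.05, power=0.80):
    """Smallest n reaching the desired power (one-tailed z test, assumed)."""
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha)
    z_power = STANDARD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_power) / effect_size_d) ** 2)


def achieved_power(effect_size_d, n, alpha=0.05):
    """Power actually obtained with n participants under the same assumptions."""
    z_critical = STANDARD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STANDARD_NORMAL.cdf(z_critical - effect_size_d * sqrt(n))


for d in (0.2, 0.5, 0.8):
    n = required_n(d)
    print(f"d = {d}: n = {n}, power = {achieved_power(d, n):.2f}")
```

Smaller estimated effects need much larger samples, which is why the effect size estimate matters so much in planning.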

Final Exam Studying

RM Anova – give degrees of freedom (4,15) figure out total subjects in study

2 Factor Anova – 3 hypothesis tests – what F ratios are calculated for each test
Graph different scenarios for different F ratios
(main effect for A but not B but interaction, etc)

Correlation – what is R?

Know three characteristics about what goes into correlation (r)

Direction (positive or negative)
Strength (degree)
Form (linear) since it is Pearson

R squared is accuracy, how much variance in Y is explained by variance in X

Regression, goal is to find line of best fit

Difference between little r, r squared, and (1 – r squared)

Y hat (predicted Y values) and Y(actual), how different

When would I use multiple regression?

Chi squared – difference between goodness of fit and independence tests, know df for each one,
shape of chi square distribution – it’s a positively skewed hump, most values near zero. Why?
Because the formula uses (fo – fe) SQUARED, so there will never be negative numbers

Different factors that impact power, and how it relates to type two error (beta)

Computational (know how to solve everything)

Bonus: source table 2 measures anova

Regression, standard error, test for significance slope
Goodness of fit, independence chi square test
Also correct critical values Df ERROR not df WITHIN