Statistics: A set of methods and rules for organizing, summarizing, and interpreting
information. (also creates a standard for comparison)
Facts and figures that condense large quantities of information into a few simple figures
or statements (because people can understand averages!)
• Methods
• Interpretation
Descriptive Statistics: a set of methods and rules for organizing and summarizing
(mean, median, mode, graphs)
Sampling Error: sample data isn’t always perfect. This discrepancy is called sampling
error – the amount of error that exists between a sample statistic and the corresponding
population parameter.
Objectivity: studies should be conducted in a way to prevent biases from influencing the
outcome of research (Rosenthal 1963 with rats).
Relationships Between Variables: The GOAL is to find relationships between
variables and put order to the universe.
Variable: a characteristic or condition that changes or has different values for different
people (x,y)
Constant: A characteristic or condition that does not vary, but is the same for every
individual (adding 4 pts to everyone’s exam)
Measuring Data
Four Levels:
Nominal: data consisting of names, labels, or categories that cannot be arranged in an
ordering scheme (low to high) ex: gender, SS#, favorite sports team, etc
Ordinal: a set of categories organized in an ordered sequence – differences between data
values cannot be determined or are meaningless (RANK ORDER) (course grades, rankings,
stove settings)
Interval: consists of ordered categories where all of the categories are intervals of the
exact same size, BUT there is no natural ZERO starting point. (temperature, dates, etc)
Ratio: same as the interval scale modified to include the natural zero starting point
(weight and prices)
Continuous (a classification of variables rather than a fifth level): an infinite number of
possible values fall between any 2 observed values – divisible into an infinite number of
fractional parts (running water, time)
HISTOGRAM
• Use when frequency distribution has data from an interval or ratio scale (draw bar
above X value so height = frequency of score)
• Adjacent bars touch each other, giving a continuous figure
BAR GRAPH
• Used when presenting frequency distribution data from ordinal or nominal scales
• Space between bars to emphasize distinct, discrete categories
Describing a Distribution
• Shape
o Symmetrical
Kurtosis = peakedness; high kurtosis means a tall peak with extreme tails
o Skewed
scores pile up on one end and taper off at the other
positively skewed: tail points to the right
negatively skewed: tail points to the left
• Central Tendency – concept of average. the easiest way to describe scores –
compute an AVERAGE!
o Mean, median, mode
o Goal is to obtain a single value that IDs a single score as representative of
the entire distribution
• Variability
Measures of Variability
Range (max – min): insensitive because it is determined entirely by the two extreme values
Standard deviation/variance – approximates average distance from mean
• Standard deviation is a descriptive measure that describes how variable, how
spread out, the scores are in a distribution
1) Find the mean of the data (add up all the values, divide by number)
2) Determine the deviation for EACH SCORE from the mean (x-Mu)
3) Square these values
4) Then find the average of these squared values – The numerator is SS, the sum of
squares, and then we divide by N. this is VARIANCE
Variance = Σ(X − μ)² / N
1) Definitional formula:
a. Subtract the mean from each value, square it, then add them all up = definitional
SS
2) Computational formula:
a. Square each value, add up the squares, then SUBTRACT [addition of all the
values, squared, then divided by number of values] = computational SS
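The two SS formulas above can be checked against each other with a short sketch (the data values are made up for illustration):

```python
# Hypothetical data, purely for illustration
data = [2, 4, 6, 8]
n = len(data)
mean = sum(data) / n                       # step 1: the mean

# Definitional formula: sum of squared deviations from the mean
ss_def = sum((x - mean) ** 2 for x in data)

# Computational formula: sum of X^2 minus (sum of X)^2 / N
ss_comp = sum(x ** 2 for x in data) - sum(data) ** 2 / n

variance = ss_def / n                      # population variance = SS / N
print(ss_def, ss_comp, variance)           # 20.0 20.0 5.0
```

Both formulas give SS = 20; the computational form just avoids computing each deviation.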
The problem with populations vs samples is that samples tend to be less variable than populations
– so we have to adjust for this bias (which can either be an overestimation or an underestimation)
Z scores (standard scores) describe an individual score, using a mean and standard deviation,
changes X into Z score, to standardize scores
Purposes of Z Scores
If every x value in a distribution was transformed into Z scores, would have following
properties:
• Standardized distribution: Every Z score distribution has the same mean (ZERO) and
the same SD (ONE)
• When two scores are from different distributions, no direct comparison can be made
between the raw scores
o To calculate, we need the raw score, mean, and standard deviation of each
distribution being compared
o We can compare Z scores because the z scores are coming from an equivalent
standardized distribution
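As a sketch of why z scores make cross-distribution comparison possible (all numbers here are hypothetical):

```python
# Compare a 75 on exam A (mean 70, SD 5) with a 60 on exam B (mean 50, SD 10)
def z_score(x, mu, sigma):
    return (x - mu) / sigma

z_a = z_score(75, 70, 5)
z_b = z_score(60, 50, 10)
# The raw scores differ, but both sit exactly one SD above their own mean,
# so relative to their distributions the two performances are equivalent
print(z_a, z_b)   # 1.0 1.0
```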
Two steps involved in standardizing a distribution so that it has a predetermined mean and
SD
• Each of raw scores is transformed into Z score
• Each of the Z scores is transformed into a new X value so that a particular mean and SD
are achieved
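The two-step standardization can be sketched as follows (the raw scores and the target mean/SD of 50/10 are invented for illustration):

```python
raw = [2, 4, 6, 8]
mu = sum(raw) / len(raw)
sigma = (sum((x - mu) ** 2 for x in raw) / len(raw)) ** 0.5

z = [(x - mu) / sigma for x in raw]        # step 1: raw scores -> z scores
new = [50 + 10 * zi for zi in z]           # step 2: z scores -> new X values

new_mu = sum(new) / len(new)
new_sd = (sum((v - new_mu) ** 2 for v in new) / len(new)) ** 0.5
print(round(new_mu, 1), round(new_sd, 1))  # 50.0 10.0
```

The transformed distribution hits the predetermined mean and SD exactly, while each score keeps its relative position.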
Probability
P(specific outcome)
If an event never occurs, p = 0
If an event is certain to occur, p = 1
• Random sampling:
o Each person in population has equal chance being selected (1/N)
o If more than one individual is selected, there must be a constant probability for
each and every selection
o I.e., sample with replacement – put the card back before drawing again
• Every score from a population has a Z score – says WHERE that score is located
in the distribution
o We want to be able to transform sample means into Z scores too, so n can
be greater than one
o NOW a Z score represents an entire sample
A z beyond ±2 indicates an extreme sample
Distribution of sample means: the basis for our ability to predict sample characteristics – the
collection of sample means for all the possible random samples of a particular size (n)
that can be obtained from a population
• It isn’t possible to list all samples and calculate every possible mean
• So we rely on the general characteristics of the distribution of sample means, given
by the central limit theorem
• Describes the distribution of sample means for any population, no matter what
shape, mean, or SD.
• The distribution of sample means “approaches” a normal distribution very rapidly
o by the time n= 30, the distribution is almost perfectly normal
**and remember that any distribution can be defined by shape, variability, and central
tendency
SHAPE
• distribution of sample means tends to be normal distribution if one of following is
satisfied:
1) population is a normal distribution
2) the sample size (n) is 30 or greater (regardless of the population's shape)
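A quick simulation sketch of the central limit theorem (the exponential population and all parameters here are arbitrary choices, not from the notes):

```python
import random

random.seed(0)                        # fixed seed so the sketch is repeatable

def sample_mean(n):
    # mean of one random sample of size n from a strongly skewed
    # population (exponential with rate 1, whose true mean is 1.0)
    return sum(random.expovariate(1.0) for _ in range(n)) / n

# collect the sample means for many samples of size n = 30
means = [sample_mean(30) for _ in range(2000)]
grand_mean = sum(means) / len(means)
print(round(grand_mean, 2))           # close to the population mean of 1.0
```

Even though the population is far from normal, the sample means pile up around the population mean, as the theorem predicts.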
Standard Error:
• Magnitude determined by:
1) Size of the sample – by the Law of Large Numbers, a larger sample gives a smaller
error margin
2) Standard deviation is the starting point for standard error, so when n = 1, standard
error = standard deviation. As sample size increases, standard error DECREASES:
standard error = σ/√n
• Standard deviation
o Use when working with a distribution of scores and it measures the standard
distance between a score and the population mean
• Standard Error
o Use when you have a question concerning a sample, and it measures the standard
distance between the sample mean and the population mean
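The relationship between the standard deviation and the standard error can be sketched directly from σX̄ = σ/√n (the SD of 15 is just an illustrative value):

```python
import math

sigma = 15                                  # illustrative population SD
ses = {n: sigma / math.sqrt(n) for n in [1, 4, 25, 100]}
# at n = 1 the standard error equals the SD; it shrinks as n grows
print(ses)                                  # {1: 15.0, 4: 7.5, 25: 3.0, 100: 1.5}
```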
Hypothesis Testing: a statistical procedure (the most commonly used) that lets us make
inferences about a population using sample data. (we use Z scores, probability, and distribution of
sample means)
Hypothesis Test: a statistical method that uses sample data to evaluate a hypothesis about a
population parameter
• Testing Procedure
• 1) State a hypothesis about a population (usually population parameter)
• 2) Obtain a random sample from the population
• 3) Compare sample data with hypothesis
• Testing Procedure (in detail)
1) State a hypothesis
• Null Hypothesis (Ho) = treatment has no effect! There is no change
• Independent variable (treatment) has no effect on dependent
variable for population
• Scientific or Alternate Hypothesis (H1) = treatment does have an effect
• There is SOME TYPE OF CHANGE! The independent variable
(treatment) will have an effect on the dependent variable.
• Non-directional Test: this hypothesis does not specify direction
of change
2) Set the criteria for a decision
• Alpha Level / Level of Significance: probability value used to define
the very unlikely sample outcomes if the null hypothesis is true.
• Determines boundaries of critical region
• Defines “unlikely” outcomes
• Must select level to minimize Type I error
• Largest permissible value is .05 (5%)
• Tradeoff between lowering alpha level and difficulty of finding a
treatment effect (usually use .05, .01, and .001 to maintain
balance)
• Critical Region (tails): extreme values that are very unlikely to be
obtained if the null hypothesis is true (so if we see values in the critical
region, it is likely that the null hypothesis is NOT true – the treatment DID
have an effect). Boundaries are determined by the alpha level.
• Sample data falling in the critical region → null hypothesis rejected
3) Collect data and compute sample statistics
We compute a z score that describes exactly where the sample mean is located relative to
the hypothesized population mean from Ho
4) Make a decision
So in terms of Z scores:
• A Z score near 0 means the sample is not in the critical region – fail to reject the
null hypothesis (the data are consistent with Ho)
• A Z score that is extremely positive or negative means the sample is in the critical
region – reject the null hypothesis
Z Score Formula
• Z = obtained difference / difference due to chance
Is the result of the research study (obtained difference) more than would be expected by chance
alone?
Most hypothesis tests require that the obtained difference be 2 to 3 times bigger than chance
before the research will reject the null hypothesis (and say that the treatment did cause a
difference)
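The whole decision rule can be sketched with made-up numbers (population mean 100, SD 15, a treated sample of n = 25 with mean 108, two-tailed alpha = .05):

```python
import math

mu, sigma, n, sample_mean = 100, 15, 25, 108
se = sigma / math.sqrt(n)            # standard error = 15 / 5 = 3.0
z = (sample_mean - mu) / se          # obtained difference / chance difference
critical = 1.96                      # two-tailed critical z at alpha = .05

reject_null = abs(z) > critical      # z is about 2.67, in the critical region
print(round(z, 2), reject_null)      # 2.67 True
```

The obtained difference is nearly three times the chance difference, so the null hypothesis is rejected.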
Uncertainty and Errors in Hypothesis Testing
Inferential Process:
• Always possibility that an incorrect conclusion was made
• Type I Error: when a researcher rejects a null hypothesis that is actually true –
says that the treatment was effective when in fact it is not
• Unlikely to obtain sample mean in critical region when null hypothesis is
true. Critical region is determined by alpha level, so alpha level for
hypothesis test is equal to probability that test will lead to Type 1 error.
A bigger alpha level means a bigger critical region, so sample means are more likely to fall
in that area, and we are more likely to reject the null hypothesis and say the treatment was
effective. A smaller alpha level is more fine-tuned: if we reject the null hypothesis with a
smaller alpha level and say the treatment was effective, that conclusion is more reliable.
• Type II Error: when a researcher fails to reject a null hypothesis that is really
false – the researcher says that the treatment isn't effective when it actually is.
Not as serious as a Type I error; it just means the research data don't show
the results the researcher had hoped to obtain.
• Its probability is represented by the symbol beta (β)
Significance: means that the result is different from what would be expected due to chance
Findings are significant when the null hypothesis is rejected
In papers: “treatment with medication had a significant effect on people's depression scores, z =
3.85, p < .05”
• Significant means that we rejected the null hypothesis, as a sample mean fell in the
critical region
• Z = 3.85 is the Z score for the sample mean
• P < .05 is the alpha used for the test, meaning that there must be a less than 5%
probability that the finding is due to chance alone
• So the researcher is 95% confident that the obtained difference is greater than what
one would expect by chance alone
• Shortfall of Z score test is that it usually involves more information than is available
• Population standard deviation or variance
• We can’t calculate standard error without this!
When do we use the T statistic? When the population SD is not known. When the variability
for the population is not known, we use the sample variability in its place
So the Estimated Standard Error is used to estimate the real standard error in situations when the
population SD is unknown. It gives an estimate of the standard distance between the sample mean
(X̄) and the population mean (μ).
Only difference between t and z formula is that z score formula uses the actual population
variance, and t formula uses the sample variance (because we don’t know the population one)
How well does sample variance approximate the population variance? Lets us know how
well a t statistic approximates a z score.
• Df = degrees of freedom = n -1
• Sample mean places a restriction on the value of one score
• The GREATER the value of df for a sample, the better the sample variance represents
the population variance, and the better the t statistic approximates the Z score
T Distributions
SHAPE
• Shape changes with degrees of freedom – there is a different sampling distribution of t
for each possible number of degrees of freedom
• As df gets very large, the t distribution gets closer in shape to a normal z-score
distribution
Standard error isn't constant because it is an ESTIMATE based on the sample variance, and this
varies from sample to sample.
• Value of df increases, variability decreases, more closely resembles normal curve
• If your exact df isn't in the t table, use the nearest df value that is LESS than yours
• Most research uses two samples! (and concerns a mean difference between 2)
• Independent Samples: two sets of data are from completely separate
samples
• Related/Dependent Samples: two sets come from the same sample
Notation: We use subscripts (1,2) to distinguish between the two samples with n, sum of
squares, population mean, and sample mean.
So rejecting Ho means that the data indicate a significant difference between the 2 populations
If we fail to reject Ho, the data do not provide sufficient evidence to conclude
that a difference exists
Single sample: sX̄
measures how much error is expected between a sample mean and the population mean
Independent samples: s(X̄1 − X̄2)
measures how much error is expected when you are using a sample mean difference to represent a
population mean difference
So when n1 = n2 we can use this formula for standard error:
s(X̄1 − X̄2) = √( s1²/n1 + s2²/n2 )
When sample sizes differ, the sample variances should not be treated equally, because
statistics from larger samples are better estimates – instead, use the pooled variance,
s²p = (SS1 + SS2) / (df1 + df2)
Sample variance for one sample: s² = SS / df
and now we can go back and calculate the t statistic – the degrees of freedom for the
independent-measures t statistic is df1 + df2
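Putting the pieces together, here is a sketch of an equal-n independent-samples t with invented scores:

```python
import math

g1 = [4, 6, 5, 7]                    # hypothetical treatment 1 scores
g2 = [8, 9, 7, 10]                   # hypothetical treatment 2 scores

def ss(data):
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data)

n1, n2 = len(g1), len(g2)
var1 = ss(g1) / (n1 - 1)             # sample variance = SS / df
var2 = ss(g2) / (n2 - 1)
se = math.sqrt(var1 / n1 + var2 / n2)            # equal-n standard error
t = (sum(g1) / n1 - sum(g2) / n2) / se
df = (n1 - 1) + (n2 - 1)             # df1 + df2
print(round(t, 2), df)               # -3.29 6
```

The obtained t would then be compared to the critical value from the t table at df = 6.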
Hypothesis Testing with Related/Dependent samples – when we get the data from the same
individuals, maybe under two conditions
Repeated Measures Study: study in which a single sample of individuals is measured more than
once on the same dependent variable. – so same subjects used in all of treatment conditions
• No risk that subjects in different conditions are different, so its advantageous for
researchers to choose this design
• We can approximate this style of study by matching subjects on relevant variables,
but matching is never perfect
T Scores
Based on a difference score (D) = X2 – X1
We use this D score instead of a raw X score – so one D score represents one person’s data
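A minimal sketch of the related-samples t using D scores (the before/after numbers are fabricated):

```python
import math

before = [10, 12, 9, 11]             # hypothetical scores at time 1
after  = [13, 14, 12, 13]            # hypothetical scores at time 2
d = [x2 - x1 for x1, x2 in zip(before, after)]   # D = X2 - X1, one per person

n = len(d)
d_mean = sum(d) / n
ss_d = sum((x - d_mean) ** 2 for x in d)
var_d = ss_d / (n - 1)               # sample variance of the D scores
se = math.sqrt(var_d / n)            # estimated standard error of the mean D
t = d_mean / se                      # tests Ho: mu_D = 0
print(round(t, 2), n - 1)            # 8.66 3
```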
ESTIMATION – still using sample statistics to make inferences about an unknown population
• Point estimates
• Using a single number as estimate for an unknown quantity
• Precise but less confident
• Most sample means pile up around the center where z = 0, so we set z = 0 for a
point estimate
Confidence Interval: When an interval level is accompanied by a specific level of confidence (or
probability) it is called a confidence interval. So a 90% confidence interval means that we are
90% sure that the scores fall somewhere within the interval
• Narrow interval = more precise
• WIDTH is influenced by sample size and level of confidence
• N gets larger, then width gets smaller more precise
• % confidence increases, width gets larger less precise
WITH Z SCORE
μ = X̄ ± z(σX̄)
Our goal is to find population mean so we solve equation for Mu
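A sketch of the z-based interval with invented numbers (sample mean 104, known population SD 15, n = 36, 95% confidence so z = 1.96):

```python
import math

x_bar, sigma, n, z = 104, 15, 36, 1.96
se = sigma / math.sqrt(n)                    # 15 / 6 = 2.5
lower = x_bar - z * se
upper = x_bar + z * se
print(round(lower, 1), round(upper, 1))      # 99.1 108.9
```

We would be 95% confident that μ lies between about 99.1 and 108.9.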
WITH T SCORE/STATISTIC
1) Use sample data to compute the sample mean (or mean difference) and the standard error
2) Estimate a value, or range of values, for t.
* point estimate: t = 0
* interval estimate: translate the confidence level (e.g., 90%) into t scores
3) Calculate the sample mean and standard error from the sample data, take the estimated
value for t from the t table, and PLUG IN to the estimation equation
• 1 sample Z
• 1 sample T (for when the population SD is unknown)
• Indep/dep for T
• Spelling test at beginning and end of first grade
• Dependent sample T Test
Two Interpretations
1) NO DIFFERENCES (or differences due to sample error) (Null hypothesis)
Mu 1 = Mu 2 = Mu 3
2) REAL DIFFERENCES (alternate hypothesis)
At least one population mean is different from the others
• For both analyses, a large F value means that the sample mean differences are bigger
than would be expected by chance
Between treatment variance: difference between treatment conditions
Either caused by treatment effects or due to chance:
Individual differences
Experimental error
Within treatment variance: difference that is likely to occur just by chance alone
• Inside a treatment condition, set of individuals are treated the same, so if they were
treated the same, then why are the scores different? Differences within a treatment are
due to chance
2 separate Analyses
F = MSbetween / MSwithin
df total = N − 1
df within = N − k
df between = k − 1
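The F ratio and df formulas can be sketched end-to-end with three invented equal-size treatment groups:

```python
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical scores, k = 3
k = len(groups)
all_scores = [x for g in groups for x in g]
N = len(all_scores)
grand_mean = sum(all_scores) / N

# between-treatments SS: group means vs the grand mean
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# within-treatments SS: scores vs their own group mean
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between, df_within = k - 1, N - k
F = (ss_between / df_between) / (ss_within / df_within)
print(F, df_between, df_within)              # 27.0 2 6
```

The obtained F would then be compared to the critical F for (2, 6) degrees of freedom.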
Pairwise comparisons: looking at two treatments at a time for a Post Hoc test
Familywise Error Rate: as we do more separate tests, the risk of making at least one Type I
error in the family goes UP!
Scheffe Test – uses an F ratio to test for a significant difference between two treatment
conditions
1) Start with the largest mean difference; list the n's, means, and treatment totals (T), and
find G and N
2) Compute a new SSbetween = Σ(T²/n) − G²/N for the two treatments being compared
3) Use the OLD dfbetween = k − 1
4) New MSbetween = new SSbetween / dfbetween
When you have data from an independent-measures experiment with only two
treatment conditions, you can use either a t test or an independent-measures ANOVA
It makes no difference, because they will always result in the same statistical decision
Basic relationship can be stated
F = t2
SPSS has two views:
• Variable view
• Data view
Some menus:
• Data
• Transform
• Analyze
• Graph
• F ratio in the ANOVA compares all different sample means in a SINGLE test using a
SINGLE alpha level
• Using a t statistic would require multiple t tests to evaluate all the mean
differences, which inflates the Type I error rate
Differences in F Ratios
• RM Design eliminates variance caused by individual differences (stuff that is different
between people, even with matching) in the numerator and denominator – since the same
subjects are used in every treatment condition
• Since SAME individuals used in every treatment condition, we can measure the SIZE of
individual differences – these differences are consistent
• Because there is a consistent difference between subjects across all treatment
conditions, we can be reasonably confident that the 10 point difference is simply
not chance or random error, but rather is a SYSTEMATIC and PREDICTABLE
measure of the individual differences between subjects.
Logic of RM ANOVA
Differences in the numerator can come from a treatment effect or from error due to chance:
even if there is no treatment effect, there is still the possibility of differences due to chance
When the same individuals are measured at two different times, there is still a chance of
unsystematic and unpredictable error
F = MSbetween / MSerror
s² = SS / df
Source                  SS    df    MS    F
Between treatments
Within treatments
  Between subjects
  Error
Total
• When k is greater than 2, we must use post hoc tests to determine where the differences lie.
• We can use the Scheffé test, substituting MSerror in place of MSwithin
Assumptions of RM ANOVA
• Independent observations
• Normal distribution (only important with small samples)
• Variances of the population distributions for each treatment should be equivalent
Example: testing heat vs humidity (independent variables) and seeing how different combinations
affect some sort of performance (dependent) So we match each humidity with each temperature
in a chart.
• 2 factor ANOVA will test for
• Mean difference between two humidity levels
• Mean differences between the three temperature levels
• any other mean differences that may result from unique combinations of specific
temperature and specific humidity levels
• (high humidity may be especially disruptive when the temperature is also
high)
MAIN EFFECTS: mean differences produced by factors independently, mean difference among
levels of one factor
• Do differences in factor A (humidity) lead to differences in performance?
• Evaluate mean difference between the ROWS, between 30% and 70% humidity – this
difference is called the main effect for Factor A.
• For factor B, look at means of COLUMNS, differences are main effect Factor B,
Temperature
• We must evaluate main effects with hypotheses to test for significance
• Evaluation of the main effects makes up 2 of the 3 hypothesis tests in a 2-factor ANOVA
• 2 F ratios to be evaluated independently
Null: there is no mean difference between the two levels: Ho: μA1 = μA2
Alternative: the two levels do produce different scores: H1: μA1 ≠ μA2
Interaction Between Factor: any “extra” mean differences that are not explained by the main
effects
INTERACTION: mean difference produced by factors acting together. Occurs whenever the
mean differences between individual treatment conditions, or cells, are different from what would
be predicted from the overall main effects for the factors.
Interaction Hypotheses:
Null: there is no interaction between factors A and B. All the mean differences between
treatment conditions are explained by the main effects of the 2 factors
Alternative: there is an interaction between factors. The mean differences are not what would be
predicted from the overall main effects for the 2 factors
Correlation
A statistical technique that is used to measure and describe a relationship between two variables.
• Variables are observed as they exist in the natural environment – not controlled or manipulated
• CORRELATION DOES NOT IMPLY CAUSATION!
• Need Xs and Ys, two from each individual = 1 point
Characteristics of a Relationship:
1) Direction of the relationship
2) The Form of the Relationship
3) The Degree of the Relationship
• Prediction: if two variables are known to be related in some systematic way, knowing one
makes it possible to predict the other (SAT scores and college GPA)
• Reliability: A reliable measurement procedure will produce the same (or nearly the same)
scores when the same individuals are measured under the same conditions. One way to
evaluate reliability is to use correlations to determine the relationship between two sets of
measurements
• Pearson correlation is most commonly used with data from an interval or a ratio scale of
measurement
• Other correlation measures have been developed for nonlinear relationships and for other
types of data
Spearman Correlation
• Used in two situations
• To measure a relationship between variables measured on an ordinal scale of
measurement
• Can be used with ratio and interval scales even when the relationship is not linear
Phi-Coefficient
• Used when both variables (X and Y) measured for each individual are dichotomous
Regression: the statistical technique for finding the best fitting straight line for a set of data is
called regression, and the resulting straight line is called the regression line.
Linear Equations:
• Y = bX + a
• b is the slope
• a is the y intercept
Least Squares Solution: for each X, the line gives a predicted Y value, called Ŷ (Y hat). The
error is the distance between each actual data point and the line.
• Distance = Y − Ŷ
• Best fitting line has smallest total squared error
• Least squared error solution
Standard Error of Estimate: a measurement of the standard distance between a regression line
and the actual data points. It is based on the sum of squared errors, SSerror = Σ(Y − Ŷ)²
It is possible to find the best fitting regression equation for any set of data by simply using the
formulas already presented
• Accuracy of this prediction depends on how well the points on the line correspond to
actual data points
• So while a regression equation by itself allows you to make predictions, it does not
provide any information about the accuracy of the prediction
• To measure the precision of the regression a standard error of estimate must be computed
Variance = SS / df, where df = n − 2 for regression
• Standard error is directly related to the magnitude of the correlation between X and Y
• If the correlation is near ±1, the data points will be clustered close to the line and the
standard error of estimate will be small
• As the correlation gets near zero, the line will provide less accurate predictions, and the
standard error will grow larger
SSerror = (1 − r²) · SSY
• Because it is possible to have the same regression line for sets of data that have different
correlations, it is important to examine r squared and the standard error of estimate
• The regression equation simply describes the best fitting line and is used for making
predictions
• However, r squared and the standard error of estimate indicate how accurate the
predictions will be
• Correlation coefficient (r): the r value, which indicates the strength of the linear
relationship between the two variables
• Coefficient of Determination (r²): squaring the r value gives the proportion of the
variance in one variable that is accounted for by the variance in the
other variable
• Coefficient of Nondetermination (1 − r²): subtracting the squared r value from 1
gives the proportion of the variance in one variable that is not
accounted for by the variance in the second variable.
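The regression pieces above – slope, intercept, r, SSerror, and the identity SSerror = (1 − r²)SSY – can all be checked in one sketch (the five data points are invented):

```python
import math

xs = [1, 2, 3, 4, 5]                 # hypothetical X values
ys = [2, 3, 5, 4, 6]                 # hypothetical Y values
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

sp  = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # sum of products
ssx = sum((x - mx) ** 2 for x in xs)
ssy = sum((y - my) ** 2 for y in ys)

b = sp / ssx                         # slope
a = my - b * mx                      # Y intercept
r = sp / math.sqrt(ssx * ssy)        # Pearson correlation

ss_error = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
identity = (1 - r ** 2) * ssy        # should match ss_error
se_est = math.sqrt(ss_error / (n - 2))   # standard error of estimate

print(round(b, 2), round(a, 2), round(r, 2))   # 0.9 1.3 0.9
```

With r = 0.9, r² = 0.81 of the Y variance is accounted for, and SSerror works out to (1 − 0.81)(10) = 1.9 either way.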
Chi Square Test: when we have questions about relative frequencies for a distribution
• Uses sample data to test hypotheses about the shape or proportions of a population
distribution
• Determines how well the obtained sample proportions fit the population proportions
specified by the null hypothesis.
• Null: no preference (or no difference)
• Alternative: there is a preference, or there is a difference
Expected frequencies: the frequency values predicted from the null hypothesis and the sample
size – calculated. The observed frequencies are what you actually see, and are always whole numbers
The chi-square distribution is positively skewed because the formula involves adding squared values
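A goodness-of-fit sketch with made-up data – 60 people choose among three brands, and the null hypothesis of no preference expects 20 per category:

```python
observed = [25, 20, 15]                          # hypothetical observed counts
n = sum(observed)
expected = [n / len(observed)] * len(observed)   # 20 each under Ho

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
# critical value for df = 2 at alpha = .05 is 5.99; 2.5 < 5.99,
# so we fail to reject the null hypothesis of no preference
print(chi_sq, df)                                # 2.5 2
```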
Parametric Tests (ANOVA, T-Test)
• Test hypotheses about specific population parameters
• Requires numerical score for each individual in sample (from interval or ratio scale)
• More sensitive test
Non parametric tests
• Hypotheses not stated in terms of a specific parameter
• Use data from nominal or ordinal scales – can't calculate means and variances; it's all
about frequencies
• Less sensitive test
Power
Statistical Power: the POWER of a statistical test is the probability that the test will correctly
reject a false null hypothesis – so the probability of reaching a correct decision
Power
MORE POWERFUL means that will more readily detect a treatment effect when one exists
Higher power leads to correct detection
Power = 1 – Beta
• Sample Size
• If there is a treatment effect in the population, you are more likely to find it with
a large sample than with small samples
• Increasing Power:
• Increase treatment effect
• Reduce “noise” or error
• Hard for researcher to control this
• Increase (weaken) the significance level
• E.g., change alpha to .2 instead of .05
• Although this lowers the Type II error rate, the risk of a Type I error increases
• Increase sample size
• Most common method that researchers use to increase the power that
their study will detect a true treatment effect
RM ANOVA – given degrees of freedom (4, 15), figure out the total subjects in the study
2-Factor ANOVA – 3 hypothesis tests – know what F ratios are calculated for each test
Graph different scenarios for different F ratios
(main effect for A but not B but interaction, etc)
Correlation – what is R?
Different factors that impact power, and how it relates to Type II error (beta)
Pearson
Regression, standard error, test for significance slope
Goodness of fit, independence chi square test
Also use the correct critical values: df ERROR, not df WITHIN