Psychology 117 Study Guide

Statistics: A set of methods and rules for organizing, summarizing, and interpreting information (also creates a standard for comparison). Facts and figures that condense large quantities of information into a few simple figures or statements (because people can understand averages!)
• Methods
• Interpretation

Population: the set of all the individuals of interest in a particular study
Sample: individuals selected from the population, intended to represent it
Parameter: a numerical measurement describing some characteristic of a population (Greek symbol)
Statistic: a numerical measurement describing some characteristic of a sample

Data: measurements or observations
Data set: a collection of measurements or observations
Datum: a single measurement or observation that we collect from an individual (called a score or raw score). The complete set is the data set, and after we get it we run analyses.

Two Categories of Statistical Procedures
Descriptive Statistics: a set of methods and rules for organizing and summarizing (mean, median, mode, graphs)
Inferential Statistics: a set of methods and rules for interpreting information, or techniques that allow generalizations from a sample to the population from which it was selected.

Sampling Error: sample data aren't always perfect. This discrepancy is called sampling error: the amount of error that exists between a sample statistic and the corresponding population parameter.

Scientific Method and the Design of Research Studies
Objectivity: studies should be conducted in a way that prevents biases from influencing the outcome of research (Rosenthal 1963 with rats).

Relationships Between Variables: The GOAL is to find relationships between variables and put order to the universe.
Variable: a characteristic or condition that changes or has different values for different people (x, y)
Constant: a characteristic or condition that does not vary, but is the same for every individual (adding 4 pts to everyone's exam)
• The two methods for observing and investigating variables are…
o Correlational
o Experimental

Correlational Method: studies that observe and measure specific characteristics without attempting to modify the subjects being studied, looking at the variables as they naturally exist in their natural environment (questionnaires, interviews). We aren't manipulating anything! The PROBLEM is figuring out cause and effect: a correlation is just a relationship, and confounding variables may be behind it.
Experimental Method: studies that modify a group of subjects in some way in an attempt to establish a cause-and-effect relationship (lab treatment vs placebo)
• Characteristics that let us show that changes in one variable are caused by changes in the other variable:
o CONTROL: manipulate the independent variable (whether or not they get the pill) and see what it does to the dependent variable (change in condition)
o RANDOM ASSIGNMENT: each subject has an equal chance of being in either condition
o HOLDING CONSTANT: a method of controlling other variables that might influence results. Each condition must be identical except for the variable that is being manipulated.

Measuring Data
Qualitative: categorizing events
Quantitative: using numbers to categorize the size of the event
Four Levels:
Nominal: data that are names, labels, or categories and cannot be arranged in an ordering scheme (low to high). Ex: gender, SS#, favorite sports team, etc.
Ordinal: a set of categories organized in an ordered sequence; differences between data values cannot be determined or are meaningless (RANK ORDER) (course grades, rankings, stove settings)
Interval: consists of ordered categories where all of the categories are intervals of exactly the same size, BUT there is no natural ZERO starting point (temperature, dates, etc.)
Ratio: same as the interval scale, modified to include the natural zero starting point (weight and prices)
Quantitative Data (Classifying)
Discrete: separate, indivisible categories (counting something, can't have 1.5)
Continuous: an infinite number of possible values fall between any 2 observed values; divisible into an infinite number of fractional parts (running water, time)
Describing Data: we use descriptive statistics
Frequency Distribution Table
• Lists the different measurement categories or X values in a column from highest to lowest, and beside each X is the number of times it occurs.
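As a rough illustration of the frequency distribution table described above, here is a minimal Python sketch. The scores and the helper name `frequency_table` are invented for the example, and the sketch only lists X values that actually occur in the data.

```python
from collections import Counter

def frequency_table(scores):
    """Return (X, f) pairs ordered from the highest X value to the lowest."""
    counts = Counter(scores)
    return [(x, counts[x]) for x in sorted(counts, reverse=True)]

# Hypothetical quiz scores, invented for illustration
scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8]
table = frequency_table(scores)

print(" X   f")
for x, f in table:
    print(f"{x:2d}  {f:2d}")
```

The frequencies should always add up to the number of scores, which is a quick way to check a hand-built table.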

Frequency Distribution Graph
HISTOGRAM
• Use when the frequency distribution has data from an interval or ratio scale (draw a bar above each X value so height = frequency of score)
• Adjacent bars touch each other, making a continuous figure
BAR GRAPH
• Used when presenting frequency distribution data from ordinal or nominal scales
• Space between bars to emphasize distinct, discrete categories
Stem and Leaf Plot
• Represent data by splitting each value into 2 parts
o Stem: leftmost digit
o Leaf: rightmost digit
Describing a Distribution
• Shape
o Symmetrical
o Kurtosis = peakedness, HUGE peak and tails are extreme
o Skewed: scores pile up on one end and taper off toward the other
o Positively skewed: tail to the right
o Negatively skewed: tail to the left
• Central Tendency: the concept of average. The easiest way to describe scores is to compute an AVERAGE!
o Mean, median, mode
o Goal is to obtain a single value that IDs a single score as representative of the entire distribution
• Variability

Mean is the preferred measure whenever it is possible to use it
Mode is used with nominal scales; it's easy to compute (whichever value shows up most often)
Median is used when there are a few extreme scores in the distribution (outliers) or with ordinal data that are harder to measure:
• Extreme scores
• Ordinal scale
• Open-ended distributions
• Undetermined values
Relationship between the three:
For a symmetrical distribution: all three are in the middle
For a skewed distribution: the median is in the middle, the mode is at the bump, and the mean is closer to the tail
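To see how one extreme score pulls the mean toward the tail while the median and mode stay put, here is a small Python sketch. The scores are invented, with 20 playing the role of the outlier.

```python
import statistics

# Invented scores; 20 is an extreme value that creates a tail on the right
scores = [1, 2, 2, 3, 3, 3, 4, 20]

mean = statistics.mean(scores)      # pulled toward the tail
median = statistics.median(scores)  # middle of the ordered scores
mode = statistics.mode(scores)      # most frequent score (the "bump")

print("mean:", mean, "median:", median, "mode:", mode)
```

Here the mean lands above both the median and the mode, which is the pattern described for a distribution with its tail on the high end.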

Measures of Variability
Range (max - min): a crude measure because it's determined entirely by the two extreme values
Standard deviation/variance: approximates the average distance from the mean
• Standard deviation is a descriptive measure that describes how variable, how spread out, the scores are in a distribution
Three characteristics that describe a distribution:
1) Shape (skewed or symmetrical)
2) Central tendency (mean, median, mode)
3) Variability (range, standard deviation)
Steps to Find Standard Deviation
1) Find the mean of the data (add up all the values, divide by the number of values)
2) Determine the deviation of EACH SCORE from the mean (X - Mu)
3) Square these values
4) Find the average of these squared values. The numerator is SS, the sum of squares, and we divide by N. This is the VARIANCE:
Variance = sum of (X - Mu)^2 / N, so mean squared deviation = SS / N
5) To get the standard deviation, take the square root of the variance
Two ways of calculating SS, the sum of squares
1) Definitional formula:
a. Subtract the mean from each value, square it, then add them all up: SS = sum of (X - Mu)^2
2) Computational formula:
a. Square each value, add up the squares, then SUBTRACT [the sum of all the values, squared, then divided by the number of values]: SS = sum of X^2 - (sum of X)^2 / N
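The two SS formulas and the variance and standard deviation steps above can be sketched in Python as follows. The five scores and the function names are invented for illustration, and the scores are treated as a whole population (dividing by N).

```python
import math

def ss_definitional(scores):
    """SS = sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def ss_computational(scores):
    """SS = sum of X^2 minus (sum of X)^2 / N."""
    n = len(scores)
    return sum(x * x for x in scores) - (sum(scores) ** 2) / n

# Invented population of N = 5 scores (mean = 6)
scores = [1, 9, 5, 8, 7]

ss = ss_definitional(scores)
variance = ss / len(scores)   # population variance = SS / N
sd = math.sqrt(variance)      # standard deviation = square root of variance

print("SS:", ss, "| computational SS:", ss_computational(scores))
print("variance:", variance, "| SD:", round(sd, 3))
```

Both formulas give the same SS; the computational version is usually easier by hand when the mean is not a whole number.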

The problem with populations vs samples is that samples tend to be less variable than the populations they come from, so sample variability is a biased estimate (a bias is a systematic overestimation or underestimation; here it is an underestimation) and we have to adjust for it.
So now we are calculating the standard deviation of a SAMPLE instead of a POPULATION
• X bar is used instead of Mu (sample mean instead of population mean)
• We do the same process, but instead of N we use n - 1 in the denominator to adjust for the bias
• n - 1 = degrees of freedom
Measures of Relative Standing
Mean and standard deviation describe an entire distribution
Z scores (standard scores) describe an individual score: using the mean and standard deviation, each X is changed into a Z score to standardize the scores
Z scores describe the precise location of a score within a distribution
• Changes each X value into a signed number (+ or -)
o Positive is above the mean
o Negative is below the mean
• The number tells the distance between the score and the mean in terms of the number of standard deviations

Z = (X - Mu) / SD
• Numerator is the deviation score (above or below the mean)
• Denominator is the standard deviation, so the Z score is in SD units
Purposes of Z Scores
1) Transformation (the Z scores)
2) Standardizing an entire distribution (takes different distributions and makes them equivalent)
If every X value in a distribution were transformed into a Z score, the Z scores would have the following properties:
1) Shape: stays the SAME as the original distribution of raw scores
2) Mean: the Z score distribution mean is ALWAYS zero
3) Standard deviation: the Z score distribution will always have SD = 1
• Standardized distribution: every Z score distribution has the same mean (ZERO) and the same SD (ONE)
• When two raw scores are from different distributions, it is impossible to make any direct comparison between them
o To calculate Z scores we need the raw score, mean, and standard deviation of each distribution you are comparing
o We can compare Z scores because the Z scores are coming from an equivalent standardized distribution
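A short Python sketch can confirm the properties listed above: after every score is turned into a Z score, the mean is 0 and the SD is 1. The scores and the function name `z_scores` are invented, and the scores are treated as a complete population.

```python
import math

def z_scores(scores):
    """Transform every X into z = (X - Mu) / SD, using population formulas."""
    n = len(scores)
    mu = sum(scores) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in scores) / n)
    return [(x - mu) / sd for x in scores]

# Invented population: mean = 5, SD = 2
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(scores)

z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))

print("z scores:", z)
print("mean of z:", z_mean, "| SD of z:", z_sd)
```

The sign of each Z score shows whether the raw score is above or below the mean, and the size shows how many standard deviations away it is.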

Two steps involved in standardizing a distribution so that it has a predetermined mean and SD • Each of raw scores is transformed into Z score • Each of the Z scores is transformed into a new X value so that a particular mean and SD are achieved Standard score = new mean + Z(SDnew)
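The two-step standardization above can be sketched in Python like this. The function name `restandardize` and all of the numbers are invented for illustration.

```python
def restandardize(x, old_mean, old_sd, new_mean, new_sd):
    """Step 1: raw score -> Z score. Step 2: Z score -> new standard score."""
    z = (x - old_mean) / old_sd
    return new_mean + z * new_sd

# Invented example: original distribution has mean 57 and SD 14;
# we want a standardized distribution with mean 50 and SD 10.
new_score = restandardize(64, old_mean=57, old_sd=14, new_mean=50, new_sd=10)
print(new_score)  # a score 0.5 SD above the old mean stays 0.5 SD above the new mean
```

The person's relative standing does not change; only the scale it is reported on does.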

Probability How do I make an inference about a population? 1) Develop probability as a bridge from population to sample 2) Reverse the probability rules

Probability of A = (number of outcomes classified as A) / (total number of possible outcomes)
P(specific outcome)
• Event never occurs: p = 0
• Event certain to occur: p = 1
• Random sampling:
o Each person in the population has an equal chance of being selected (1/N)
o If more than one individual is selected, there must be a constant probability for each and every selection
o This requires sampling with replacement (aka putting the card back); if you don't put the card back, the probabilities change from one selection to the next
Probability and the standard normal distribution

1st column: Z score values 2nd column: % mean to Z is percentage of scores between mean and z score 3rd column: % in tail gives the percentage of scores in the tail for that z score
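The three columns described above can be reproduced with Python's standard library instead of a printed table. This is only a sketch: the function name `unit_normal_row` is invented, and it relies on `statistics.NormalDist` (Python 3.8 or later).

```python
from statistics import NormalDist

def unit_normal_row(z):
    """Return (% between mean and z, % in the tail beyond z) for a z score."""
    standard_normal = NormalDist()           # mean 0, SD 1
    below = standard_normal.cdf(abs(z))      # proportion below |z|
    mean_to_z = (below - 0.5) * 100          # 2nd column
    in_tail = (1 - below) * 100              # 3rd column
    return mean_to_z, in_tail

for z in (0.5, 1.0, 1.96):
    mean_to_z, in_tail = unit_normal_row(z)
    print(f"z = {z:4.2f}  mean to z = {mean_to_z:5.2f}%  tail = {in_tail:5.2f}%")
```

For any row, the two percentages add up to 50%, because together they cover one half of the distribution.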

Probability and Samples: The distribution of sample means • Every score from a population has a Z score – says WHERE that score is located in the distribution o We want to be able to transform sample mean into Z scores too, so n can be more than just one value o NOW a Z score represents an entire sample Beyond + or – 2 means an extreme sample

Problems with Samples
• Sampling Error: not all samples from the population will be representative of the population
• Variability: different samples will vary. We need RULES to relate samples to populations
Distribution of sample means: the basis for our ability to predict sample characteristics. It is the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population
1) Select a random sample and calculate the mean
2) Do this for the entire set of all possible random samples; this is the distribution of sample means
Sampling distribution: a distribution of statistics (such as the means of different samples) obtained by selecting ALL the possible samples of a specific size from a population
• If we have a normal distribution we know that we can calculate Z scores and answer probability questions

Sample vs population mean…
• Sample means tend to pile up around the population mean
• The distribution of sample means tends to be normal in shape
• If we have a normal distribution we know that we can calculate Z scores and answer probability questions
• It isn't possible to list all samples and calculate every possible mean
• So we rely on the general characteristics of the distribution of sample means, given by the central limit theorem

Central Limit Theorem: for any population with a mean (Mu) and a standard deviation (SD), the distribution of sample means for sample size n will have a mean of Mu and a standard deviation of SD / sqrt(n), and will approach a normal distribution as n approaches infinity
• Describes the distribution of sample means for any population, no matter what shape, mean, or SD
• The distribution of sample means "approaches" a normal distribution very rapidly
o By the time n = 30, the distribution is almost perfectly normal
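The central limit theorem can be checked informally with a simulation. The Python sketch below is an illustration under stated assumptions, not part of the course material: it draws many random samples (not literally all possible samples) of size n = 30 from a strongly skewed population, an exponential distribution whose mean and SD are both 1, and compares the resulting sample means with what the theorem predicts.

```python
import math
import random
import statistics

random.seed(117)      # fixed seed so the run is repeatable

n = 30                # size of each sample
num_samples = 5000    # many random samples, standing in for "all possible"

# Skewed population: exponential distribution with mean 1 and SD 1
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
predicted_se = 1.0 / math.sqrt(n)   # SD / sqrt(n)

print("mean of sample means:", round(mean_of_means, 3), "(population mean is 1)")
print("SD of sample means:  ", round(sd_of_means, 3))
print("SD / sqrt(n):        ", round(predicted_se, 3))
```

Even though the population is far from normal, the sample means cluster around the population mean with a spread close to SD / sqrt(n).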

**and remember that any distribution can be defined by shape, variability, and central tendency
SHAPE
• The distribution of sample means tends to be a normal distribution if one of the following is satisfied:
1) The population is a normal distribution
2) The number of scores in each sample is large, around 30 or more (regardless of the shape of the population)
CENTRAL TENDENCY
• The expected value of X bar is the mean of the distribution of sample means, and it is always equal to the population mean
VARIABILITY
• The SD of the distribution of sample means is called the standard error of X bar
o Measures how much difference should be expected on average between X bar and Mu
o Standard error tells us how good an estimate the sample mean is of the population mean

Standard Error:
• Magnitude determined by:
1) Size of the sample. Law of Large Numbers: the larger the sample, the smaller the error
2) Standard deviation of the population. SD is the starting point for standard error, so when n = 1, standard error = standard deviation. As sample size increases, standard error decreases in proportion to the square root of n (standard error = SD / sqrt(n))
Sampling Error: knowing that samples aren't completely representative of a population
Standard Error: measuring the error between the sample mean and the population mean
** the method by which we measure sampling error
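A few lines of Python show how the standard error shrinks as the sample size grows. The population SD of 20 and the function name `standard_error` are invented for the example.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: population SD divided by sqrt(n)."""
    return sd / math.sqrt(n)

sd = 20  # invented population standard deviation
for n in (1, 4, 25, 100):
    print(f"n = {n:3d}  standard error = {standard_error(sd, n)}")
```

With n = 1 the standard error equals the standard deviation, and quadrupling the sample size cuts the standard error in half.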
Standard Error vs standard deviation • Standard deviation o Use when working with a distribution of scores and it measures the standard distance between a score and the population mean Standard Error • Use when you have a question concerning a sample, and it measures the standard distance between the sample mean and the population mean

Hypothesis Testing: a statistical procedure (the most commonly used) that lets us make inferences about a population using sample data. (we use Z scores, probability, and distribution of sample means) Hypothesis Test: a statistical method that uses sample data to evaluate a hypothesis about a population parameter

Testing Procedure • 1) State a hypothesis about a population (usually population parameter) • 2) Obtain a random sample from the population • 3) Compare sample data with hypothesis

Example: using one unknown population and one sample
• We know the mean, standard deviation, and sample size, n
• We assume that if the treatment (what we are measuring) has any effect, it will add or subtract a constant, so the mean may shift but the SD should stay the same

Testing Procedure

1) State a hypothesis

Null Hypothesis (Ho) = treatment has no effect! There is no change
• Independent variable (treatment) has no effect on the dependent variable for the population
Scientific or Alternate Hypothesis (H1) = treatment does have an effect
• There is SOME TYPE OF CHANGE! The independent variable (treatment) will have an effect on the dependent variable.
• Non-directional Test: this hypothesis does not specify the direction of change

2) Set the criteria for a decision
• Alpha Level / Level of Significance: probability value used to define the very unlikely sample outcomes if the null hypothesis is true.
o Determines boundaries of the critical region
o Defines "unlikely" outcomes
o Must select a level to minimize Type I error
o Largest permissible value is .05 (5%)
o Tradeoff between lowering the alpha level and the difficulty of finding a treatment effect (usually use .05, .01, and .001 to maintain balance)
• Critical Region (tails): extreme values that are very unlikely to be obtained if the null hypothesis is true (so if we see values in the critical region, it is likely that the null hypothesis is NOT true and the treatment DID have an effect). Boundaries are determined by the alpha level.
• If the sample data fall in the critical region, the null hypothesis is rejected
3) Compute Z Score for sample mean

Z = (sample mean - hypothesized population mean) / (standard error between X bar and population mean)
We are computing a Z score that describes exactly where the sample mean is located relative to the hypothesized population mean from Ho
4) Make a decision
• Reject the null hypothesis (treatment did have an effect)
o So there WAS a change: the data are in the critical region (the tails)
• Fail to reject the null hypothesis (no evidence that the treatment had an effect)
o We don't have enough evidence that the treatment has an effect; the evidence wasn't convincing
o Data are not in the critical region
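Steps 3 and 4 can be sketched in Python as below. This is a minimal illustration: the function name `z_test` and all of the numbers are invented, and the critical value of 1.96 assumes a non-directional (two-tailed) test with alpha = .05.

```python
import math

def z_test(sample_mean, mu_null, sd, n, z_critical=1.96):
    """Return the z score for a sample mean and whether Ho is rejected."""
    standard_error = sd / math.sqrt(n)
    z = (sample_mean - mu_null) / standard_error
    reject_null = abs(z) > z_critical   # is z in either tail of the critical region?
    return z, reject_null

# Invented example: Ho says Mu = 80, population SD = 20, sample of n = 25
z_a, reject_a = z_test(sample_mean=89, mu_null=80, sd=20, n=25)
z_b, reject_b = z_test(sample_mean=84, mu_null=80, sd=20, n=25)

print("sample mean 89: z =", z_a, "reject Ho?", reject_a)
print("sample mean 84: z =", z_b, "reject Ho?", reject_b)
```

The first sample mean lands in the critical region, so Ho is rejected; the second does not, so we fail to reject Ho.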

So if the data are in the critical region: we REJECT the null hypothesis and conclude that the treatment had an effect
If the data are NOT in the critical region: we FAIL to reject the null hypothesis (the treatment may not work, or its effect was too small to detect)
So in terms of Z scores:
• A Z score near 0 means we aren't in the critical region: the sample mean is consistent with the null hypothesis, so we fail to reject it
• A Z score that is extremely positive or negative means we are in the critical region: the null hypothesis is rejected
Z Score Formula

Z = (obtained difference) / (difference due to chance)

Is the result of the research study (obtained difference) more than would be expected by chance alone? Most hypothesis tests require that the obtained difference be 2 to 3 times bigger than chance before the researcher will reject the null hypothesis (and say that the treatment did cause a difference)
Uncertainty and Errors in Hypothesis Testing
Inferential Process:
• There is always the possibility that an incorrect conclusion was made
• Type 1 Error: when a researcher rejects a null hypothesis that is actually true, i.e. says that the treatment was effective when in fact it is not
• It is unlikely to obtain a sample mean in the critical region when the null hypothesis is true. The critical region is determined by the alpha level, so the alpha level for a hypothesis test is equal to the probability that the test will lead to a Type 1 error. Bigger alpha level = bigger critical region = more likely to have sample means in that area = more likely to reject the null hypothesis and say the treatment was effective. So a smaller alpha level is more fine tuned: if we reject the null hypothesis with a smaller alpha level and say the treatment was effective, this is more reliable.

Type 2 Error: when a researcher fails to reject a null hypothesis that is really false. So the researcher fails to reject the null hypothesis and says there is no evidence that the treatment is effective, when in fact it is. Not as serious; it just means that the research data don't show the results that the researcher had hoped to obtain.
• Its probability is a function represented by the Beta symbol

Statistical Test Notation Significance: means that the result is different from what would be expected due to chance

Findings are significant when the null hypothesis is rejected
In papers: "treatment with medication had a significant effect on people's depression scores, z = 3.85, p < .05"
• Significant means that we rejected the null hypothesis, as the sample mean fell in the critical region
• z = 3.85 is the Z score for the sample mean
• p < .05 refers to the alpha used for the test, meaning that a result this extreme would occur less than 5% of the time by chance alone (if the null hypothesis were true)
• So the researcher is 95% confident that the obtained difference is greater than what one would expect by chance alone
Assumptions for Hypothesis Test with Z Scores
1) Random sampling
2) Independent observations (no predictable relationship between the 1st and 2nd observations)
3) The value of SD is unchanged by the treatment (we make this assumption)
4) Normal sampling distribution
T Statistic

Shortfall of Z score test is that it usually involves more information than is available • Population standard deviation or variance • We can’t calculate standard error without this!

When do we use the T statistic? When the population SD is not known. When the variability for the population is not known, we use the sample variability in its place

So Estimated Standard error : used to estimate the real standard error in situations when the SD is unknown. Gives estimate of the standard distance between a sample mean of X bar and the population mean (Mu)

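ESTIMATED STANDARD ERROR
• Using the sample standard deviation (descriptive statistics): estimated standard error = s / sqrt(n)
• Using the sample variance (inferential statistics): estimated standard error = sqrt(s^2 / n)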

T Statistic – result of substituting the estimated standard error in the denominator of the Z score formula
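A one-sample t statistic can be sketched in Python as follows. The scores, the hypothesized mean of 10, and the function name `one_sample_t` are invented for illustration; the critical value 2.306 is the standard table value for df = 8 in a two-tailed test at alpha = .05.

```python
import math

def one_sample_t(scores, mu_null):
    """Return (t, df) using the estimated standard error sqrt(s^2 / n)."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)
    sample_variance = ss / (n - 1)              # divide by df, not N
    est_standard_error = math.sqrt(sample_variance / n)
    t = (mean - mu_null) / est_standard_error
    return t, n - 1

# Invented sample of n = 9 scores (mean = 13, SS = 72)
scores = [9, 9, 11, 13, 13, 13, 15, 17, 17]
t, df = one_sample_t(scores, mu_null=10)

T_CRITICAL = 2.306  # table value: df = 8, two-tailed, alpha = .05
print("t =", t, "df =", df, "reject Ho?", abs(t) > T_CRITICAL)
```

The structure is the same as the z score formula; the only change is that the standard error in the denominator is estimated from the sample.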

Only difference between the t and z formulas is that the z score formula uses the actual population variance, and the t formula uses the sample variance (because we don't know the population one)
How well does the sample variance approximate the population variance? This tells us how well a t statistic approximates a z score.
• df = degrees of freedom = n - 1
• The sample mean places a restriction on the value of one score
• The GREATER the value of df for a sample, the better the sample variance represents the population variance and the better the t statistic approximates the Z score
T Distributions
• Every sample from a population can be used to compute a z score or a t statistic
• The t distribution will approximate a normal distribution in the same way that a t statistic approximates a z score
• How well it approximates is determined by degrees of freedom
• The GREATER the sample size (n), the LARGER the degrees of freedom (n - 1) and the better the t distribution approximates the normal distribution
• Shape changes with degrees of freedom: there is a different sampling distribution of t for each possible number of degrees of freedom
• As df gets very large, the t distribution gets closer in shape to a normal z-score distribution
• The standard error isn't constant because it is an ESTIMATE based on the sample variance, and this varies from sample to sample
• As the value of df increases, variability decreases and the t distribution more closely resembles the normal curve

When your exact df is not listed in the t table, always use the listed degrees of freedom that is LESS than yours

Hypothesis Test with Z Statistic

And we assume a normal population and independent observations

Hypothesis Tests with Two Independent Samples

Most research uses two samples! (and concerns a mean difference between the 2)
• Independent Samples: two sets of data are from completely separate samples
• Related/Dependent Samples: two sets of data come from the same sample

Notation: We use subscripts (1, 2) to distinguish between the two samples for n, sum of squares, population mean, and sample mean.
Hypotheses for Independent Sample T-Tests: Goal is to evaluate the mean difference between the two populations…
• Null Hypothesis: Ho: Mu1 - Mu2 = 0
o There is no change, effect, or difference between the 2 populations
• Alternative Hypothesis: H1: Mu1 - Mu2 IS NOT EQUAL to 0
o There IS a difference, effect, or change between the 2 populations
So if we reject Ho, it means that the data indicate a significant difference b/w the 2 pops
If we fail to reject Ho, it means that the data do not provide sufficient evidence to conclude that a difference exists
T Statistic Formula for 2 Samples… we are looking at mean differences

STANDARD ERROR FORMULA
Single sample: Sx measures how much error is expected between a sample mean and the population mean
Independent samples: S(X1 - X2) measures how much error is expected when you are using a sample mean difference to represent a population mean difference
So when n1 = n2 we can use this formula for standard error:
S(X1 - X2) = sqrt( s1^2 / n1 + s2^2 / n2 )

But because of the Law of Large Numbers, we can't use it when n1 doesn't equal n2: we get a biased statistic, because larger samples provide better estimates about a population and the proper weights are not assigned. So according to this law, sample variances should not be treated equally if sample sizes are different, because statistics from larger samples are better estimates.
Standard Error when Sample Sizes Are Different: pooled variance allows the bigger sample to carry more weight
Sample variance for one sample: s^2 = SS / df
Pooled variance: sp^2 = (SS1 + SS2) / (df1 + df2)
Standard error: S(X1 - X2) = sqrt( sp^2 / n1 + sp^2 / n2 )

and now we can go back and calculate the t statistic – the degrees of freedom for the t statistic is the df1 + df2
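Putting the pooled variance, standard error, and t statistic together, a minimal Python sketch might look like this. The two samples and the function name `independent_t` are invented, and the null hypothesis is assumed to be Mu1 - Mu2 = 0.

```python
import math

def sum_of_squares(scores):
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def independent_t(sample1, sample2):
    """Return (t, df, pooled variance) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    df1, df2 = n1 - 1, n2 - 1
    pooled = (sum_of_squares(sample1) + sum_of_squares(sample2)) / (df1 + df2)
    standard_error = math.sqrt(pooled / n1 + pooled / n2)
    mean_difference = sum(sample1) / n1 - sum(sample2) / n2
    t = (mean_difference - 0) / standard_error   # Ho: Mu1 - Mu2 = 0
    return t, df1 + df2, pooled

# Invented samples with different sizes (n1 = 4, n2 = 6)
sample1 = [3, 5, 7, 5]
sample2 = [8, 10, 12, 10, 9, 11]
t, df, pooled = independent_t(sample1, sample2)

print("pooled variance:", pooled)
print("t =", round(t, 3), "with df =", df)
```

Because the pooled variance divides the combined SS by the combined df, the larger sample automatically contributes more to the estimate.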

Assumptions Underlying the T Formula for Independent Samples
1) Independent observations
2) Normal populations
3) The two populations must have homogeneity of variance, or "equal" variances
If there is a large discrepancy (3-4X) between the sample sizes, we worry

Hypothesis Testing: Independent t, 4 Step Procedure
Step 1: State Ho, state H1, and select alpha
Step 2: Locate the critical region (values that would be unlikely if the null hypothesis were true)
• Calculate TOTAL degrees of freedom
• Use the t distribution table
Step 3: Get data, compute test statistic
• Find the pooled variance of the two samples
• Use the pooled variance to compute the standard error
• Compute the t statistic
Step 4: Make a decision (look at observed vs critical t)

Hypothesis Testing with Related/Dependent Samples: when we get the data from the same individuals, maybe under two conditions
RELATED/DEPENDENT SAMPLE DESIGNS
Repeated Measures Study: a study in which a single sample of individuals is measured more than once on the same dependent variable, so the same subjects are used in all of the treatment conditions
• No risk that subjects in different conditions are different, so it's advantageous for researchers to choose this design
• We can approximate this style of study by matching subjects…
• MATCHING is PERFECT
Matched Subjects Study
• Two separate samples, but each individual is matched with a subject in the other sample
• Goal is to simulate a repeated measures design as closely as possible, matching on the variables that are most relevant
• MATCHING depends on the variables used for the matching process
T Scores
Based on a difference score: D = X2 - X1
We use this D score instead of a raw X score, so one D score represents one person's data
Hypotheses for a Related Sample Test
• Goal is to use the sample of difference scores to answer questions about the general population
• Null Hypothesis
o Ho: MuD = 0
o Mean difference for the general population is zero… some individuals might show a positive difference, some negative, but they average to zero
• Alternative Hypothesis
o H1: MuD NOT EQUAL to 0
o There is a treatment effect that causes the scores in one treatment condition to be systematically higher or lower than the scores in the other condition

T formula • Sample data for related sample design are difference scores and are identified by the letter D.

First compute the variance of the D scores
Calculate the standard error (one D score for each individual)
We refer to the number of D scores instead of the number of X scores
So 5 people = 5 difference scores = 4 degrees of freedom
Step 1: State hypotheses Ho and H1 and select alpha level
Step 2: Locate the critical region (sample data that would be extremely unlikely if the null hypothesis were true)
Step 3: Get the data, compute the test statistic (t statistic)
Step 4: Make a decision! If the t statistic is in the critical region, reject the null hypothesis; otherwise we say that the data do not provide sufficient evidence that the two populations are different
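The related-samples steps above can be sketched in Python with five invented participants measured twice. The scores and the function name `related_samples_t` are assumptions for illustration; 2.776 is the standard table value for df = 4 in a two-tailed test at alpha = .05.

```python
import math

def related_samples_t(first, second):
    """Return (t, df) computed from difference scores D = X2 - X1."""
    d_scores = [x2 - x1 for x1, x2 in zip(first, second)]
    n = len(d_scores)                      # number of D scores, not X scores
    mean_d = sum(d_scores) / n
    ss = sum((d - mean_d) ** 2 for d in d_scores)
    variance = ss / (n - 1)
    standard_error = math.sqrt(variance / n)
    t = (mean_d - 0) / standard_error      # Ho: MuD = 0
    return t, n - 1

# Invented data: the same 5 people measured under two conditions
first = [10, 12, 9, 14, 11]
second = [11, 15, 13, 19, 18]
t, df = related_samples_t(first, second)

T_CRITICAL = 2.776  # table value: df = 4, two-tailed, alpha = .05
print("t =", t, "df =", df, "reject Ho?", abs(t) > T_CRITICAL)
```

Once the D scores are computed, the rest is the same arithmetic as a one-sample t test on those five numbers.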

Assumptions for Related Samples t Tests 1) Observations within each treatment condition are independent 2) Population distribution of difference scores must be normal When to use Related Samples • When researcher would like to study a particular type of subject that is not commonly found • When researcher questions concern changes in responses across time When to not use Repeated Measures • When you are comparing two different populations you MUST use separate samples from each population (men vs women, etc) Advantages of Related Samples Studies • Each subject enters research study with his or her own individual characteristics – and these differences can influence our scores and create problems when interpreting results • With independent measures design there is always the potential that the individuals in one sample are substantially different than the individuals in the other sample

A repeated measure design eliminates this problem because the same individuals are used in every treatment.

ESTIMATION – still using sample statistics to make inferences about an unknown population Two types of estimates:

Point estimates
• Using a single number as the estimate for an unknown quantity
• Precise but less confident
• Most sample means pile up around the center where z = 0, so we set z = 0 for a point estimate

Interval estimates (confidence intervals)
• Using a range of values as an estimate of an unknown quantity
• Less precise but more confident
• As the interval widens, confidence increases but precision decreases
• Steps
o We select a range of Z values associated with the level of confidence
o Commonly used confidence values are 60% and up
o 90% confidence = each critical tail is 5%, so the Z score is +/- 1.65

Confidence Interval: When an interval estimate is accompanied by a specific level of confidence (or probability) it is called a confidence interval. So a 90% confidence interval means that we are 90% sure that the unknown population value falls somewhere within the interval
• Narrow interval = more precise
• WIDTH is influenced by sample size and level of confidence
• As n gets larger, the width gets smaller: more precise
• As % confidence increases, the width gets larger: less precise
Hypothesis Tests: answer the question "did it have an effect?" (YES/NO)
Estimation: answers the question "how much of an effect, and in what direction?" (value)
WITH Z SCORE
• WHEN: we do estimation with a Z score when the SD is KNOWN but Mu is not
Z or t = (sample statistic - unknown population parameter) / standard error
• Sample statistic is like X bar or D bar
• Standard error is like Sx
• Unknown population parameter is like Mu
Unknown population parameter = sample statistic +/- (z or t) * standard error
Mu = X bar +/- Z * Sx
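The estimation equation Mu = X bar +/- Z * Sx can be sketched in Python as below. The sample values and the function name `z_confidence_interval` are invented; the z value is computed with `statistics.NormalDist` (Python 3.8 or later) rather than read from a table, so for 90% it comes out as about 1.645 instead of the rounded 1.65.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sd, n, confidence=0.90):
    """Interval estimate of Mu when the population SD is known."""
    standard_error = sd / math.sqrt(n)
    # z that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error

# Invented example: X bar = 86, population SD = 8, n = 16
low90, high90 = z_confidence_interval(86, sd=8, n=16, confidence=0.90)
low95, high95 = z_confidence_interval(86, sd=8, n=16, confidence=0.95)

print(f"90% interval: {low90:.2f} to {high90:.2f}")
print(f"95% interval: {low95:.2f} to {high95:.2f}")
```

Raising the confidence level from 90% to 95% widens the interval, which matches the tradeoff between confidence and precision noted above.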

Our goal is to find the population mean, so we solve the equation for Mu
WITH T SCORE/STATISTIC
Unknown population mean (or mean difference) = sample mean (or mean difference) +/- t * standard error
VARIATIONS OF DIFFERENT T TESTS

Procedure for Estimation with T Statistic
1) Use sample data to compute the sample mean (or mean difference) and the standard error
2) Estimate a value, or range of values, for t
* point estimate: t = 0
* interval estimate: we translate the confidence level (90%) into t scores
3) Take the sample mean and standard error from the sample data and the estimated value for t from the t table and PLUG IN to the estimation equation
Confidence Intervals and Statistical Significance
• If 0 is in the range it is NOT a significant finding: we need to be able to specify the direction of the effect
• If an interval contains zero, then a wider interval (higher confidence) will contain it as well
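The link between a confidence interval and significance can be sketched in Python using difference scores. The D scores and the function name `t_confidence_interval` are invented; the t values 2.132 and 2.776 are the standard table values for df = 4 at 90% and 95% confidence.

```python
import math

def t_confidence_interval(scores, t_value):
    """Interval estimate: sample mean +/- t * estimated standard error."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)
    standard_error = math.sqrt((ss / (n - 1)) / n)
    return mean - t_value * standard_error, mean + t_value * standard_error

# Invented difference scores for 5 people (mean D = 4, df = 4)
d_scores = [1, 3, 4, 5, 7]

low90, high90 = t_confidence_interval(d_scores, t_value=2.132)  # 90%, df = 4
low95, high95 = t_confidence_interval(d_scores, t_value=2.776)  # 95%, df = 4

contains_zero = low95 <= 0 <= high95
print(f"90% interval: {low90:.3f} to {high90:.3f}")
print(f"95% interval: {low95:.3f} to {high95:.3f}")
print("0 inside the 95% interval?", contains_zero)
```

Here zero falls outside the 95% interval, so the mean difference would count as significant and its direction (positive) is clear.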

Know Your Tests
• 1 sample Z
• 1 sample T (for when there is no population SD)
• Indep/dep for T

Spelling test at beginning and end of first grade
• Dependent sample T test
Compare SAT scores of kids with extracurriculars to average population
• (we know SD and mean)
• 1 sample Z
New drug for depression: two groups, random assignment
• 2 sample independent
• Because we have 2 ind. groups, 1 placebo and 1 drug, 2 tests
Learning in children with autism, matched with a group of normals
• Dependent (matching variable: age)
Children's behavioral problems: random sample of children of divorce, measure behavioral problems on a standardized scale for which the normal population mean is 100
• 1 sample t: we don't have the SD of the population, so we use the sample variance
New hunger drug, three groups of rats…
• ANOVA!

Analysis of Variance (ANOVA): A hypothesis testing procedure used to evaluate mean differences between two or more treatments (or populations) • We use with 2+ samples and when we can’t calculate a sample mean difference • Like independent t samples test – uses sample data to draw conclusions about population means, but t tests are limited to 2 samples, and ANOVA can do two or MORE because we use VARIANCE • Can be used with independent measure or repeated variable design Independent Variable: (control) what is manipulated to make different treatment conditions Quasi independent variable: a non manipulated variable used to differentiate a group of scores Both of these are called FACTORS Single Factor Design: research that involves one factor Factorial Design: more than one factor Two Interpretations 1) NO DIFFERENCES (or differences due to sample error) (Null hypothesis) Mu 1 = Mu 2 = Mu 3 2) REAL DIFFERENCES (alternate hypothesis) At least one population mean is different from the others Test Statistic for ANOVA – test statistic is called an F ratio T = obtained difference between sample means Difference expected by chance (error) F = variance (difference) between sample means Variance (differences) expected by chance (error)

For both of these, a large value means that the difference between sample means is larger than would be expected by chance alone

Between treatment variance: differences between treatment conditions
Either caused by treatment effects or due to chance:
• Individual differences
• Experimental error
Within treatment variance: differences that are likely to occur just by chance alone
• Inside a treatment condition, all individuals are treated the same, so why are the scores different? Differences within a treatment are due to chance
F = (variance between treatments) / (variance within treatments)

The denominator reflects only unsystematic differences (ERROR)

F Ratio:
Total variability = between treatment variability + within treatment variability
F = (treatment effect + differences due to chance) / (differences due to chance)
So when the treatment has no effect, F is close to 1
When the treatment has an effect, the F ratio is noticeably greater than 1
Important Notation for ANOVA
k = number of levels of a factor (number of treatment conditions) (e.g., 50, 70, 90 degrees)
n = number of scores in EACH treatment (count the values in each column)
N = total number of scores in the ENTIRE study
When the sample sizes are equal, N = k * n
T = the total for EACH treatment (the sum of all of the Xs in that treatment)
G (Grand Total) = sum of ALL scores in the research study (add up all the T's, or all the numbers)
2 separate Analyses
• Compute SS TOTAL (between and within)
• Compute df TOTAL

F = MSbetween / MSwithin

And SStotal = SSbetween + SSwithin
Remember that we need SS and df to find each variance: s² = MS = SS / df

DEGREES OF FREEDOM
df total = N - 1
df within = N - k
df between = k - 1
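The SS, df, MS, and F steps above can be sketched in a few lines of Python. This is only an illustration of the computational formulas in these notes (SStotal = sum of X² - G²/N, SSbetween = sum of T²/n - G²/N, SSwithin = sum of the SS inside each treatment); the function name `one_way_anova` and the scores are made up for the example.

```python
def one_way_anova(treatments):
    """Independent-measures ANOVA; `treatments` is a list of lists of scores."""
    k = len(treatments)                                   # number of treatments
    N = sum(len(scores) for scores in treatments)         # total number of scores
    G = sum(sum(scores) for scores in treatments)         # grand total
    sum_x_squared = sum(x * x for scores in treatments for x in scores)

    ss_total = sum_x_squared - G ** 2 / N
    # Between: sum of T^2/n over treatments, minus G^2/N
    ss_between = sum(sum(s) ** 2 / len(s) for s in treatments) - G ** 2 / N
    # Within: add up the SS computed inside each treatment
    ss_within = sum(
        sum(x * x for x in s) - sum(s) ** 2 / len(s) for s in treatments
    )

    df_between, df_within, df_total = k - 1, N - k, N - 1
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS": (ss_between, ss_within, ss_total),
        "df": (df_between, df_within, df_total),
        "MS": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }


# Hypothetical scores for three treatment conditions (n = 5 each)
scores = [[0, 1, 3, 1, 0], [4, 3, 6, 3, 4], [1, 2, 2, 0, 0]]
result = one_way_anova(scores)
print(result["df"])  # (2, 12, 14)
print(result["F"])   # about 11.25
```

A quick check on the work: SSbetween + SSwithin should equal SStotal, and df between + df within should equal df total.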

Check work with this table:

Post Hoc Tests tell us where the differences are, after we have rejected the null for ANOVA and decided that not all means are the same
Pairwise comparisons: looking at two treatments at a time for a post hoc test
Familywise Error Rate: as we do more separate tests, the risk of making at least one Type I error in the family goes UP! A familywise alpha of .26 = 26% chance of at least one Type I error
Tukey's Honestly Significant Difference Test (HSD):

Allows us to compute a single value that determines the minimum difference between treatment means that is necessary for significance. We compute it once and then use it to compare any two treatment conditions
• If the mean difference exceeds HSD, we conclude that there is a significant difference; if not, we conclude the treatments aren't different from one another
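A rough sketch of the two post hoc ideas above. The familywise formula 1 - (1 - alpha)^c assumes the c comparisons are independent, which is only an approximation for pairwise tests on the same data. The HSD formula used here, HSD = q * sqrt(MSwithin / n), is the standard textbook form and is not written out in these notes; q comes from a studentized range table, so the value passed in below is a placeholder you would look up yourself.

```python
import math


def familywise_error(alpha, num_comparisons):
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** num_comparisons


def tukey_hsd(q, ms_within, n):
    """Minimum mean difference needed for significance (equal n per treatment)."""
    return q * math.sqrt(ms_within / n)


def is_significant(mean_1, mean_2, hsd):
    """Compare one pair of treatment means against the HSD value."""
    return abs(mean_1 - mean_2) > hsd


# Six pairwise comparisons (four treatments), each at alpha = .05
fw = familywise_error(0.05, 6)
print(round(fw, 2))  # 0.26

# q = 3.77 is an assumed table value, not computed here
hsd = tukey_hsd(q=3.77, ms_within=16 / 12, n=5)
print(is_significant(4.0, 1.0, hsd))
```

With six comparisons at alpha = .05 the familywise rate comes out near .26, which matches the 26% figure quoted above.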

Scheffe Test: uses an F ratio to test for a significant difference between two treatment conditions
1) Start with the largest mean difference; list the n's, means, and T's for the two treatments, and find G and N for these two treatments
2) Compute a new SSbetween = sum(T² / n) - G² / N
3) Use the old df between = k - 1
4) MSbetween = new SSbetween / df between
5) F = new MSbetween / MSwithin from the overall ANOVA
Relationship between ANOVA and T Test

When you have data from an independent-measures experiment with only two treatment conditions, you can use either a t test or an independent-measures ANOVA
It makes no difference, because they will always result in the same statistical decision
The basic relationship can be stated as F = t²
Remember that the F ratio is based on squared differences (variances) while the t statistic is based on differences
Assumptions for Independent-Measures ANOVA
1) The observations within each sample must be independent
2) The populations from which the samples are selected must be normal
3) The populations from which the samples are selected must have equal variances (homogeneity of variance)
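The F = t² relationship can be checked numerically. The sketch below computes a pooled-variance independent-samples t and a two-group F ratio on the same made-up scores; the helper names are hypothetical and the pooled-variance t formula is the standard one, not something spelled out in these notes.

```python
import math


def pooled_t(sample_1, sample_2):
    """Independent-samples t statistic using the pooled variance."""
    n1, n2 = len(sample_1), len(sample_2)
    m1, m2 = sum(sample_1) / n1, sum(sample_2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample_1)
    ss2 = sum((x - m2) ** 2 for x in sample_2)
    pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance / n1 + pooled_variance / n2)
    return (m1 - m2) / standard_error


def f_ratio_two_groups(sample_1, sample_2):
    """Independent-measures F ratio for exactly two treatments."""
    groups = [sample_1, sample_2]
    N = sum(len(g) for g in groups)
    G = sum(sum(g) for g in groups)
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - G ** 2 / N
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (N - len(groups))
    return ms_between / ms_within


# Hypothetical scores for two treatment conditions
group_a = [0, 1, 3, 1, 0]
group_b = [4, 3, 6, 3, 4]
t = pooled_t(group_a, group_b)
F = f_ratio_two_groups(group_a, group_b)
print(t ** 2, F)  # the two values agree up to rounding
```

The sign of t is lost when it is squared, which is why F is always positive.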

• Statistical Package for the Social Sciences (SPSS)
• 1968: needed to quickly analyze volumes of data
• Can be used with the Windows point-and-click approach or with syntax

Two views
• Variable view
• Data view
Some menus:
• Data
• Transform
• Analyze
• Graph
Questions we might ask
• Is there a difference in age between short, medium, and tall individuals?
• Is there a difference between before and after scores?
• Is the height of males in the sample different from the national average?
Repeated Measures (RM) ANOVA (ANOVA for the same people measured under each of the treatment conditions)

The F ratio in the ANOVA compares all the different sample means in a SINGLE test using a SINGLE alpha level
• Using a t statistic would require multiple t tests to evaluate all the mean differences, which inflates the Type I error rate
Single factor research study: involves one independent variable
Independent measures study: uses a separate sample for each of the treatments
Age and time are the most commonly used factors


Hypotheses for RM ANOVA
Null: mu1 = mu2 = mu3
• All treatments have exactly the same effect
• In the general population there are no mean differences
• Any difference between sample means is from chance alone
Alternative: At least one treatment mean is different from the others
• Treatment conditions are responsible for causing mean differences among samples
F-ratio for RM ANOVA
F = (treatment effect + chance/error (excluding individual differences)) / (chance/error (excluding individual differences))
Numerator: actual mean differences between treatments
Denominator: amount of difference that would be expected just by chance

F = (variance (differences) between treatments) / (variance (differences) expected by chance/error)
F = (variance/differences between treatments (without individual differences)) / (variance/differences expected by chance (with individual differences removed))
Differences in F Ratios
• The RM design eliminates variance caused by individual differences (stuff that is different between people, even with matching) from both the numerator and the denominator, since the same subjects are used in every treatment condition
• Since the SAME individuals are used in every treatment condition, we can measure the SIZE of the individual differences, because these differences are consistent
• When there is a consistent difference between subjects across all treatment conditions (such as the 10 point difference in the example), we can be reasonably confident that it is not chance or random error, but rather a SYSTEMATIC and PREDICTABLE measure of the individual differences between subjects.
Logic of RM ANOVA
I. Variance between Treatments (NUMERATOR of the F ratio)
1) Treatment effect (**what the researcher wants to see): different treatment conditions cause the individuals' scores in one condition to be higher or lower than in another condition
2) Error or chance: even if there is no treatment effect, there is still a possibility of differences due to chance. The same individuals are measured at different times, so there is still unsystematic and unpredictable chance or error
II.
Variance due to chance or error (DENOMINATOR of the F ratio)
• Measure the variance due to random sources of error without including individual differences
• Calculate the within treatment variance (as we did with independent measures) but then subtract out the individual differences
• Results in a measure of unsystematic error

Notation and Formulas
1st stage: (identical to independent measures) NUMERATOR
• Total variability
• Variance between treatments
• Variance within treatments
• Use the same notation and formulas as before
2nd stage: (DENOMINATOR)
• Goal is to remove the individual differences from the denominator of the F ratio
• Begin with the variance within treatments and then measure and subtract out the individual differences
• The remaining variance is often called residual or error variance

Measures how much variance is reasonable to expect by chance after the individual differences have been removed.

k = number of treatment conditions
n = number of scores in each treatment condition
N = total number of scores in the study
G = sum of ALL scores = sum of the T's
T = sum of the scores in each treatment
SS = sum of squares for each treatment
Sum X² = sum of the squared scores for the entire study
P = total of the scores for each individual in the study ("person total")
P values reflect individual differences, and we use P values to calculate SSbetween subjects
F = MSbetween treatments / MSerror
s² = MS = SS / df

Source table for RM ANOVA
Source                   SS     df     MS     F
Between Treatments
Within Treatments
   Between subjects
   Error
Total
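The two-stage analysis that fills in this source table can be sketched as follows. Rows of the input are individuals and columns are treatment conditions; the function name and the scores are hypothetical, and the formulas are the computational ones implied by the notation above (T totals for treatments, P totals for persons).

```python
def repeated_measures_anova(scores):
    """scores[i][j] = score of person i in treatment condition j."""
    n = len(scores)            # number of individuals (scores per treatment)
    k = len(scores[0])         # number of treatment conditions
    N = n * k
    G = sum(sum(row) for row in scores)
    sum_x_squared = sum(x * x for row in scores for x in row)
    T = [sum(row[j] for row in scores) for j in range(k)]  # treatment totals
    P = [sum(row) for row in scores]                       # person totals

    # Stage 1: same as independent measures
    ss_total = sum_x_squared - G ** 2 / N
    ss_between_treatments = sum(t ** 2 / n for t in T) - G ** 2 / N
    ss_within = ss_total - ss_between_treatments

    # Stage 2: remove individual differences from the within-treatment SS
    ss_between_subjects = sum(p ** 2 / k for p in P) - G ** 2 / N
    ss_error = ss_within - ss_between_subjects

    df_between = k - 1
    df_error = (N - k) - (n - 1)
    ms_between = ss_between_treatments / df_between
    ms_error = ss_error / df_error
    return {
        "SS_between": ss_between_treatments,
        "SS_subjects": ss_between_subjects,
        "SS_error": ss_error,
        "df": (df_between, df_error),
        "F": ms_between / ms_error,
    }


# Hypothetical data: 4 people, each measured in 3 treatment conditions
data = [[3, 4, 5], [2, 4, 6], [1, 3, 5], [2, 5, 8]]
rm = repeated_measures_anova(data)
print(rm["df"])  # (2, 6)
print(rm["F"])   # about 24
```

Note that the F ratio uses df error, not df within, when looking up the critical value.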

• When k is greater than 2 and the null hypothesis has been rejected, we must use post hoc tests to determine where the differences lie.
• We can use the Scheffe test and substitute MSerror in place of MSwithin

Advantages and Disadvantages of RM ANOVA
• Desirable if the supply of subjects is limited, because we are using fewer subjects
• Eliminates the role of variability due to individual differences
   • More sensitive for detecting actual treatment effects where there are large individual differences
• May produce carryover effects or progressive error (changes in behavior caused by earlier treatments or by repeated testing, like subjects getting tired because they have to take IQ tests over and over again), and this is a reason NOT to use the same people!

Assumptions of RM ANOVA
• Independent observations
• Normal distribution (only important with small samples)
• Variances of the population distributions for each treatment should be equivalent

Introduction to Factorial Design
Factorial Design: research study with more than one factor
• We will limit ourselves to 2 factors
• Independent measures and all n's equal
• 2 factor ANOVA
• Combines THREE separate hypothesis tests into one analysis
• Each of these tests is based on its own F ratio computed from the data
• Two independent variables called Factor A and Factor B
Example: testing heat and humidity (independent variables) and seeing how different combinations affect some sort of performance (dependent variable). We match each humidity level with each temperature level in a chart.
• The 2 factor ANOVA will test for
   • Mean difference between the two humidity levels
   • Mean differences between the three temperature levels
   • Any other mean differences that may result from unique combinations of a specific temperature and a specific humidity level (e.g., high humidity may be especially disruptive when the temperature is also high)

MAIN EFFECTS: mean differences produced by each factor independently; the mean differences among the levels of one factor
• Do differences in Factor A (humidity) lead to differences in performance?
• Evaluate the mean difference between the ROWS (between 30% and 70% humidity). This difference is called the main effect for Factor A.
• For Factor B (temperature), look at the means of the COLUMNS; the differences are the main effect for Factor B
• We must evaluate main effects with hypothesis tests to check for significance
• Evaluation of the main effects makes up two of the three hypothesis tests in a 2 factor ANOVA
• 2 F ratios to be evaluated independently
Main Effect Hypotheses
Null: there is no mean difference between the two levels. H0: MuA1 = MuA2
Alternative: the two different levels do produce different scores. H1: MuA1 ≠ MuA2
Main Effect F Ratio (A)
F = (variance (differences) between means for Factor A) / (variance (differences) expected by chance/error)
F = (variance (differences) between row means) / (variance (differences) expected by chance/error)
Main Effect F Ratio (B)
F = (variance (differences) between column means) / (variance (differences) expected by chance/error)
Interaction Between Factors: any "extra" mean differences that are not explained by the main effects
INTERACTION: mean differences produced by the factors acting together. Occurs whenever the mean differences between individual treatment conditions, or cells, are different from what would be predicted from the overall main effects for the factors.
Interaction Hypotheses:
Null: there is no interaction between Factors A and B. All the mean differences between treatment conditions are explained by the main effects of the 2 factors
Alternative: there is an interaction between the factors. The mean differences are not what would be predicted from the overall main effects for the 2 factors
F = (variance (mean differences) not explained by main effects) / (variance (differences) expected by chance/error)

How do we know when there will be an interaction?
• When the two factors are interdependent and influence one another
• No interaction: independence
   • Does the size of the Factor A effect (top row vs bottom row) depend on the level of Factor B? Is the change of X points between humidity levels the same for all levels of temperature? If it is, there is NO INTERACTION
• Interaction: interdependency
   • The effect of changing Factor A depends on the level of Factor B
   • Find the difference between the rows in each column, and see if it is consistent
   • If the difference is consistent (the factors are independent), there is no interaction; if it changes from column to column, there is an interaction
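The "difference between rows in each column" check can be written out as a tiny sketch. It only describes the pattern in a table of cell means; whether an interaction is significant still requires the F test. The cell means below are invented (rows = two humidity levels, columns = three temperature levels), and the function names are hypothetical.

```python
def row_differences(cell_means):
    """Top-row minus bottom-row mean in each column of a 2-row table."""
    top_row, bottom_row = cell_means
    return [top - bottom for top, bottom in zip(top_row, bottom_row)]


def suggests_interaction(cell_means, tolerance=1e-9):
    """True when the row difference is not the same in every column."""
    differences = row_differences(cell_means)
    return max(differences) - min(differences) > tolerance


# Hypothetical cell means: rows = low/high humidity, columns = 3 temperatures
parallel = [[85, 80, 75],
            [75, 70, 65]]
not_parallel = [[85, 80, 75],
                [75, 70, 50]]

print(row_differences(parallel))           # [10, 10, 10]
print(suggests_interaction(parallel))      # False
print(row_differences(not_parallel))       # [10, 10, 25]
print(suggests_interaction(not_parallel))  # True
```

On a graph, the first table gives parallel lines and the second gives lines that are not parallel.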

2 factor ANOVA has three separate, independent tests. We can find… • Significant Main effect for A but not significant main effect for B and no interaction • Significant Main Effect for both A and B but no interaction • Significant main Effect for both A and B and a Significant Interaction

Doing the analysis… • We need variance values for THREE F Ratios • 3 between treatment variances • 1 within treatment variance • Mean square = MS = SS/df
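A minimal sketch of the three F ratios, assuming an independent-measures design with equal n in every cell, as stated earlier. The SS formulas (row totals for Factor A, column totals for Factor B, and the interaction as what is left of the between-cells SS) are the standard computational ones rather than anything written out in these notes, and the data are made up.

```python
def two_factor_anova(cells):
    """cells[i][j] = list of scores for level i of Factor A, level j of Factor B."""
    a = len(cells)             # levels of Factor A (rows)
    b = len(cells[0])          # levels of Factor B (columns)
    n = len(cells[0][0])       # scores per cell (assumed equal)
    N = a * b * n
    all_scores = [x for row in cells for cell in row for x in cell]
    correction = sum(all_scores) ** 2 / N          # G^2 / N

    ss_total = sum(x * x for x in all_scores) - correction
    ss_cells = sum(sum(c) ** 2 / n for row in cells for c in row) - correction
    ss_within = ss_total - ss_cells

    row_totals = [sum(sum(c) for c in row) for row in cells]
    col_totals = [sum(sum(cells[i][j]) for i in range(a)) for j in range(b)]
    ss_a = sum(t ** 2 / (b * n) for t in row_totals) - correction
    ss_b = sum(t ** 2 / (a * n) for t in col_totals) - correction
    ss_axb = ss_cells - ss_a - ss_b                # what main effects leave over

    df_a, df_b = a - 1, b - 1
    df_axb = df_a * df_b
    df_within = N - a * b
    ms_within = ss_within / df_within
    return {
        "F_A": (ss_a / df_a) / ms_within,
        "F_B": (ss_b / df_b) / ms_within,
        "F_AxB": (ss_axb / df_axb) / ms_within,
        "df": (df_a, df_b, df_axb, df_within),
    }


# Hypothetical 2 x 2 design with n = 2 scores per cell
design = [[[1, 3], [5, 7]],
          [[2, 4], [6, 8]]]
out = two_factor_anova(design)
print(out["F_A"], out["F_B"], out["F_AxB"])  # about 1, 16, and 0
```

All three F ratios share the same denominator, MSwithin, which is why only one within-treatment variance is needed.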

Correlation: statistical technique that is used to measure and describe a relationship between two variables.
• Variables are observed as they exist in the natural environment, not controlled or manipulated
• CORRELATION DOES NOT IMPLY CAUSATION!
• Need X and Y scores, two from each individual; each pair of scores = 1 point

Characteristics of a Relationship:
1) The Direction of the Relationship
2) The Form of the Relationship
3) The Degree of the Relationship
Direction of the Relationship
POSITIVE: X moves in the same direction as Y. As X increases, so does Y; as X decreases, so does Y
• Beer sold and temperature
NEGATIVE: the 2 variables move in opposite directions. As X increases, Y decreases. INVERSE relationship
• Coffee sold and temperature
Form of the Relationship
• Linear (straight line) is most common
Degree of the Relationship
• Correlation measures how well the data fit a straight line
• +/- 1 means a perfect fit
• 0 is no fit at all
• Intermediate values represent the degree to which the data points approximate the perfect fit.
Where and Why are Correlations Used?

Prediction: if two variables are known to be related in some systematic way, it is possible to use one of them to make predictions about the other (SAT scores and college GPA)
Validity: correlation is commonly used to demonstrate the validity of a test
• If you are measuring what you say you are measuring, then scores on your measure should be related to other measures of the same construct
Reliability: a reliable measurement procedure will produce the same (or nearly the same) scores when the same individuals are measured under the same conditions.
• One way to evaluate reliability is to use correlations to determine the relationship between two sets of measurements

The Pearson Correlation
• Most commonly used calculation of correlation
• Measures the degree and direction of the linear relationship between 2 variables
• Denoted by the letter r
• Conceptually computed by
   r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)
Calculating the Pearson Correlation
• Sum of products of deviations (SP)
• Measures the amount of covariability between two variables (the degree to which X and Y vary together)
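The conceptual ratio above turns into a short calculation once SP is defined. The sketch below uses the standard formula r = SP / sqrt(SSx * SSy), where SP = sum of (X - Mx)(Y - My); that formula is assumed here since the notes only name SP, and the scores are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from SP and the two sums of squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # SP: how much X and Y vary together
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # SSx and SSy: how much X and Y vary separately
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical pairs of scores, one (X, Y) pair per individual
x_scores = [1, 2, 3, 4]
y_scores = [3, 3, 5, 9]
r = pearson_r(x_scores, y_scores)
print(round(r, 3))                           # 0.913
print(pearson_r(x_scores, [8, 6, 4, 2]))     # perfect negative relationship
```

The sign of r gives the direction of the relationship and its size gives the degree.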

Interpreting Pearson Correlation: 4 Considerations
• Correlation describes a relationship, nothing more. CORRELATION DOESN'T IMPLY CAUSATION!
• Never generalize a correlation beyond the range of scores provided by the data, because the value of the correlation is affected by the range of scores
• OUTLIERS are extreme values that can have a huge effect on the correlation
• A correlation is NOT EQUAL to a proportion. r squared measures the accuracy of prediction, so r = .5 gives .5 squared = .25, or 25% accuracy

Other correlation measures
• Pearson correlation is most commonly used with data from an interval or a ratio scale of measurement
• Other correlation measures have been developed for nonlinear relationships and for other types of data

Spearman Correlation
• Used in two situations
   • To measure a relationship between variables measured on an ordinal scale of measurement
   • Can be used with ratio and interval scales even when the relationship is not linear
Point Biserial Correlation
• Used to measure the relationship between two variables in situations where one variable is measured on an interval or ratio scale but the second variable has only two different values (called dichotomous)
Phi-Coefficient
• Used when both variables (X and Y) measured for each individual are dichotomous
Regression: the statistical technique for finding the best fitting straight line for a set of data is called regression, and the resulting straight line is called the regression line.
The line serves two purposes
• Shows the center of the relationship

• Can be used for prediction (X related to Y)
Regression is the procedure that identifies and defines the straight line that provides the best fit for any specific set of data (the Line of Best Fit), defined by an equation

Linear Equations:
• Y = bX + a
• b is the slope
• a is the Y intercept
Least Squares Solution: measure the distance between the line and each data point. The value on the line is called predicted Y, written Y hat.
• Distance = Y - Yhat
• The best fitting line has the smallest total squared error
• This is the least squared error solution
Regression Line for Y: Yhat = bX + a
Cautions about interpreting predicted values:
• The predicted value is not perfect. There will be some error between the predicted Y values (on the line) and the actual data.
• Although the amount of error will vary from point to point, on average the size of the errors is tied to the magnitude of the correlation
   • With a correlation near +/- 1, the data points will generally be close to the line (small error)
   • But as the correlation gets nearer to zero, the magnitude of the error will increase
• The regression line should not be used to make predictions for X values that fall outside the range of values covered by the original data, because you have no information about the X-Y relationship outside that range
Standard Error of Estimate: a measurement of the standard distance between a regression line and the actual data points. It is computed from the sum of squared errors, SSerror
It is possible to find the best fitting regression equation for any set of data by simply using the formulas already presented
• Accuracy of this prediction depends on how well the points on the line correspond to the actual data points
• So while a regression equation by itself allows you to make predictions, it does not provide any information about the accuracy of the predictions
• To measure the precision of the regression, a standard error of estimate must be computed

Variance = SS / df, where df = n - 2
SSerror = sum of (Y - Yhat)²
Standard error of estimate = square root of (SSerror / df)
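As a worked sketch of the formulas above: fit the line, compute each predicted Y, and take the square root of SSerror / (n - 2). The slope and intercept formulas used here (b = SP / SSx and a = My - b * Mx) are the standard least-squares ones and are an assumption, since this part of the notes only refers to "the formulas already presented". The data are invented.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope b and intercept a for Yhat = bX + a."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x
    a = mean_y - b * mean_x
    return b, a


def standard_error_of_estimate(xs, ys):
    """sqrt(SSerror / df) with df = n - 2."""
    b, a = fit_line(xs, ys)
    ss_error = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_error / (len(xs) - 2))


# Hypothetical data
x_values = [1, 2, 3, 4]
y_values = [3, 3, 5, 9]
slope, intercept = fit_line(x_values, y_values)
print(slope, intercept)                                   # 2.0 0.0
print(round(standard_error_of_estimate(x_values, y_values), 3))  # 1.414
```

Here SSerror is 4 and df is 2, so the standard error of estimate is the square root of 2.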

Standard Error vs Correlation
• Standard error is closely (inversely) related to the magnitude of the correlation between X and Y
• If the correlation is near +/- 1, the data points will be clustered close to the line and the standard error of estimate will be small
• As the correlation gets near zero, the line will provide less accurate predictions, and the standard error will grow larger

Standard Error and Correlation
• Earlier we learned that squaring the correlation provides a measure of the accuracy of the prediction
• r squared is called the coefficient of determination because it determines what proportion of the variability in Y is predicted by the relationship with X
• Thus we use 1 - r squared to measure the error portion: SSerror = (1 - r²) SSy
Correlation and Regression
• Because it is possible to have the same regression line for sets of data that have different correlations, it is important to examine r squared and the standard error of estimate
• The regression equation simply describes the best fitting line and is used for making predictions
• However, r squared and the standard error of estimate indicate how accurate the predictions will be
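To see how two data sets can share a regression line yet differ in accuracy, here is a small sketch with made-up numbers. Both sets of Y values produce the line Yhat = 2X + 0, but the second is more scattered, so its r squared is lower and its standard error is larger. It also checks the identity SSerror = (1 - r²) SSy. The function name `summarize` and the least-squares formulas for b and a are assumptions for the illustration.

```python
import math


def summarize(xs, ys):
    """Slope, intercept, r, SSy, SSerror, and standard error of estimate."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    b = sp / ss_x
    a = mean_y - b * mean_x
    ss_error = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return {
        "b": b,
        "a": a,
        "r": sp / math.sqrt(ss_x * ss_y),
        "ss_y": ss_y,
        "ss_error": ss_error,
        "std_error": math.sqrt(ss_error / (n - 2)),
    }


# Hypothetical data: same X values, two different amounts of scatter in Y
xs = [1, 2, 3, 4]
tight = summarize(xs, [2.5, 3.5, 5.5, 8.5])
loose = summarize(xs, [3, 3, 5, 9])

print(tight["b"], tight["a"], loose["b"], loose["a"])  # same line for both
print(tight["r"] ** 2 > loose["r"] ** 2)               # True
print(tight["std_error"] < loose["std_error"])         # True
```

The regression equation alone cannot distinguish these two data sets; r squared and the standard error of estimate can.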

Interpretation of the r Value
• Coefficient of correlation (r): the r value indicates the strength of the linear relationship between the two variables
• Coefficient of Determination (r squared): squaring the r value provides an indication of the proportion of the variance in one variable that is accounted for by the variance in the other variable
• Coefficient of Non-Determination (1 - r squared): subtracting the squared r value from 1 provides an indication of the proportion of the variance in one variable that is not accounted for by the variation in the second variable.

Multiple regression has more than one predictor
Chi Square Test: used when we have questions about relative frequencies for a distribution
• Uses sample data to test hypotheses about the shape or proportions of a population distribution
• Determines how well the obtained sample proportions fit the population proportions specified by the null hypothesis.
• Null: no preference, or no difference
• Alternative: there is a preference, or there is a difference
Expected frequency: the frequency value that is predicted from the null hypothesis and the sample size (calculated).
Observed frequencies are what you see, and they are always whole numbers
Chi Square Distribution: positively skewed, because the formula involves adding squared values
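A short sketch of the goodness-of-fit calculation described above. Expected frequencies come from the null proportions times the sample size, and the statistic is the sum of (fo - fe)² / fe. The df rule used here (number of categories minus 1) is the standard one for goodness of fit and is an assumption, since the notes do not state it; the frequencies are invented.

```python
def expected_frequencies(null_proportions, sample_size):
    """fe for each category: proportion under the null times n."""
    return [p * sample_size for p in null_proportions]


def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def goodness_of_fit(observed, null_proportions):
    """Chi square statistic and df (categories - 1) for a goodness-of-fit test."""
    expected = expected_frequencies(null_proportions, sum(observed))
    return chi_square_statistic(observed, expected), len(observed) - 1


# Hypothetical observed counts and null-hypothesis proportions
observed_counts = [8, 24, 8]
null_props = [0.25, 0.5, 0.25]
chi2, df = goodness_of_fit(observed_counts, null_props)
print(expected_frequencies(null_props, 40))  # [10.0, 20.0, 10.0]
print(chi2, df)                              # about 1.6 with df = 2
```

Every term in the sum is a squared value divided by a positive number, so the statistic can never be negative.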

Parametric Tests (ANOVA, t test)
• Test hypotheses about specific population parameters
• Require a numerical score for each individual in the sample (from an interval or ratio scale)
• More sensitive tests
Nonparametric tests
• Hypotheses are not stated in terms of a specific parameter
• Use data from nominal or ordinal scales: we can't calculate means and variances, so it's all about frequencies
• Less sensitive tests
Assumptions and Restrictions for Chi Square Test
• To use the chi square test for goodness of fit or the test of independence, these conditions must be satisfied
   • Independence of observations: each frequency count comes from a different person
   • Size of expected frequencies: the test should not be performed when any expected frequency is less than 5
Power
Statistical Power: the POWER of a statistical test is the probability that the test will correctly reject a false null hypothesis, so it is the probability of reaching the correct decision when the treatment really has an effect

The purpose of a hypothesis test is to determine whether or not a particular treatment has an effect, and there is always a risk of reaching the wrong conclusion, like a Type I error. We minimize this risk by selecting an alpha level that determines the maximum probability of committing a Type I error

Power
• MORE POWERFUL means that the test will more readily detect a treatment effect when one exists
• Higher power leads to correct detection
• Power = 1 - Beta
What Determines the Power of a Study?
• Main factors:
   • How big an effect the research hypothesis predicts
   • How many participants are in the study (sample size)
   • Significance level chosen
Usually we attempt to design a study with 80% power
FACTORS THAT INFLUENCE POWER

Size of treatment effect
• Power depends on the size of the treatment effect
• When the treatment has a large effect, it will be easy to detect and power will be high
• When the treatment effect is small, it will be difficult to detect, and power will be low

Significance Level
• Less extreme significance (alpha) levels (.1 or .2) will result in more power, because the shaded area (critical region) is bigger and it's easier to reject the null
• More extreme significance levels = less power
Sample Size
• If there is a treatment effect in the population, you are more likely to find it with a large sample than with a small sample
Increasing Power:
• Increase the treatment effect
• Reduce "noise" or error
• Hard for the researcher to control this
• Increase (weaken) the significance level
   • E.g., change alpha to .2 instead of .05
   • Although this lowers the risk of a Type II error, the risk of a Type I error increases
• Increase the sample size
   • The most common method that researchers use to increase the power of a study to detect a true treatment effect
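The three factors above can be illustrated with a simplified power calculation. This sketch assumes a one-tailed, one-sample z test with a known population SD and an effect size d = (treated mean - null mean) / SD; none of that setup comes from the notes, and the function name is hypothetical. It uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist


def z_test_power(effect_size_d, n, alpha=0.05):
    """Approximate power of a one-tailed, one-sample z test."""
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha)   # boundary of critical region
    shift = effect_size_d * n ** 0.5                  # how far the treated
                                                      # distribution moves, in SE units
    return 1 - standard_normal.cdf(z_critical - shift)


baseline = z_test_power(effect_size_d=0.5, n=16, alpha=0.05)
bigger_effect = z_test_power(effect_size_d=0.8, n=16, alpha=0.05)
bigger_sample = z_test_power(effect_size_d=0.5, n=36, alpha=0.05)
weaker_alpha = z_test_power(effect_size_d=0.5, n=16, alpha=0.20)

print(bigger_effect > baseline)  # True: larger effect, more power
print(bigger_sample > baseline)  # True: larger sample, more power
print(weaker_alpha > baseline)   # True: larger alpha, more power
beta = 1 - baseline              # probability of a Type II error
```

When the effect size is zero, the function returns alpha, which is the Type I error rate rather than power in the usual sense.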

Sample Size Estimation
• The goal is to make the sample size "large enough" but not wastefully large
• In order to estimate sample size you need alpha, desired power, and estimated effect size
Final Exam Studying
RM ANOVA: given the degrees of freedom (4,15), figure out the total subjects in the study
2 Factor ANOVA: 3 hypothesis tests; know what F ratios are calculated for each test
Graph different scenarios for different F ratios (main effect for A but not B, but an interaction, etc.)
Correlation: what is r? Know the three characteristics that go into a correlation (r)
• Direction
• Strength (degree)
• Form (linear, since it is Pearson)
r squared is accuracy: how much of the variance in Y is explained by the variance in X
Regression: the goal is to find the line of best fit
Difference between little r, r squared, and 1 - r squared
Y hat (predicted Y values) and Y (actual): how different are they?
When would I use multiple regression?

Chi square: difference between goodness of fit and independence tests, know df for each one, shape of the chi square distribution (a positively skewed hump, with most values near zero). Why? Because the formula uses (fo - fe) SQUARED, so there will never be negative numbers
Different factors that impact power, and how power relates to Type II error (beta)
Computational (know how to solve everything)
• Bonus: source table
• 2 measures ANOVA
• Pearson
• Regression, standard error, test for significance of the slope
• Goodness of fit, independence chi square tests
• Also correct critical values
• df ERROR, not df WITHIN
