This is a first-semester study guide for a typical AP Statistics class at Piedmont Hills High School.

Review

Chapter 1 Exploring Data

Categorical Variable: places each individual in a category

Quantitative Variable: takes numerical values that measure some characteristic of each individual

Distribution: describes what values the variable takes and how often it takes them

Simpson's Paradox: an association between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined
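
A small numeric sketch can make Simpson's Paradox concrete. The counts below are hypothetical (made up for illustration, not from any real study): treatment A has the higher success rate within each group, yet treatment B has the higher rate once the groups are combined, because A was mostly given to severe cases.

```python
# Hypothetical (successes, total) counts for two treatments in two groups.
results = {
    "mild cases":   {"A": (90, 100),   "B": (850, 1000)},
    "severe cases": {"A": (300, 1000), "B": (25, 100)},
}

def rate(successes, total):
    """Success rate as a proportion."""
    return successes / total

def combined_rate(treatment):
    """Success rate after pooling every group for one treatment."""
    successes = sum(group[treatment][0] for group in results.values())
    total = sum(group[treatment][1] for group in results.values())
    return successes / total

# Within each group, A beats B ...
for name, group in results.items():
    print(name, rate(*group["A"]), rate(*group["B"]))

# ... but after combining the groups, B beats A.
print("combined", combined_rate("A"), combined_rate("B"))
```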

Skewed to the Right: the right side (tail) of the graph is longer than the left

Skewed to the Left: the left side (tail) of the graph is longer than the right

Histogram: shows the distribution of one quantitative variable

When comparing graphs, use SHAPE, OUTLIERS, CENTER, and SPREAD

Five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), maximum

Interquartile Range: $\mathrm{IQR} = Q_3 - Q_1$

An observation is an outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile
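
The five-number summary and the 1.5 × IQR rule can be sketched in Python as follows. This assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data (software packages may use slightly different quartile rules). The function names and the data set are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # lower half (skips the median when n is odd)
    upper = xs[half + (n % 2):]  # upper half (skips the median when n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outliers(data):
    """Return observations more than 1.5 x IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data set, made up for illustration
scores = [5, 7, 10, 14, 17, 19, 21, 22, 25, 60]
print(five_number_summary(scores))  # (5, 10, 18.0, 22, 60)
print(outliers(scores))             # [60]
```

Here IQR = 22 - 10 = 12, so the fences are 10 - 18 = -8 and 22 + 18 = 40, and only 60 is flagged.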

Standard deviation: $s_x = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$
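
As a rough check on the sample standard deviation formula, the sketch below computes it step by step and compares it with `statistics.stdev` from the Python standard library. The function name `sample_sd` and the data are hypothetical.

```python
from math import sqrt
from statistics import stdev

def sample_sd(data):
    """Sample standard deviation: sqrt of squared deviations summed over (n - 1)."""
    n = len(data)
    x_bar = sum(data) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in data)
    return sqrt(squared_deviations / (n - 1))

# Hypothetical data set, made up for illustration
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(values))
print(stdev(values))  # the standard library also divides by n - 1
```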

Percentile: the pth percentile of a distribution is the value with p percent of the observations less than it

Z-score: standardized value; use in normal distribution calculations

$z = \frac{x - \text{mean}}{\text{standard deviation}}$
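
A z-score tells how many standard deviations an observation sits above or below the mean, which lets you compare values from different distributions. A minimal sketch, with hypothetical test scores and a hypothetical helper name:

```python
def z_score(x, mean, sd):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd

# Hypothetical example: which score is better relative to its own class?
z_stats = z_score(86, mean=80, sd=6)  # 86 on a test with mean 80, sd 6
z_chem = z_score(82, mean=70, sd=8)   # 82 on a test with mean 70, sd 8
print(z_stats)  # 1.0
print(z_chem)   # 1.5
```

The chemistry score is lower in raw points but higher relative to its distribution (1.5 standard deviations above the mean versus 1.0).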

Adding a constant to every observation increases measures of center and location (mean, median, quartiles, percentiles) by that constant; the shape of the distribution and the measures of spread (range, IQR, standard deviation) do NOT change

Multiplying (or dividing) every observation by a positive constant multiplies (or divides) measures of center and location (mean, median, quartiles, percentiles) by that constant; it also multiplies (or divides) measures of spread (range, IQR, standard deviation) by that constant; it does not change the shape of the distribution
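
The two transformation rules can be checked numerically. The sketch below uses a hypothetical data set, adds 10 to every value, and separately doubles every value, then compares center and spread.

```python
from statistics import mean, median, stdev

# Hypothetical data set, made up for illustration
data = [3, 5, 8, 10, 14]
shifted = [x + 10 for x in data]  # add a constant to every observation
scaled = [x * 2 for x in data]    # multiply every observation by a constant

for label, values in (("original", data), ("shifted", shifted), ("scaled", scaled)):
    value_range = max(values) - min(values)
    print(label, mean(values), median(values), value_range, round(stdev(values), 3))
```

Adding 10 moves the mean and median up by 10 but leaves the range and standard deviation alone; doubling multiplies all four by 2.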

Density curve: is always on or above the horizontal axis and has an area of exactly 1 underneath it

Median of a density curve: the equal-areas point

Mean of a density curve: the balance point of the curve

Normal distribution: specified by $N(\mu, \sigma)$, where $\mu$ is the mean and $\sigma$ is the standard deviation

Approximately 68% of observations fall within 1 standard deviation of the mean

Approximately 95% of observations fall within 2 standard deviations of the mean

Approximately 99.7% of observations fall within 3 standard deviations of the mean
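
The 68-95-99.7 rule can be checked against the standard normal distribution using `statistics.NormalDist` from the Python standard library. The helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The exact proportions come out slightly different from 68%, 95%, and 99.7%, which is why the rule says "approximately".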

For normal distributions, the standardized values (z-scores) also have a normal distribution: the standard normal distribution, $N(0, 1)$

Response Variable: measures the outcome of a study

Explanatory Variable: may help explain or influence changes in a response variable

Scatterplot: shows the relationship between two quantitative variables measured on the same individuals

How to make a scatterplot:

1. Decide which variable should go on each axis (explanatory on the x-axis, response on the y-axis)

2. Label and scale the axes

3. Plot the individual data values

Positive association: when above-average values of one variable tend to accompany above-average values of the other variable, and below-average values also tend to occur together

Negative association: when above-average values of one variable tend to accompany below-average values of the other variable, and vice versa

Correlation r: measures the direction and strength of the linear relationship between two quantitative variables; always a number between -1 and 1

$r = \frac{1}{n-1}\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
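
The correlation formula is an average (dividing by n - 1) of the products of the standardized x and y values. A minimal sketch, with a hypothetical function name and hypothetical data:

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Hypothetical data: hours studied and test score for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
print(round(correlation(hours, scores), 3))
```

Points that lie exactly on an upward-sloping line give r = 1, and points exactly on a downward-sloping line give r = -1.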

DFSO

Direction: say whether the association is positive or negative

Form: describe the shape (linear, curved, clusters)

Strength: how closely the points follow the form; for linear relationships, use the correlation r

o Correlation measures ONLY straight-line relationships

Outliers: note any points that fall outside the overall pattern

Regression line: describes how a response variable y changes as an explanatory variable x changes

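The regression line has the formula $\hat{y} = a + bx$

o $\hat{y}$ is the predicted value of the response variable y for a given value of x

o b is the slope: the predicted change in y when x increases by one unit

o a is the y-intercept: the predicted value of y when x = 0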

Extrapolation: use of a regression line for prediction far outside the interval of values of the explanatory variable used to obtain the line

THESE PREDICTIONS ARE OFTEN NOT ACCURATE; DON'T USE THEM

Residual: the difference between the observed value and the predicted value

residual = observed y - predicted y

$\text{residual} = y - \hat{y}$

Least-squares Regression Line: the line that makes the sum of the squared residuals as small as possible

Most common way to fit a line to a scatterplot

No other line gives a smaller sum of squared residuals for the same data

The least-squares regression line has the same form as any regression line, $\hat{y} = a + bx$, but with

$b = \text{slope} = r \frac{s_y}{s_x}$

$a = \text{intercept} = \bar{y} - b\bar{x}$
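
The slope and intercept formulas can be sketched directly from the means, standard deviations, and correlation. The function names and the data below are hypothetical; the residuals follow the definition residual = observed y - predicted y.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x using b = r*s_y/s_x and a = y-bar - b*x-bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

def residuals(xs, ys, a, b):
    """Observed y minus predicted y for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data set, made up for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(round(a, 3), round(b, 3))  # 2.2 0.6
print([round(e, 3) for e in residuals(xs, ys, a, b)])
```

For a least-squares line the residuals always sum to (essentially) zero, which is a quick sanity check on the arithmetic.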

Residual plot: a scatterplot of the residuals vs. the explanatory variable

Helps us assess how well a regression line fits the data


Coefficient of determination: $r^2$

Gives the fraction of the variation in the values of y that is accounted for by the least-squares regression line

IN ENGLISH: how much of the variation in y is accounted for by the least-squares regression line; the closer $r^2$ is to 1, the better the least-squares regression line fits the data

$r^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum \text{residual}^2}{\sum (y_i - \bar{y})^2}$
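
The coefficient of determination can be computed from SSE (sum of squared residuals) and SST (sum of squared deviations of y from its mean). The sketch below uses a hypothetical data set whose least-squares line works out to y-hat = 2.2 + 0.6x; the function name is also hypothetical.

```python
def r_squared(ys, predicted):
    """Coefficient of determination: 1 - SSE/SST."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # squared residuals
    sst = sum((y - y_bar) ** 2 for y in ys)                 # total variation in y
    return 1 - sse / sst

# Hypothetical data; 2.2 + 0.6x is the least-squares line for these points
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in xs]
print(round(r_squared(ys, predicted), 3))  # 0.6
```

Here SSE = 2.4 and SST = 6, so about 60% of the variation in y is accounted for by the line.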

Outlier: an observation that lies outside the overall pattern; outliers in the y direction (but not the x direction) have large residuals, while other outliers may not

An observation is INFLUENTIAL when removing or adding it would markedly change the result of the calculation

INCARNATION OF SATAN

Population: the entire group of individuals about which we want information

Sample: the part of the population from which we actually collect information

Bad Sampling:

Convenience Sample: choosing the individuals who are easiest to reach

Voluntary Response Sample: consists of people who choose themselves by responding to a general invitation

Good Sampling:

Simple Random Sample (SRS): a sample of size n chosen from the population so that every group of n individuals has an equal chance of being the sample selected

o Use a table of random digits

Stratified random sample and strata: to choose a stratified random sample, classify the population into groups of similar individuals (strata), then choose a separate SRS in each stratum and combine the SRSs to form the full sample

Cluster sample: to choose a cluster sample, divide the population into smaller groups (clusters; each should mirror the characteristics of the population), then choose an SRS of the clusters and include every individual in the chosen clusters in the sample
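
The three sampling methods can be contrasted in a short sketch. It uses Python's `random` module in place of a table of random digits, and follows the usual textbook convention that a cluster sample randomly selects whole clusters and keeps everyone in them. The school, grade levels, homerooms, and function names are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every group of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS inside every stratum, then combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Take an SRS of whole clusters and keep everyone in the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

# Hypothetical school: 4 grade levels (strata) and 6 homerooms (clusters)
rng = random.Random(2024)  # fixed seed so the sketch is repeatable
grades = {g: [f"grade{g}_student{i}" for i in range(1, 26)] for g in (9, 10, 11, 12)}
homerooms = {f"room{r}": [f"room{r}_student{i}" for i in range(1, 11)] for r in range(1, 7)}
everyone = [s for members in grades.values() for s in members]

srs = simple_random_sample(everyone, 8, rng)
stratified = stratified_sample(grades, 2, rng)  # 2 students from every grade
clustered = cluster_sample(homerooms, 2, rng)   # all students in 2 homerooms
print(srs)
print(stratified)
print(clustered)
```

The stratified sample is guaranteed to include every grade, while the cluster sample only ever contains students from the chosen homerooms.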

Undercoverage: occurs when some groups in the population are left out of the process of choosing a sample

Nonresponse: occurs when an individual chosen for a sample can't be contacted or doesn't want to respond

Observational study: observes individuals and measures variables of interest but doesn't attempt to influence the responses

Experiment: deliberately imposes a treatment on individuals to measure their responses

Lurking variable: a variable that is not among the explanatory or response variables but may influence the response variable

Confounding: occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other

Treatment: a specific condition applied to individuals in an experiment

Experimental unit: the smallest collection of individuals to which treatments are applied (when the units are humans, they are often called subjects)

Experimenting Well:

Random assignment: experimental units are assigned to treatments at random (using a chance process)

Completely randomized design: the treatments are assigned to all the experimental units completely by chance
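
A completely randomized design can be sketched by shuffling the experimental units and dealing them out to the treatments. The function name, subjects, and treatment labels are hypothetical, and `random.Random` stands in for a table of random digits.

```python
import random

def completely_randomized_design(units, treatments, seed=None):
    """Shuffle all units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experiment: 20 subjects, two treatments
subjects = [f"subject_{i}" for i in range(1, 21)]
groups = completely_randomized_design(subjects, ["new drug", "placebo"], seed=7)
for treatment, members in groups.items():
    print(treatment, members)
```

Because chance alone decides who lands in each group, the groups should be roughly equivalent before the treatments are applied.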

Principles of Experimental Design

1. Control for lurking variables that might affect the response

2. Random assignment: creates roughly equivalent groups of experimental units

3. Replication: use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between groups

Placebo effect: the response to a dummy treatment

Double-blind: an experiment where neither the subjects nor those who interact with them and measure the response variable know which treatment a subject received

Statistically significant: an observed effect so large that it would rarely occur by chance

Block: a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments

Randomized block design: the random assignment of experimental units to treatments is carried out separately within each block

Matched pairs design: a common form of blocking for comparing just two treatments
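
In a randomized block design the shuffle-and-deal step happens separately inside each block, so every block ends up split across the treatments. A minimal sketch, with hypothetical blocks, treatment labels, and function name:

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # the randomization never crosses block lines
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical experiment blocked by sex, with two treatments A and B
blocks = {
    "men": [f"man_{i}" for i in range(1, 9)],
    "women": [f"woman_{i}" for i in range(1, 13)],
}
assignment = randomized_block_design(blocks, ["A", "B"], seed=3)
for unit, (block_name, treatment) in assignment.items():
    print(unit, block_name, treatment)
```

With 8 men and 12 women, each treatment gets exactly 4 men and 6 women, so the blocking variable cannot be confounded with the treatment.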

ACTUALLY PASSING THE FINAL (like 0.7562 im p

confident)

