
AP Statistics First Semester Final Review
Chapter 1 Exploring Data
Categorical Variable: places each individual in a category
Quantitative Variable: numerical values that measure some characteristic of each
individual
Distribution: describes what values the variable takes and how often it takes them
Simpson's Paradox: an association between two variables that holds for each
individual value of a third variable can be changed or even reversed when the data
for all values of the third variable are combined
Skewed to the Right: right side (tail) of the graph is longer than the left
Skewed to the Left: left side (tail) of the graph is longer than the right
Histogram: shows distribution of one quantitative variable
When comparing graphs, use SHAPE, OUTLIERS, CENTER, and SPREAD
Five number summary: Minimum, first quartile, median, third quartile, maximum
Interquartile Range: IQR = Q3 - Q1
Observation is an outlier if it falls more than 1.5 x IQR above the third quartile
or more than 1.5 x IQR below the first quartile
Standard deviation

$s_x = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$

Chapter 2 Modeling Distributions of Data


Percentile: the pth percentile of a distribution is the value with p percent of the
observations less than it
Z-score: standardized value; use in normal distribution calculations

$z = \frac{x - \text{mean}}{\text{standard deviation}}$

Effect of Adding, Subtracting, Multiplying, or Dividing by a constant

Adding (or subtracting) a constant to every observation increases (or decreases)
the measures of center and location (mean, median, quartiles, percentiles) by
that constant; shape of distribution does NOT change, and neither does spread
(range, IQR, and standard deviation do not change)
Multiplying (or dividing) every observation by a positive constant multiplies
(or divides) the measures of center and location (mean, median, quartiles,
percentiles) by that constant; also multiplies (or divides) measures of spread
(range, IQR, standard deviation); does not change shape of distribution
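
A quick way to convince yourself of the two rules above is to transform a small data set and compare. The heights below are invented and `describe` is just a helper name for this sketch.

```python
import statistics

def describe(data):
    """Collect one measure of center and two measures of spread."""
    xs = sorted(data)
    return {
        "mean": statistics.mean(xs),
        "range": xs[-1] - xs[0],
        "sd": statistics.stdev(xs),
    }

heights_in = [60, 62, 65, 70, 73]              # made-up heights in inches
shifted = [x + 2 for x in heights_in]          # everyone puts on 2-inch heels
scaled = [x * 2.54 for x in heights_in]        # convert inches to centimeters

original = describe(heights_in)
after_add = describe(shifted)      # mean goes up by 2, range and sd unchanged
after_mult = describe(scaled)      # mean, range, and sd all multiplied by 2.54
```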

Density curve always has an area of 1 under it


Density curve is always on or above the horizontal axis
Median of a density curve is equal-areas point
Mean of a density curve is the balance point of the curve
Normal distribution: specified by its mean and standard deviation, written
$N(\mu, \sigma)$

Normal distribution has the 68-95-99.7 rule


Approximately 68% of observations fall within one standard deviation of the
mean
Approximately 95% of observations fall within 2 standard deviations of the
mean
Approximately 99.7% of observations fall within 3 standard deviations of the
mean
For normal distributions, the standardized values (z-scores) also have a normal
distribution: the standard normal distribution, $N(0, 1)$
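
The 68-95-99.7 rule can be checked against the standard normal curve with Python's built-in `statistics.NormalDist`. The helper names and the example values (mean 100, SD 15) are assumptions for illustration only.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Standardize x: how many standard deviations it sits from the mean."""
    return (x - mean) / sd

def proportion_within(k):
    """Area under the standard normal curve between -k and +k."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

z = z_score(130, mean=100, sd=15)    # 2.0, two SDs above the mean
within_1 = proportion_within(1)      # roughly 0.68
within_2 = proportion_within(2)      # roughly 0.95
within_3 = proportion_within(3)      # roughly 0.997
```

The rule is an approximation; the exact areas are slightly different from 68%, 95%, and 99.7%.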

Chapter 3 Describing Relationships


Response Variable: measures outcome of a study
Explanatory Variable: may help explain or influence changes in a response variable
Scatterplot: shows relationship between two quantitative variables measured on the
same individuals.
How to make a scatterplot
Decide which variable should go on each axis
Label and scale axes
Plot individual data values
Positive association: when above-average values of one variable tend to
accompany above-average values of the other variable and when below-average
values also occur together
Negative association: when above-average values of one variable tend to
accompany below-average values of the other variable
Correlation r: measures the direction and strength of the linear relationship
between two quantitative variables; always a number between -1 and 1

How to calculate correlation r

$r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$
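
The correlation formula is just "average product of z-scores" (dividing by n - 1). Here is a minimal sketch of it; the study-hours data are made up and `correlation` is my own helper, not a library call.

```python
import statistics

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of (z-score of x) * (z-score of y)."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

hours = [1, 2, 3, 4, 5]           # made-up explanatory values
grades = [52, 60, 61, 70, 75]     # made-up response values
r = correlation(hours, grades)    # strong positive association, close to 1

# Points exactly on a line with negative slope give r = -1 (up to rounding).
r_perfect_negative = correlation([1, 2, 3, 4], [8, 6, 4, 2])
```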

How to describe scatterplots

DFSO
Direction: say whether the association is positive or negative
Form: describe the shape (linear, curved, clusters)
Strength: how closely the points follow the form (for a linear form, how close
the correlation is to -1 or 1)
o Correlation measures ONLY straight-line relationships
Outliers: points that fall outside the overall pattern
Regression line: describes how a response variable y changes as an explanatory
variable x changes
Regression line has a formula

$\hat{y} = a + bx$

$\hat{y}$ is the predicted value
b is the slope, the amount by which y is predicted to change when x increases by
one unit
a is the y-intercept, the predicted value of y when x = 0
o Y-intercept doesn't always make sense in the context of the problem

Extrapolation: use of a regression line for prediction far outside the interval of
values of the explanatory variable used to obtain the line
THESE PREDICTIONS ARE OFTEN NOT ACCURATE; DON'T USE THEM
Residual: difference between the observed value and predicted value

residual = observed y - predicted y
residual = $y - \hat{y}$
Least-squares Regression Line: line that makes the sum of the squared residuals as
small as possible
Most common way to fit a line to a scatterplot
No other straight line has a smaller sum of squared residuals for the same data
The least-squares regression line has the same form as any regression line,
$\hat{y} = a + bx$, but with

$b = \text{slope} = r \frac{s_y}{s_x}$

$a = \text{intercept} = \bar{y} - b\bar{x}$
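
As a sketch, the slope and intercept formulas can be turned into a few lines of Python and used to get predictions and residuals. The data are made up and the function names are mine.

```python
import statistics

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x using b = r*s_y/s_x, a = y-bar - b*x-bar."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

def residuals(xs, ys, a, b):
    """residual = observed y - predicted y for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

hours = [1, 2, 3, 4, 5]           # made-up explanatory values
grades = [52, 60, 61, 70, 75]     # made-up response values
a, b = least_squares_line(hours, grades)   # about a = 46.8, b = 5.6
resid = residuals(hours, grades, a, b)     # these always sum to about 0
```

For these numbers the line is roughly y-hat = 46.8 + 5.6x: each extra hour predicts about 5.6 more points.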
Residual plot: scatterplot of the residuals vs the explanatory variable
Helps us assess how well a regression line fits the data

Standard deviation of the residuals (s)

$s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}}$

(the formula is on page 177)

s gives the approximate size of the typical or average prediction error

FINALS ARE AN UNENDING NIGHTMARE I WANT TO RIP MY HAIR OUT

Coefficient of determination: $r^2$
Gives the fraction of the variation in the values of y that is accounted for by
the least-squares regression line
IN ENGLISH: how much of the variation in y is explained by the least-squares
regression line; the higher $r^2$ is, the better the line fits the data

$r^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum \text{residual}^2}{\sum (y_i - \bar{y})^2}$
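
Both fit measures come straight from the residuals. A rough sketch, assuming the usual textbook definition of s that divides by n - 2, with made-up data and predictions taken from a line y-hat = 46.8 + 5.6x (the least-squares line for these points):

```python
def fit_quality(ys, predicted):
    """Return (r_squared, s) from observed and predicted y values."""
    n = len(ys)
    y_bar = sum(ys) / n
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))   # sum of squared residuals
    sst = sum((y - y_bar) ** 2 for y in ys)                  # total variation in y
    r_squared = 1 - sse / sst
    s = (sse / (n - 2)) ** 0.5     # typical size of a prediction error
    return r_squared, s

hours = [1, 2, 3, 4, 5]                        # made-up explanatory values
grades = [52, 60, 61, 70, 75]                  # made-up response values
predicted = [46.8 + 5.6 * x for x in hours]    # y-hat from the fitted line
r_squared, s = fit_quality(grades, predicted)
```

Here r-squared comes out near 0.96 and s near 2, so the line accounts for about 96% of the variation in grades and predictions are typically off by about 2 points.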

Outlier: observation that lies outside the overall pattern; only outliers in the y
direction have large residuals
OBSERVATION IS INFLUENTIAL WHEN removing or adding it would markedly change
the result of the calculation

Chapter 4 Designing Studies ft. STATS IS THE INCARNATION OF SATAN
Population: the entire group of individuals in a study about which we want
information
Sample: part of the population from which we collect information
Bad Sampling:
Convenience Sample: choosing individuals who are easiest to reach
Voluntary Response Sample: consists of people who choose themselves by
responding to a general invitation
Good Sampling:
Simple Random Sample (SRS): sample of size n chosen from the population so that
every group of n individuals has an equal chance to be the sample actually
selected
o Use table of random digits
Stratified random sample and strata: to choose a stratified random sample,
classify population into groups of similar individuals (strata), then choose a
separate SRS in each stratum and combine the SRSs to form a full sample
Cluster sample: to choose a cluster sample, divide population into smaller
groups (clusters; should mirror the characteristics of population), then choose
an SRS of the clusters and include all individuals in the chosen clusters
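
The three sampling methods can be sketched with Python's `random` module standing in for the table of random digits. Everything here is hypothetical (the student labels, the grades as strata, the homerooms as clusters). The cluster version follows the usual textbook form: take an SRS of whole clusters and keep everyone in them.

```python
import random

def simple_random_sample(population, n, rng):
    """Every group of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS inside each stratum, then combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Take an SRS of clusters and include everyone in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(2024)    # fixed seed so the sketch is repeatable
students = [f"student{i:02d}" for i in range(40)]
by_grade = {"9th": students[:10], "10th": students[10:20],
            "11th": students[20:30], "12th": students[30:]}
by_homeroom = {f"room{k}": students[k::4] for k in range(4)}

srs = simple_random_sample(students, 8, rng)
stratified = stratified_sample(by_grade, 2, rng)    # 2 students from every grade
clustered = cluster_sample(by_homeroom, 2, rng)     # 2 whole homerooms
```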
Undercoverage: occurs when some groups in the population are left out of the
process of choosing a sample
Nonresponse: when an individual chosen for a sample can't be contacted or
doesn't want to respond
Observational study: observes individuals and measures variables of interest but
doesn't attempt to influence responses
Experiment: deliberately imposes treatment on some individuals to measure their
responses

Lurking variable: variable that is not among the explanatory or response variables
but may influence the response variable
Confounding: occurs when two variables are associated in such a way that their
effects on a response variable cannot be distinguished from each other
Treatment: specific condition applied to individuals in an experiment
Experimental unit: smallest collection of individuals to which treatments are
applied (when humans, often called subjects)
Experimenting Well:
Random assignment: when experimental units are assigned to treatments at
random
Completely randomized design: when treatments are assigned to all the
experimental units completely by chance
Principles of Experimental Design
1. Control for lurking variables that might affect the response
2. Random assignment: creates roughly equivalent groups of experimental units
3. Replication: use enough experimental units in each group so that any
differences in effects of treatments can be distinguished from chance
differences between groups
Placebo effect: the response to a dummy treatment
Double-blind: experiment where neither subjects nor those who interact with them
and measure the response variable know which treatment a subject received
Statistically significant: an observed effect so large that it would rarely occur by
chance
Block: group of experimental units that are known before the experiment to be
similar in some way that is expected to affect the response to the treatments
Randomized block design: the random assignment of experimental units to
treatments is carried out separately within each block
Matched pairs design: common form of blocking for comparing just two
treatments
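
Random assignment and blocking can be sketched in a few lines. The subjects, the block labels, and the treatment names below are all made up; the point is only that a randomized block design repeats the random assignment separately inside each block.

```python
import random

def randomize(units, treatments, rng):
    """Completely randomized design: shuffle, then deal out the treatments."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

def randomized_block(blocks, treatments, rng):
    """Randomized block design: random assignment done separately per block."""
    assignment = {}
    for units in blocks.values():
        assignment.update(randomize(units, treatments, rng))
    return assignment

rng = random.Random(7)    # fixed seed so the sketch is repeatable
blocks = {
    "athletes": [f"athlete{i}" for i in range(6)],
    "non-athletes": [f"nonathlete{i}" for i in range(6)],
}
treatments = ["new drink", "placebo"]
assignment = randomized_block(blocks, treatments, rng)
```

Every block ends up with 3 subjects on each treatment, so the two treatment groups are balanced on the blocking variable.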

Chapter 5 Probability: What are the Chances OF ME ACTUALLY PASSING THE FINAL (like 0.7562 im p confident)