# AP Stat Review Topics

## Chapter 1: Organizing Data
Calculator Stuff Covered: 1-var stats; statplots
Theme: Exploratory Data Analysis

I. Categorical vs. Quantitative Variables
II. Meaning of Distribution
III. Graphs
  A. Bar charts & pie charts (of limited use)
  B. Dotplots and histograms
  C. Stem and leaf (a sideways stemplot = a histogram!)
  D. Ogive (cumulative frequency)
IV. Interpretation
  A. Center
    1. Mean: the balance point of the curve
    2. Median: the equal-areas point of the curve
    3. Mode: the highest mountain(s) of the curve
  B. Spread
    1. Standard deviation: average distance from the mean
    2. Quartiles
    3. Outliers? KNOW THE IQR test (pg 80)
  C. Shape
    1. Symmetric?
    2. Bimodal?
    3. Uniform?
    4. Skewed? (Remember: the tail tells you the direction of the skew!)
  D. Keep in mind resistant vs. nonresistant measures of center and spread

## Chapter 2: The Normal Distributions
Calculator Stuff Covered: normCdf, invNorm, statplots
Theme: Introduction to its use and meaning

I. Normal Distribution
  A. A type of density curve
    1. Total area = 1; always above the horizontal axis
  B. Symmetric, bell-shaped
  C. Follows the empirical rule: 68% within ±1 std. dev., 95% within ±2 std. dev., 99.7% within ±3 std. dev.
  D. Defined by its mean and standard deviation: N(µ, σ)
  E. Standard normal curve: N(0, 1); mean = 0, std. dev. = 1
  F. There are infinitely many normal curves!
  G. z-score: the number of std. dev.'s from the mean; z = (x − µ) / σ
    1. Converts any normal curve into z-units along the standard normal curve
  H. Also used for probability problems
  I. To determine if a distribution is normal:
    1. Normal quantile plots
    2. Histogram or stemplot

## Chapter 3: Regression
Calculator Stuff Covered: LinReg(ax + b); statplots
Theme: Pictorial and mathematical representation of bivariate data

I. Scatterplots
  A. Explanatory (input) vs. Response (output) QUANTITATIVE variables
  B. Interpretation
    1. Direction: positive, negative, or no association
    2. Form: clusters, gaps, outliers, influential points
    3. Strength: correlation
II. Correlation: measures the strength and direction of an association; −1 ≤ r ≤ 1
  A. r = 1: perfect positive association
  B. r = −1: perfect negative association
  C. r = 0: no correlation (but that doesn't necessarily mean the data is randomly scattered)
  D. Unitless measure; nonresistant
  E. r²: the coefficient of determination; the fraction of the variability in the response variable explained by the regression on the explanatory variable

III. Least-Squares Regression
  A. Makes the sum of the squares of the vertical distances of the data points to the line as small as possible
  B. Used for prediction!
  C. Residual: observed y − predicted y
    1. The mean of the residuals = 0
    2. The residual plot should be scattered; otherwise a different model would probably be better
  D. Beware of extrapolation
IV. ASSOCIATION ≠ CAUSATION!

## Chapter 4: More on Bivariate Data
Calculator Stuff Covered: LinReg(ax + b) (using logs) or ExpReg or PwrReg; statplots
Theme: Modeling nonlinear data; interpreting correlation and regression; categorical data

I. Nonlinear Data: exponential regression vs. power regression
  A. Curved scatterplot, or a residual plot that suggests a different model is required
  B. If a plot of log y vs. x is linear, use exponential regression (pg 273-275)
    1. Perform linear regression on the log of the response variable vs. the regular explanatory variable
  C. If a plot of log y vs. log x is linear, use power regression (pg 280-284)
II. Interpreting Correlation and Regression: beware of lurking variables
  A. Common response
    1. BOTH variables are changing with respect to some unobserved third variable; a direct relationship between x and y is unlikely
  B. Confounding
    1. The response is mixed up with many explanatory variables; it might be x causing y, or it might be other variables
    2. Can't separate the effects
  C. Pg 212 for pictorial representations
III. Categorical Data
  A. Often uses counts or percents
  B. Often uses a two-way table
  C. Marginal distribution: the distribution of the row or column variable ALONE (pg 293)
  D. Conditional distribution: the distribution of the row variable with respect to a certain column, or of the column variable with respect to a certain row
  E. Simpson's Paradox: aggregation of data can give misleading results (pg 300)

## Chapter 5: Design of Experiments and Studies
Calculator Stuff Covered: RandInt() (not really needed, since you can always use a table of random digits)
Theme: Proper experimental and study design

I. Designing Samples
  A. SRS: every set of n individuals has an equal chance of being chosen
    1. Random sample: every individual has an equal chance of being chosen
  B. Other ways of sampling:
    1. Stratification
    2. Multistage
  C. Randomness is a must
  D. Fatal errors to a survey: bias. Bad stuff: voluntary response, convenience sampling, nonresponse, question wording, response bias, lack of realism, anecdotal evidence
  E. Sampling error: natural variation from sample to sample (this is OK); we can reduce this with larger sample sizes
  F. Nonsampling error: all the bad stuff listed above; a larger sample does not fix it
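Drawing an SRS can be sketched in a few lines of Python; the 20-person frame is hypothetical, and `random.sample` stands in for labeling individuals and reading a table of random digits:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame of 20 labeled individuals
frame = [f"person{i:02d}" for i in range(1, 21)]

# SRS of size n = 5: every set of 5 individuals has an equal chance of
# being chosen. random.sample draws WITHOUT replacement, so no repeats,
# just like crossing off labels as you read a table of random digits.
srs = random.sample(frame, k=5)
print(srs)
```

For a stratified design you would instead run `random.sample` separately within each stratum.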

II. Designing Experiments
  A. Must have a treatment
  B. Needs an explanatory-response relationship
  C. Principles of Experimental Design
    1. Control lurking variables by comparing several treatments
      a) Blocking
      b) Placebo group (if possible)
      c) Double blind (if possible)
    2. Randomization: state how randomization will occur in the selection process
    3. Replication: larger sample sizes = less natural variation error
  D. Matched Pairs Design: two treatments for one subject
    1. The order in which the treatments are applied needs to be randomized!
  E. Statistically significant: an effect too large to attribute to chance alone
  F. Pictures are a great way to explain experiments (pg 272)
  G. Table of Random Digits: used for simulation and for assignment of subjects to groups
III. Running a Simulation
  A. Clearly assign the digits (e.g. 00-63 = getting an oyster with a pearl, 64-99 = no pearl)
    1. If you're using two digits, every label must use two digits (e.g. don't do 0-63 = success)
  B. Don't forget to state whether repeats are OK or not. Think!!
    1. When simulating a probability, repeats are OK
    2. If you're selecting actual individuals to use in a study/experiment, you can't select the same person twice, so repeats are NOT OK
  C. Stopping rule: what causes a trial of the simulation to end? (e.g. "Stop once two oysters with pearls are found.")
  D. Label directly on the table of random digits to make it clear to both you and the grader when a success is found and when a trial ends
  E. Simulate as many repetitions as the problem tells you to
  F. State your conclusions (e.g. "In our 7 trials we had 3 successes, so we estimate our probability of success to be 3/7.")

## Chapter 6: Probability
Theme: Probability forms the basis of inference: the ability to have confidence in our answers

I. Randomness can have a long-term pattern; we use this phenomenon
II. Law of Large Numbers: as n → ∞, x̄ → µ
III. Always know if you are dealing with disjoint events
  A. P(A or B) = P(A) + P(B) − P(A and B); for disjoint events, P(A and B) = 0
IV. Always know if you are dealing with independent events
  A. Independent events: P(A and B) = P(A)·P(B)
V. Conditional Probability: P(B | A) = P(A and B) ÷ P(A)
  A. Prob of B given A = Prob of Both ÷ Prob of the Given
VI. Tree diagrams are the best way to set up most probability problems

## Chapter 7: Random Variables
Calculator Stuff Covered: invNorm, normCdf, statplot, 1-var stats using two lists
Theme: Exploring Discrete and Continuous Random Variables

I. Random Variable: a variable whose value is a numerical outcome of a random phenomenon
II. We sometimes use probability tables for discrete random variables (pg 369); we can then use the table to create a probability histogram
III. Normal distribution = a continuous probability distribution
IV. Mean of a DRV = expected value (pg 483)
V. Std. dev. & variance of a DRV: pg 485
VI. Rules for Means and Variances
  A. µ(aX ± b) = a·µX ± b
  B. µ(X ± Y) = µX ± µY
  C. Var(aX ± b) = a²·Var(X)
  D. Var(X ± Y) = Var(X) + Var(Y) (assuming X and Y are independent)

## Chapter 8: Binomial and Geometric Distributions
Calculator Stuff Covered: BinomPDF, BinomCDF, geometpdf
Theme: Moving beyond the normal distribution
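The DRV formulas from Chapter 7 above (expected value, variance, and the rules for means of linear transformations) can be checked numerically. A sketch with a made-up probability table:

```python
from math import sqrt

# Hypothetical probability table for a discrete random variable X
values = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.4, 0.3]

mu = sum(x * p for x, p in zip(values, probs))               # expected value: sum of x*p
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # variance: sum of (x-mu)^2 * p
sigma = sqrt(var)
print(mu, var)   # ≈ 1.9 and ≈ 0.89 for this table

# Rule check: the mean of (aX + b) equals a*mu + b; here a = 3, b = 2
mu_transformed = sum((3 * x + 2) * p for x, p in zip(values, probs))
print(round(mu_transformed, 6), round(3 * mu + 2, 6))   # both ≈ 7.7
```

Note that the analogous shortcut does NOT hold for the standard deviation: adding b shifts the center but leaves the spread alone, which is why Var(aX ± b) = a²·Var(X).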

I. Binomial Distribution: X ~ B(n, p), where n = # of trials and p = probability of success
  A. Two possibilities: success and failure
  B. Fixed number of observations
  C. Observations are independent
  D. Probability of success doesn't change from trial to trial
II. Specific binomial probability = BinomPDF = formula on the formula sheet!
  A. Example: P(X = 3) is a PDF
III. Range of binomial probabilities = BinomCDF
  A. Example: P(X ≤ 3) is a CDF
IV. Remember, a binomial distribution is a DISCRETE distribution; thus P(X > 3) ≠ P(X ≥ 3)
  A. P(X > 3) = P(X = 4) + P(X = 5) + ..., while P(X ≥ 3) = P(X = 3) + P(X = 4) + ...
  B. For continuous distributions, > vs. ≥ and < vs. ≤ don't change anything
V. Mean and standard deviation of a binomial distribution: if X ~ B(n, p), then
  A. Mean: µ = np
  B. Std. dev.: σ = √(npq)
VI. Geometric Distribution: # of trials until the first success
  A. Two possibilities: success and failure
  B. Probability doesn't change from trial to trial
  C. Observations are independent
  D. P(X = n) = p·q^(n−1)  ← formula for the first success on the nth trial
  E. µ = 1/p
  F. Probability that it takes more than n trials to see the first success: P(X > n) = q^n

## Chapter 9: Sampling Distributions
Calculator Stuff Covered: normalCdf, invNorm
Theme: Working towards inference; the normal approximation to the binomial; the CLT

I. Parameter = a description of a population
II. Statistic = a description of a sample
III. Sampling variability = natural variation = the fact that even correctly computed statistics rarely equal the parameter value
IV. Sampling Distribution: the distribution of values taken by the statistic in all possible samples of the same size from the same population
  A. Example: taking 1 million samples of size 50 from the population and computing or graphing their statistics would give a good sense of the sampling distribution
V. Exploring bias and variability
  A. Bias = systematically away from the true center
  B. Variability = spread about the sample center
  C. As n → ∞, µ(p̂) → p, and the std. deviation gets smaller and smaller
VI. Normal approximation to the binomial: if np ≥ 10 and nq ≥ 10 and the population is at least ten times larger than the sample, then:
  A. If X ~ B(n, p), then X ≈ N(np, √(npq))
  B. p̂ ≈ N(p, √(pq/n))
  C. Reason this works: the sampling distribution of p̂ is close to normal for large n (see pg 576, 578)
  D. Remember: p̂ itself is NOT binomial because it is not discrete!
VII. x̄ ~ N(µ, σ/√n) (see pg 582)
VIII. Central Limit Theorem: for large n, the sampling distribution of x̄ is approximately normal REGARDLESS of the shape of the population distribution!
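The binomial formulas and the normal approximation above can be cross-checked in Python. The n = 50, p = 0.3 setting is hypothetical; the half-unit shift in the last line is a continuity correction (not in the outline) that tightens the comparison between the discrete and continuous curves:

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical binomial setting: n = 50 trials, p = 0.3
n, p = 50, 0.3
q = 1 - p

def binom_pdf(k):
    """BinomPDF analogue: P(X = k) = C(n, k) * p^k * q^(n-k)."""
    return comb(n, k) * p ** k * q ** (n - k)

def binom_cdf(k):
    """BinomCDF analogue: P(X <= k)."""
    return sum(binom_pdf(i) for i in range(k + 1))

mu, sigma = n * p, sqrt(n * p * q)   # mu = np, sigma = sqrt(npq)

# np = 15 and nq = 35 are both >= 10, so X is approximately N(np, sqrt(npq))
approx = NormalDist(mu, sigma)
print(binom_cdf(18))      # exact P(X <= 18)
print(approx.cdf(18.5))   # normal approximation, continuity-corrected
```

The two printed values agree to about two decimal places, which is the point of the np ≥ 10 and nq ≥ 10 conditions.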

## Chapters 10-11: Introduction to Inference
Calculator Stuff Covered: ZInterval, Z-Test
Theme: Introduction to the calculations and meaning of confidence intervals and significance tests

I. Confidence Interval: estimate ± critical value · std. error of the estimate
  A. The net we're using to try and catch the parameter
  B. Confidence level: tells us how often our method is likely to capture the true parameter
  C. Remember: a confidence interval refers to the likelihood that our METHOD worked, NOT to how likely our answer is to be correct; our answer is either correct or incorrect
  D. Takes into account sampling error (natural variation)
  E. Does NOT rescue us from nonsampling error (voluntary response, etc.)
II. Margin of Error: critical value · std. error of the estimate
  A. Determines the size of our net
  B. How to reduce the margin of error:
    1. Choose a smaller confidence level
    2. Take a bigger sample
    3. Improve the measurement process (decreases the std. deviation)
  C. Our goal: high confidence, low margin of error
    1. Meaning: we have a small net that is still likely to catch the parameter!
  D. If you know the margin of error you want, you can figure out the sample size you need!
    1. Remember to always round up to the next highest whole number!!!
III. Rules/Cautions about Confidence Intervals
  A. Need an SRS!
    1. More complicated designs like multistage and stratified sampling require different methods
  B. A small sample size and a nonnormal population will skew the confidence interval
  C. The confidence interval is not resistant, because it is based on nonresistant measures such as x̄!
  D. Confidence intervals have the same assumptions as their corresponding significance tests
IV. Proper Outline of a Confidence Interval
  A. Check assumptions
  B. Confidence interval formula; show your work
  C. Conclusion in context of the problem!
  D. Know how to draw the associated picture if required
V. Significance Testing: how likely are we to have gotten our results by chance alone?
  A. Null Hypothesis H0: the no-change hypothesis (no effect present)
    1. Always written in terms of PARAMETERS
  B. Alternative Hypothesis Ha: the effect we suspect is true
    1. Always written in terms of PARAMETERS
    2. Carefully decide whether the alternative hypothesis is one-sided or two-sided
  C. Test Statistic: measures the compatibility between the data and the null hypothesis
    1. (estimate − parameter) ÷ std. error of the estimate
  D. P-value: the probability of getting our statistic or a more extreme one, assuming the null hypothesis is true
    1. In other words, the probability that our result happened by pure chance alone
    2. A low p-value ALWAYS indicates strong evidence against the null hypothesis
    3. Example: suppose the p-value = 0.03. This means we would only expect to get our result, or a more extreme one, 3% of the time if the null hypothesis were true.
  E. Significance Level α: the level at which we would reject the null hypothesis
  F. If p-value < α, we have statistical significance
    1. Our effect is unlikely to have occurred by chance alone
    2. We reject the null hypothesis and accept the alternative hypothesis
  G. If p-value > α, we do NOT have statistical significance
    1. We don't have sufficient evidence to say our effect is anything more than pure chance
    2. We fail to reject the null (note: we can NEVER accept the null)
    3. "We don't have sufficient evidence at the α level of significance."
VI. Proper Outline of a Significance Test
  A. Have a clear conception of the null hypothesis; state both hypotheses
  B. Check assumptions
  C. Test statistic formula, test statistic value, p-value, degrees of freedom (if applicable); show your work
  D. Conclusion in context of the problem!
  E. Know how to draw the associated p-value picture if required
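The margin-of-error-to-sample-size calculation above can be sketched as follows; the numbers (known σ = 12, 95% confidence, desired margin of error m = 2) are made up:

```python
from math import ceil, sqrt
from statistics import NormalDist

# Hypothetical numbers: known sigma = 12, 95% confidence, want margin of error m = 2
sigma, conf, m = 12, 0.95, 2

z_star = NormalDist().inv_cdf((1 + conf) / 2)   # critical value z* (about 1.96 for 95%)

# margin of error m = z* * sigma / sqrt(n); solve for n and ALWAYS round up
n = ceil((z_star * sigma / m) ** 2)
print(n)

# Sanity check: with this n the margin of error is at most m
print(z_star * sigma / sqrt(n) <= m)   # True
```

Rounding up (never to the nearest integer) is what guarantees the achieved margin of error does not exceed the target.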

VII. Type I and Type II Errors
  A. Type I: rejecting the null when the null is true
    1. P(Type I error) = α
    2. Confidence = 1 − α
  B. Type II: failing to reject the null when the null is false
    1. P(Type II error) = 1 − Power
  C. Power: the probability that you correctly reject the null for some specified value of the alternative
VIII. Cautions about significance tests
  A. Rejecting the null doesn't mean we have evidence that a STRONG effect is present, just strong evidence that SOME effect is present; it may be large or small
    1. A confidence interval gives you a better sense of the size of an effect!
  B. Statistical inference doesn't work on all data sets; see the Hawthorne Effect (Wikipedia)

## Chapters 12-13: Significance Tests in Practice
Calculator Stuff Covered: TInterval, T-Test, 2-SampTInt, 2-SampT-Test, 2-Samp-ZTest, 2-SampZInt, 1-PropZTest, 1-PropZInt, 2-PropZTest, 2-PropZInt
Theme: Now we get into more practical test statistics, such as t. Everything up until now was estimating means; then we move on to proportions!

I. Standard Error: when the std. deviation is estimated from the data
  A. Example: the std. error of x̄ is s/√n
II. t-distribution
  A. A family of curves determined by degrees of freedom
  B. Symmetric, bell-shaped
  C. Spread is larger than for a normal distribution
  D. As df → ∞, the t-distribution looks like a normal distribution
III. Robustness: how much the assumptions can be violated while still getting credible results
  A. t-procedures are very robust: major violations of the assumptions still yield good results
  B. For n < 15, the data should be close to normal to use a t-distribution
  C. For 15 ≤ n ≤ 40, t is safe to use as long as there are no outliers or strong skewness
  D. For n > 40, t is safe to use regardless of the population distribution
  E. An SRS is still needed!
IV. Matched Pairs: one group of subjects given two treatments
  A. Take the differences in the data and run a 1-sample t-test on those differences
V. Two-Sample Stuff: two groups, each getting a different treatment!
  A. Groups should be independent of each other
  B. Groups can be different sample sizes
  C. Assumptions are TWO SRSs and that BOTH populations are normal
  D. The null hypothesis is that the difference is zero
  E. Use a two-sample z-test if we know the population standard deviations
  F. Use a two-sample t-test if we don't know the population standard deviations
    1. Can use the pooled version if the population variances can be considered equal
    2. Remember your table! (degrees of freedom)
  G. Two-sample t-procedures are even MORE robust than one-sample t-procedures
    1. For n1 + n2 < 15, both populations should be close to normal
    2. For 15 ≤ n1 + n2 ≤ 40, safe as long as there are no outliers or strong skewness
    3. For n1 + n2 ≥ 40, safe to use regardless of the population distributions
  H. Conclusions for a two-sample test describe the differences between populations
    1. Example: "We are 95% confident the true mean difference between the populations is..."
    2. Example: "At the 5% level of significance, we have strong evidence that there is a difference between the means of the two populations."
VI. Proportions
  A. Proportion = count of successes ÷ sample size
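The matched-pairs procedure above (take the differences, run a one-sample t-test) can be sketched in Python with made-up difference data. The standard library has no t-distribution CDF, so the sketch stops at the test statistic; the p-value would come from a table or the calculator's T-Test:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical matched-pairs data: differences (after - before) for 8 subjects
diffs = [2.1, -0.4, 1.8, 0.9, 2.5, 1.1, 0.3, 1.7]

# One-sample t-test on the differences; H0: mean difference = 0
n = len(diffs)
x_bar = mean(diffs)            # sample mean of the differences
se = stdev(diffs) / sqrt(n)    # standard error: s / sqrt(n)
t = (x_bar - 0) / se           # test statistic, df = n - 1 = 7
print(round(t, 2))             # ≈ 3.65; look up the p-value with df = 7
```

With only n = 8 differences, the t-procedure is appropriate only if the differences look roughly normal with no outliers, per the robustness guidelines above.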

  B. We use p̂ (the sample proportion) to estimate p (the population proportion)
    1. Remember, p̂ is NEVER binomial: the binomial distribution is DISCRETE while proportions are CONTINUOUS
    2. For a large sample, p̂ is approximately normal: p̂ ≈ N(p, √(pq/n))
  C. Standard error of p̂ = √(p̂q̂/n)
  D. Significance testing for one proportion
    1. Assumptions: np ≥ 10 and nq ≥ 10 (where p comes from the null hypothesis); SRS; population at least 10 times larger than the sample
    2. Test statistic: z = (p̂ − p) / √(pq/n), where p = the value assumed in the null hypothesis
  E. Confidence interval for one proportion
    1. Assumptions: np̂ ≥ 10 and n(1 − p̂) ≥ 10
    2. Interval: p̂ ± z*·√(p̂q̂/n)
  F. Sample size: solve the margin of error for n
    1. n = (z*)² / (4m²), where we assume p* = 0.5 since that value maximizes the margin of error
    2. This is called the conservative value for n; if p* = 0.5 is wrong, it simply yields a larger sample size than you really need (and it costs little extra when 0.3 < p* < 0.7)
    3. Remember to always round up to the next whole number if you get a decimal!
  G. Two proportions
    1. Assumptions: two independent SRSs; n1p̂1 ≥ 10, n1q̂1 ≥ 10, n2p̂2 ≥ 10, n2q̂2 ≥ 10; both populations at least 10 times larger than their sample sizes
    2. Conclusions similar to the above:
      a) Example: "We are 95% confident the true difference in proportions is..."
      b) Example: "At the 5% level of significance, we have evidence of a difference between the two population proportions."
      c) DON'T SAY "TRUE MEAN PROPORTION"

## Chapter 14: Inference for Tables: Chi-Square
Calc Stuff Covered: Chi-SqGOF, Chi-Sq-2Way
Theme: We need a way to perform inference on categorical data!

I. Chi-Square: two types!
  A. Chi-Square Goodness of Fit: allows us to compare more than two proportions to each other by looking at the counts!
  B. Chi-Square Test of Independence: tests whether the distribution of one categorical variable has been influenced by another variable
II. The chi-square distribution
  A. ALWAYS right skewed
  B. The rejection region is always in the right tail
  C. The shape of the distribution depends on the degrees of freedom
    1. The higher the degrees of freedom, the more normal chi-square looks!
    2. Chi-square with 1 degree of freedom = a standard normal distribution squared!
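The expected-count and X² computations behind these tests can be sketched on a hypothetical 2 × 2 table:

```python
# Hypothetical 2 x 2 two-way table of observed counts
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]        # [50, 50]
col_totals = [sum(col) for col in zip(*observed)]  # [50, 50]
n = sum(row_totals)                                # 100

# Expected cell count = (row total * column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Every expected count here is 25 >= 5, so the 2 x 2 condition is met
x2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
         for i in range(2) for j in range(2))
df = (2 - 1) * (2 - 1)
print(x2, df)   # X² = 4.0 with 1 df
```

Since chi-square with 1 df is a standard normal squared, this X² of 4.0 corresponds to z = 2 in the equivalent two-proportion z-test.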

III. Significance Testing for Goodness of Fit
  A. Hypotheses:
    1. H0: The population proportions are as stated [state them]
    2. Ha: At least one of the population proportions differs from the stated ones
    (or, equivalently)
    3. H0: The population distribution is as stated [state the distribution]
    4. Ha: The population distribution is not as stated
  B. Assumptions: every expected cell count is ≥ 5 (YOU MUST SHOW EXPECTED CELL COUNTS!)
  C. Test statistic: X² = Σ (obs − exp)² / exp
  D. Degrees of freedom: # of categories minus 1
IV. Significance Testing for the Test of Independence
  A. Hypotheses:
    1. H0: There is no association between [row and column variable names]
    2. Ha: There exists an association between [row and column variable names]
  B. Assumptions: MUST SHOW & EVALUATE EXPECTED CELL COUNTS!
    1. For a 2 × 2 table, ALL expected cell counts ≥ 5
    2. For larger than a 2 × 2 table:
      a) The AVERAGE of the expected cell counts must be ≥ 5
      b) NO expected cell count < 1
      c) Note: expected cell count = (row total · column total) ÷ n
    3. SRS
  C. Test statistic: the same X² formula as above
  D. Degrees of freedom = (# rows − 1)(# columns − 1)

## Chapter 15: Inference on Regression
Calculator Stuff Covered: LinRegT-Test, statplot, LSCI program (TI83/TI84), LinRegT-Int
Theme: We're testing whether or not regression is worthwhile to perform!

I. We're performing inference on the slope of the regression line
II. Assumptions:
  A. Observations are independent
  B. Make a scatterplot of y vs. x: is the trend roughly linear?
    1. Calculate and graph the regression line on your scatterplot
      a) Look at r²: how much of the natural variation in the response variable is accounted for by the regression line?
    2. Look for outliers and influential points
  C. The std. deviation of the response is fairly constant
    1. The residual plot shouldn't be excessively curved or fanning out
  D. The response varies normally around the regression line
    1. Make a normal quantile plot of the residuals: is it linear?
    2. Or make a histogram or boxplot of the residuals: fairly symmetric with no outliers?
III. Significance Testing
  A. Null hypothesis: slope β = 0, meaning regression is useless
  B. Alternative: β ≠ 0, β > 0, or β < 0
  C. Test statistic: t = b / SE_b, using a t-distribution with (n − 2) degrees of freedom
IV. Confidence Interval on the Slope
  A. b ± t*·SE_b (these always involve using computer output to construct)
V. Mean response confidence interval answers: "We are x% confident that for everyone in the population that did x, the average of their responses is expected to be in the interval."
VI. Prediction interval answers: "We are x% confident that if an individual did x, their response is expected to be in the interval."