53 views

Uploaded by m234567

List of Important AP Statistics Concepts to Know

- 97em Master
- 321
- Presentation 1
- ECON1203 Hw Solution week11
- Comparing ’Ex-Cathedra’ and IT-Supported Teaching Methods and Techniques Policy of Teaching Practice
- Principles and Risks of Forecasting-Robert Nau
- BA 578 Live Fin Set2
- [BIOSTAT] Inference for Bivariates and Regression Coefficients Lab
- course work
- utf-8__session11
- Business Stats
- copy of outputstatsproject 1
- Correlational Design
- Chapter 12
- 548-1724-1-PB
- jurnal-matematika.pdf
- Assignment 3
- MH0038 Research Methodology
- A statistical method for assessing network stability using the Chow test
- Non Parametric Tests

You are on page 1of 9

Statistics

Important Concepts

Here is a list of important concepts by chapter. You should skim the chapters that seem

unfamiliar (based on this list), and look at the chapter reviews.

Chapter 1

Distribution of a variable tells the values a variable attained and how often.

Describe a distribution of a quantitative variable by describing shape, center, and

spread

Describe symmetry distributions using mean and standard deviation use !"number

summary for skewed distributions

#ean is not resistant and is always pulled toward the tail

$tandard deviation is always positive and equals %ero only when all observations are

identical

&ive number summary' #in, (), #edian, (*, #a+. () is the ,!

th

percentile which

means that ,!- of observations are at or below that value. (* is the .!

th

percentile

which means that .!- of observations are at or below that value. #edian is !/

th

percentile.

&requency histogram has values of quantitative variable on one a+is and frequency on

other a+is. Relative frequency histogram has values of quantitative variable on one

a+is and proportion or percent of observations on other a+is.

0umulative frequency histogram or ogive gives the percent or frequency of

observations at or below each value. 0umulative relative frequency histogram or

ogive displays percentiles on one a+is.

1utliers may be identified using

) ! . IQR

rule, or by using a modified bo+ plot on

calculator.

#ean and standard deviation are 213 resistant. #edian and quartiles are resistant.

4se median and 5(6 as measures of center and spread (respectively) if data is

strongly skewed or has outliers.

7raphs to display univariate, quantitative data' bo+plot, stemplot, histogram, dotplot.

(2ote' bo+ plot does not give information about individual observations.)

Chapter 2

Density curve has area of ) and is always on or above the +"a+is

8rea under curve in a certain range is the same as the proportion of observations in

that range. (4se area formulas for geometric density curves such as rectangles,

triangles, and trape%oids.)

Normal density curve is mound"shaped (or bell"shaped) mean9median area can be

found converting the observations to observation on the standard normal curve

statistic " parameter

std. dev. of statistic

z

=

then use Normalcdf(left bound, right bound) or table.

Empirical rule applies to all normal density curves.

4se 5nv2orm to find the standardi%ed observation associated with a certain percentile

ranking' InvNorm(%rank (as proportion)) = !statistic. 3hen use %"formula to

convert from the %"value to the observation value.

4se normal probability plot to determine if data can be modeled by a normal curve.

3he plot looks kind of like a scatterplot (last type of graph in the $383 :;13 menu)

and if plot looks linear, then data can be modeled by a normal density curve. 5f plot

shows definite curve, then data is skewed. 5f one observation is set apart from others

on left or right, then the observation is possibly an outlier.

Chapter "

<ivariate data' 3wo measures recorded on each individual.

4se a scatterplot to determine if there is a relationship between two quantitative

variables.

:ositive association means positive slope as values of e+planatory variable increase,

values of response variable increase. 1r, above average values of one variable are

associated with above average values of the other variable.

3o describe a plot give strength, form, and direction of relationship.

Correlation is a measure of the strength of a linear relationship. 8lso gives direction

(sign).

0orrelation coefficient has values' ) ) r where ) r = is a perfect line with

positive slope and ) r = is a perfect line with negative slope. Properties of

correlation are on

p. 132.

0orrelation of / r = or r close to %ero could mean no association at all (randomly

scattered points) or a non-linear association, such as a quadratic.

Least suares re!ression line means that the line produces the smallest sum of

squared residuals possible for the data.

;east squares line always passes through ( ) x, y

;east squares regression line can be obtained using ;in6eg on calculator or using

means and standard deviations of the two data sets. ($ee formula on formula sheet.)

$lope of the least squares lines tells the amount that the y"variable changes for each

unit of change in the +"variable.

Coefficient of determination,

,

r , is the percent of the variation in the response

variable that is e+plained by the model on the e+planatory variable.

"esidual is

or observed predicted y y

5f residual plot has no pattern, then that is evidence that the model selected is a good

fit for the data. (6esiduals are plotted against +"values or against y"values.)

Influential point sharply affects regression line if removed points that are e+treme in

the +"direction can be influential influential points may have small residuals.

(problems on p. )=> good to look at for practice?)

#utliers in a regression do not fit the pattern of the data generally have large

residuals

Correlation does not mean causation?? @ven if you have a perfect correlation, that

does not mean that the +"variables causes changes in the y"variable. 3he correlation

could be due to lurking variables.

Chapter #

3o transform e+ponential data so that itAs linear, plot ( ) ( ) or x,log y x,ln y

. ;inreg

equation will be of the form'

ln y a bx = +

3o transform data from power model, plot ( ) ( ) or log x,log y ln x,ln y

. ;inreg

equation will be of the form'

ln y a bln x = +

:redictions from e$trapolation, or predicting outside the range of the data, are not

reliable as pattern may not continue.

8 lur%in! varia&le is a variable that has an important effect on the relationship

between the variables in a study, but it is not included among the variables studied

6elationship between two variables can be due to' causation B + causes y (very hard

to demonstrateConly with controlled e+periment) common response B both + and y

variable are responding to some third variable confoundin! B the effect of the

e+planatory variable on the y variable is hopelessly mi+ed up with the effects of other

variables.

Simpson's Parado$ refers the reversal of the direction of a relationship when data

from several groups are combined to form a single group.

0ategorial data can be analy%ed using a two"way table.

7raphs for categorical data' bar graph, pie graph, segmented bar graph

3o see if there is a relationship between two categorical variables, compare

conditional probabilities.

Chapter $

8 census is when every individual in a population is used in a study a sample is

when only a portion of a population is used in a study.

8 study is &iased if one outcome is systematically favored over other outcomes.

Samplin! &ias may be due to' voluntary response, undercovera!e( convenience

samplin!, 6andom sampling reduces the chance of bias.

$ample si%e is 213 bias?? Yes, larger sample si%es are more accurate (less spread),

but sample si%e does not result in one outcome being systematically favored over

another.

Non)samplin! &ias cannot be corrected by random sampling. $ources of non"

sampling bias are' poor *ordin! of uestions, or nonresponse.

8 sample is a simple random sample +S"S, if every group of n individuals has an

equal chance of being selected.

3o create an $6$, number the individuals in the population from ) to whatever and

then using a random number table select individuals. #ust use the same number of

digits for each number, so made need to label individuals as /), /,, etc, or //), //,,

etc.

8 stratified sample (which is 213 an $6$) is one in which individuals are first

divided into separate groups, or strata, and then an $6$ is selected from each stratum.

8n o&servational study occurs when the e+perimenter observes individuals and

measures variables of interest but does not attempt to influence the responses.

5n an e$periment, the researcher deliberately imposes a treatment on individuals in

order to observe how they respond to the treatment.

3he basic principles of e$perimental desi!n are' control B control the effects of

lurking variables (done by comparison of groups like control group and treatment

group, blindness) randomi-ation B randomly assigning subDects to groups

replication B perform the e+periment on many subDects to reduce chance variation in

results. (%ood idea to read p& 2'$(()

Completely randomi-ed e$periment is one in which all e+perimental units are

assigned completely at random to groups. (3his is the opposite of a block design??)

.lind e+periment means that subDect does not know whether he is receiving the real

treatment or the individual interacting with the e+perimental unit does not know.

Dou&le &lind e+periment means that neither the subDect nor the people in contact with

the subDect know which treatment the subDect is receiving.

<lindness and double"blindness are used to control the placebo effect. 3he placebo

effect is the phenomenon that humans will always respond to a treatment.

.loc% desi!n is one in which e+perimental units that are similar are grouped together

and assigned to treatment groups within the block. 1nly subDects within a block are

compared. Ehen blocking, put ;5F@ 3H527$ together so that the variable being

controlled is constant within the block.

Blocking is used to reduce variation and to control lurking variables by grouping

subDects according to those lurking variables.

/atc0ed pairs is a special case of blocking in which each block consists of only two

e+perimental units, or one e+perimental unit receiving two treatments. 1rder of

treatments must be randomi%ed since there is not randomi%ation within groups.

@ither one subDect receives both treatments (in random order) or two subDects that are

alike in every important way are compared with one subDect randomly receiving one

treatment and the other randomly receiving a treatment.

Simulation can be used to represent results of a certain situation. 3o simulate a

situation, assign numbers to outcomes so that the number occurs with the same

frequency (percent) as the outcome. 4se a random number table (or calculator) to

find outcomes according to the numbers that come up.

Ehen designing a simulation' describe phenomenon to be simulated described how

numbers will be assigned to outcomes state stopping condition state assumptions

(independence, special cases such as repeated numbers) carry out many repetitions.

Chapter )

Pro&a&ility is the proportion of times that a certain event occurs in a long series of

repetitions.

3he La* of Lar!e Num&ers describes the phenomenon of the more trials we do, the

closer the ratio of occurrences to trials becomes closer to the true probability.

3he complement of an event is

) P event !

. 0omplement is denoted ( )

c

P event

1*o events are mutually e+clusive if they cannot occur simultaneously that is, the

Doint probability is %ero,

/ P " B! =

. (3wo events that are mutually e+clusive

cannot be independent.)

3wo events, 8 and <, are independent if

( ) ( G ) P " P " B =

. @vents that can occur

simultaneously may or may not be independent.

3he 2oint pro&a&ility of two (or more) events is the probability that both occur

simulataneously (82D). 5f two events are independent, then Doint probability is'

P " B! P "!P B! =

3he union or pro&a&ility t0at at least one event occurs is (on formula sheet )

P " B! P "! P B ! P " B ! = +

Conditional pro&a&ility of two events is the probability that one event occurs given

that the other event has already occurred (on formula sheet). (

G

P " B!

P "B !

P B!

=

:robability of at least one is

) P none !

Chapter '

A random variable is a variable whose value is a numerical outcome of a random

phenomenon.

8 discrete random varia&le is one in which there are a countable number of

outcomes. 3he distribution of a discrete random variable is a table (or histogram)

showing each possible outcome along with the probability of that outcome. 3o find

the probability of a discrete random variable, add the probabilities of all of the

outcomes in the range.

8 continuous random varia&le is one in which the variable takes on every possible

value in an interval. 3he distribution of a continuous random variable is a density

curve. 3o find the probability for a continuous random variable, find the area under

the density curve.

8 normal distribution is a special distribution of a continuous random variable.

3he mean of a random variable or e$pected value is (on formula sheet)

i i

# $ ! x p =

. (#ultiply each outcome by its probability and add them all up.)

3his gives the average outcome per game if the phenomenon were repeated many

times.

3he variance of a random variable is (on formula sheet)

,

i i

%"R $ ! x x ! p =

to get standard deviation, you square root the variance.

You can find the mean and standard deviation of a random variable on your calculator

by entering outcomes in ;) and probabilities in ;, and then doing )"Har $tats ;),;,.

$pecial rules'

a) 5f a constant is added to every number in a distribution, then the mean of the new

distribution is the old mean plus the constant and the standard deviation stays the

same. (8dding does not change the spread it Dust shifts distribution.)

b) 5f every number in a distribution is multiplied by a constant, then the mean of the

new distribution is the old mean times the constant and standard deviation of the

new distribution is the old standard deviation times the constant. (3he variance is

the square of the standard deviation so the variance is multiplied by the square of

the constant.)

c) 5f a new distribution is formed by adding or subtracting randomly selected

individuals from two e+isting distributions then the mean of the sums (or

differences) is the sum (or difference) of the means. 3hat is,

x y x y

+

= +

or

x y x y

=

3his calculation works regardless of whether the observations are independent.

d) 5f a new distribution is formed by adding or subtracting randomly selected

individuals from two e+isting distributions then the variance of the sums (or

differences) is the sum of the variances, provided that the observations are

independent. 3hat is,

, , ,

x y x y

+

= + . 3o the standard deviation, square root this result. 2ote that we

always 8DD variances.

Chapter *

.inomial distri&ution is a special probability distribution in which' there are two

outcomes for the event there is a fi+ed number of observations the observations are

independent probability of success is the same for each observation

3o find the probability of e+actly % success in n trials of a binomial phenomenon

either use the formula (on formula sheet)' 1

n n %

n %

P+ 3 % , C p + p ,

= = or use

<inompdf on your calculator' <inompdf(num observations, probability of success,

number of successes we want)

<inomial distribution is symmetrical if p 9 /.! skewed right if p close to %ero

skewed left if p close to ).

#ean of a binomial distribution is (on formula sheet)

=

3

np

and standard deviation

of binomial distribution is )

3

np+ p , = .

4eometric distri&ution is a special probability distribution in which' there are two

outcomes, success or fail, for the event the observations are independent the

probability of success is the same for each observation the variable of interest is the

number of trials before we see the first success.

3o find the probability of the first success on the kt& observation either use the

formula'

)

)

n

P+ 3 % , + p , p

3he mean of a geometric distribution is (213 on formula sheet)

1

3

p

=

Chapter +

;arger sample si%es are more accurate increasing sample si%e decreases sampling

variability.

3he samplin! distri&ution of a statistic is the distribution of the statistic in all

possible samples of a certain si%e.

3he distri&ution of t0e sample mean(

x

( has mean

=

$

(where

is the mean of

the population from with the sample is drawn) and standard deviation

$

n

=

.

3he Central Limit 10eorem gives the mean and standard deviation of the sampling

distribution of sample means as state above, and more importantly says that if the

sample si%e is large ( */ n ) the sampling distribution of sample means will be

appro+imately normal regardless of the shape of the distribution of the population. (5f

*/ n < , then the sampling distribution of sample means will mimic the shape of the

population, and will be more like it the smaller the sample si%e is. 3he sampling

distribution of sample means is normal if the distribution of the population is normal.)

3he sampling distribution of sample proportions,

p5

, has mean 5p

p =

(where p is

the population proportion) and standard deviation

1

5p

p+ p ,

n

= .

3he sampling distribution of sample proportions will be appro+imately normal if

1, and 1 1, np n+ p ,

Chapter 1,

8 confidence interval is used to estimate a population parameter.

An N6 confidence interval is interpreted as follo*s' 2- of all intervals that could

be obtained contain the population parameter, so weAre fairly confident that our

interval contains the population parameter.

5ncreasing the confidence level will increase the margin of error. Decreasing

confidence andIor increasing sample si%e will decrease the margin of error.

3he P)value of a statistic is the probability of obtaining a statistic as e+treme as the

one you got, if the null hypothesis is true.

8 statistic is si!nificant if it is unlikely to occur by random chance the statistic (or

sample) is unusual or rare. 3he smaller the :"value (closer to %ero) the more

significant the statistic and the stronger the evidence against the null hypothesis.

8 1ype I error is the probability that we incorrectly reDect the null hypothesis when it

is really true. 3he probability of a 3ype 5 error is

error will occur

8 1ype II error is the probability that we incorrectly accept the null hypothesis when

the alternate is really true. 3he probability of a 3ype 55 error is called

and is the

area under the JtrueK distribution that falls in the acceptance region of the

hypothesi%ed distribution.

3he po*er of a test is the probability that the test will reDect the null if the alternate is

really true. (

)

) power is the complement of 3ype 55 error.

5f sample si%e is increased, the probability of a 3ype 55 error decreases and power

increases. (:robability of 3ype 5 error is still alpha.)

5f significance level (alpha) is made smaller (from /./! to /./)) then probability of

3ype 55 error is increased and power decreased.

Chapter 11

8 t)distri&ution is used when we donAt know the population standard deviation and

we want to estimate a population mean. #ean of the distribution is always %ero.

8 t"distribution is bell"shaped, but is more variable (wider and flatter) than a standard

normal curve for small sample si%es. 5t is more variable because the standard

deviation is calculated from the sample so varies with each sample. 8s sample si%e

increases without bound, the t"distribution becomes closer to a normal distribution.

&or dependent samples (#atched :airs) do a one"sample t"test on the list of

differences. 3he null hypothesis would be

/

di''

=

&or two independent samples, use a two"sample t"test with null hypothesis

) ,

=

&or ,"sample t"test, the degrees of freedom is the smaller sample si%e minus ).

$@ is standard error. &or sample means, $@ is an estimate of the standard deviation

of the sampling distribution (

n

) and is

s

(#

n

=

.

$@ for the difference of two means is

, ,

) ,

) ,

s s

n n

+

Chapter 12

3he standard error of p"hat is

) p p !

n

interval to estimate the population proportion.

$ampling distribution of sample proportions is appro+imate normal if

)/ np

and

) )/ n p !

. &or a )"prop %"test, use the JpK from the null hypothesis. &or a

confidence interval, use p"hat.

4se a one"proportion %"test to compare an unknown population proportion to a known

population proportion. 3he standardi%ed statistic is' ( )

L

)

p p

z

p p

n

=

where p is the

known proportion stated in the null hypothesis (since we assume the null is true).

You may use

)":rop M test on calculator to test if a population proportion is equal to a given value

but you still must show all steps including the appropriate calculation for the %"

statistic.

3he $@ for a difference of two proportions is

) ) , ,

) ,

) ) p p ! p p !

n n

+

. 4se this $@

when estimating the difference between two population proportions. (Do 213 pool

the p"hats when calculating a confidence interval because we have no null that

assumes the proportions are equal.)

3he sampling distribution for the difference of two proportions is appro+imately

normal if

! n p !

and

) ! n p !

for <13H p"hats. 5f checking Jnearly normalK

for a two"proportion %"test, you may use the pooled

L p

in the place of both p"hats

when checking the nearly normal condition.

4se ,":rop M"test on calculator to see if two population proportions are equally where

you have two independent samples.

Chapter 1"

3he )&i-s*uare distribution is always skewed to the right. 8lso, the mean shifts to

the right as the degrees of freedom increases.

4se a 0hi"square 7oodness of &it test if you want to see if the distribution of a single

categorical variable JfitsK some ideali%ed distribution. 0reate the chi"sq statistic

using lists. 6emember to always use 01423$ not percents in your lists?

4se a 0hi"square 3est of independence if you want to see if two variables measured

on one set of individuals (sample or population) are independent"""mainly a two"way

table. 2ull hypothesis is that the variables are independent, or that there is no

relationship between the variables.

4se 0hi"square test of homogeneity if you want to see if the values of a categorical

variable are distributed the same way for two or more populations. 3he test works the

same way as a test of independence e+cept the null hypothesis is that the populations

are homogeneous.

@+pected counts for two"way table'

row total column total

grand total

Chapter 1#

3o test to see if a linear model is appropriate, we make an inference about the

parameter

2ull' 8ssume there is no linear relationship, so beta is %ero. Ho'

/ =

Degrees of freedom is n B , where n is the number of observations.

3o make a decision, look at the t"statistic and p"value from the computer output.

3o create a confidence interval for the population slope, use the t"statistic with n B ,

degrees of freedom and use the $@ from the computer output.

Hypothesis test decisions'

(uantitative variable(s)

4se )"sample t"test if comparing known population mean to unknown population

mean.

4se ,"sample t"test if comparing two unknown population means if the two

samples are independent.

4se )"sample t"test for matched pairs if you want to know if thereAs a difference

in two means given that the two samples are 213 independent"""matched

on some variable.

4se )"sample t"interval to estimate unknown population mean.

4se ,"sample t"interval to estimate the difference between two unknown

population means.

0ategorical variable(s)

4se )"proportion %"test if comparing a known population proportion to an

unknown population proportion.

4se ,"proportion %"test if comparing two unknown population proportions if the

samples are independent.

4se )"proportion %"interval to estimate an unknown population proportion.

4se ,"proportion %"interval to estimate the difference between two unknown

population proportions.

4se 0hi"square goodness of fit test if testing to see if the distribution of one

categorical variable for one population matches some given distribution.

4se 0hi"square test of homogeneity if you want to know if the distribution of one

categorical variable is the same for two or more independent samples.

4se 0hi"square test of independence if you want to determine if there is a

relationship between two categorical variables.

- 97em MasterUploaded byJoey Mclaughlin
- 321Uploaded byijsab.com
- Presentation 1Uploaded byDashrathsinh Rathod
- ECON1203 Hw Solution week11Uploaded byBad Boy
- Comparing ’Ex-Cathedra’ and IT-Supported Teaching Methods and Techniques Policy of Teaching PracticeUploaded byMightyTy
- Principles and Risks of Forecasting-Robert NauUploaded bylelouch
- BA 578 Live Fin Set2Uploaded bySumana Salauddin
- [BIOSTAT] Inference for Bivariates and Regression Coefficients LabUploaded byChelsea Rubio
- course workUploaded bySuniljpatil
- utf-8__session11Uploaded byqgocong
- Business StatsUploaded byJayashree Sikdar
- copy of outputstatsproject 1Uploaded byapi-385675448
- Correlational DesignUploaded byBinte Afzal
- Chapter 12Uploaded bypalak32
- 548-1724-1-PBUploaded byHaseeb Malik
- jurnal-matematika.pdfUploaded byAlaidinLays
- Assignment 3Uploaded bySaad Ullah Khan
- MH0038 Research MethodologyUploaded byRahul Gupta
- A statistical method for assessing network stability using the Chow testUploaded byGiuseppe Marsico
- Non Parametric TestsUploaded byKrislyn Ann Austria Alde
- BESFex11s1Uploaded byJason Pan
- PROBLEM SOLVING ABILITY AND ACADEMIC ACHIEVEMENT OF HIGHER SECONDARY STUDENTS.Uploaded byIJAR Journal
- 4.pdfUploaded byMunira Mustapher
- METHODOLOGY-Chapter-3-reaserch.docxUploaded byJoy Buar
- Common Mistakes Made by StudentsUploaded bykyogre995
- key note for mid.docxUploaded byThùyy Vy
- researchmethodslecture2015Uploaded byapi-300762638
- Differences and Similarities Between ParUploaded byPranay Pandey
- PBUploaded byatique1975
- Article 1019Uploaded byMasudur Rahman

- Development Ohmic Heating SystemUploaded byachmatsarifudin1317
- Fabric Form WorkUploaded bycaojin259
- KFD0-SD2-Ex1Uploaded byguardjaco
- ScribdUploaded byJayant Das
- Microwave BasicsUploaded bySheenly Joy Abalajen
- Discrete Cosine Transform .pdfUploaded byMohammad Rofii
- Anthropometry.pptUploaded byIvan Ng
- 0051-0052 [31] Volumetric ApparatusUploaded byNattawat Teerawattanapong
- 1 - The Link Between Perceptions of Negative Workplace Relationships and Organisational OutcomeUploaded bysangor2000
- an approach to early music singingUploaded byapi-244594631
- Corvette C3 Repair Manual EnglishUploaded byrdjones875
- UEEUploaded byPratap Reddy Chenikala
- Product CatalogueUploaded bySkazemi7
- Thermodynamic- week10Uploaded byDon Wook Won
- EME_1_SemQuestionsUploaded byKalyani Sethuraman
- Chap-09Uploaded byIshaani Garg
- 08 Trend 2005 Emcat Pro 2005 User ManualUploaded byPedro García
- FINAL Wireless Power Study-Final R1Uploaded byManu Bindass
- RF ModuleUploaded bySankula Siva Sankar
- CrystalXcelsius4.5TutorialsUploaded bymkothakota
- Views for Product Manual leap powerUploaded byNasredine Alain
- Probability NotesUploaded byGhadendra Bhandari
- Hitung Drop IV DripUploaded byPrasetyo Hendy Kurniawan
- 12.4 QuestionsUploaded byAstri Yulmarliani Sudarman
- Bibliografía sobre el silogismo categórico aristotélicoUploaded byHoracio Gianneschi
- regaNotes2Uploaded bytararichispon
- EmulsionsUploaded byVIJAY KUMAR TIRUKKACHI
- Training RBF Networks With Selective BackpropagationUploaded bytall1
- tmp56B3.tmpUploaded byFrontiers
- CAL200_Brochure.pdfUploaded byeng13