
STATISTICS FOR INTRODUCTORY COURSES

STATISTICS - A set of tools for collecting, organizing, presenting, and analyzing numerical facts or observations.
1. Descriptive Statistics - procedures used to organize and present data in a convenient, useable, and communicable form.
2. Inferential Statistics - procedures employed to arrive at broader generalizations or inferences from sample data to populations.

STATISTIC - A number describing a sample characteristic. Results from the manipulation of sample data according to certain specified procedures.

DATA - Characteristics or numbers that are collected by observation.

POPULATION - A complete set of actual or potential observations.

PARAMETER - A number describing a population characteristic; typically inferred from sample statistic.

SAMPLE - A subset of the population selected according to some scheme.

RANDOM SAMPLE - A subset selected in such a way that each member of the population has an equal opportunity to be selected. Ex. lottery numbers in a fair lottery.

VARIABLE - A phenomenon that may take on different values.

MEASURES OF CENTRAL TENDENCY

MEAN - The point in a distribution of measurements about which the summed deviations are equal to zero; the average value of a sample or population.
  Population mean: μ = (1/N)Σxᵢ    Sample mean: x̄ = (1/n)Σxᵢ
  Note: the mean is very sensitive to extreme measurements that are not balanced on both sides.

WEIGHTED MEAN - Sum of a set of observations multiplied by their respective weights, divided by the sum of the weights:
  x̄w = Σwᵢxᵢ / Σwᵢ
  where wᵢ = weight; xᵢ = observation; G = number of observation groups. Calculated from a population, sample, or groupings in a frequency distribution.
  Ex. In the Frequency Distribution below, the mean is 80.3, calculated by using the frequencies for the wᵢ's. When data are grouped, use class midpoints for the xᵢ's.

MEDIAN - Observation or potential observation in a set that divides the set so that the same number of observations lie on each side of it. For an odd number of values it is the middle value; for an even number it is the average of the middle two.
  Ex. In the Frequency Distribution table below, the median is 79.5.

MODE - Observation that occurs with the greatest frequency.
  Ex. In the Frequency Distribution table below, the mode is 88.

MEASURES OF VARIABILITY

SUM OF SQUARES (SS) - Deviations from the mean, squared and summed:
  Population SS: Σ(xᵢ − μ)²  or  Σxᵢ² − (Σxᵢ)²/N
  Sample SS: Σ(xᵢ − x̄)²  or  Σxᵢ² − (Σxᵢ)²/n

VARIANCE - The average of squared differences between observations and their mean.
  Population variance: σ² = SS/N    Sample variance: s² = SS/(n − 1)

VARIANCES FOR GROUPED DATA
  Population: σ² = (1/N)Σfᵢ(mᵢ − μ)²    Sample: s² = (1/(n − 1))Σfᵢ(mᵢ − x̄)²
  where fᵢ = frequency and mᵢ = midpoint of the ith class.

STANDARD DEVIATION - Square root of the variance:
  Pop. S.D.: σ = √(SS/N)    Sample S.D.: s = √(SS/(n − 1))
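The measures above can be verified numerically. A minimal sketch using only Python's standard library (the sample values are illustrative, not from the chart):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]           # illustrative sample
n = len(data)
mean = sum(data) / n                      # the point about which deviations sum to zero
ss = sum((x - mean) ** 2 for x in data)   # sum of squares: deviations squared and summed

pop_var = ss / n                          # population variance: sigma^2 = SS/N
samp_var = ss / (n - 1)                   # sample variance: s^2 = SS/(n - 1)

# The standard library agrees with the chart's formulas:
assert mean == statistics.mean(data)
assert abs(pop_var - statistics.pvariance(data)) < 1e-12
assert abs(samp_var - statistics.variance(data)) < 1e-12

print(mean, pop_var ** 0.5)               # mean 5.0, population S.D. 2.0
```

Note that dividing SS by n − 1 rather than N is exactly the biased/unbiased distinction discussed later in the chart.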
GROUPING OF DATA

FREQUENCY DISTRIBUTION - Shows the number of times each observation occurs when the values of a variable are arranged in order according to their magnitudes.

  x    f    x    f    x    f    x    f
  100  1    91   1    82   1    73   3
  99   1    90   2    81   2    72   2
  98   0    89   3    80   1    71   0
  97   0    88   7    79   2    70   4
  96   2    87   1    78   1    69   3
  95   0    86   0    77   3    68   1
  94   0    85   1    76   2    67   2
  93   1    84   5    75   4    66   1
  92   0    83   2    74   3    65   0
  (n = 62)

CUMULATIVE FREQUENCY DISTRIBUTION - A distribution which shows the total frequency through the upper real limit of each class.

CUMULATIVE PERCENTAGE DISTRIBUTION - A distribution which shows the total percentage through the upper real limit of each class.

GROUPED FREQUENCY DISTRIBUTION - A frequency distribution in which the values of the variable have been grouped into classes.

  CLASS    f    Cum f   Cum %
  65-67    3      3      4.84
  68-70    8     11     17.74
  71-73    5     16     25.81
  74-76    9     25     40.32
  77-79    6     31     50.00
  80-82    4     35     56.45
  83-85    8     43     69.35
  86-88    8     51     82.26
  89-91    6     57     91.94
  92-94    1     58     93.55
  95-97    2     60     96.77
  98-100   2     62    100.00

BAR GRAPH - A form of graph that uses bars to indicate the frequency of occurrence of observations.
• Histogram - a form of bar graph used with interval or ratio-scaled variables.
  - Interval Scale - a quantitative scale that permits the use of arithmetic operations. The zero point in the scale is arbitrary.
  - Ratio Scale - same as interval scale except that there is a true zero point.

FREQUENCY CURVE - A form of graph representing a frequency distribution in the form of a continuous line that traces a histogram.
• Cumulative Frequency Curve - a continuous line that traces a histogram where bars in all the lower classes are stacked up in the adjacent higher class. It cannot have a negative slope.
• Normal curve - bell-shaped curve.
• Skewed curve - departs from symmetry and tails off at one end.
[Figures: a normal (bell-shaped) curve and a skewed curve tailing off to the left.]
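The tables above can be reproduced from raw scores with a frequency count. A minimal sketch that rebuilds the 62 scores from the tabled frequencies and checks the chart's stated mean, median, and mode:

```python
from collections import Counter  # noqa: F401  (Counter(scores) would rebuild `freq`)
import statistics

# Nonzero entries of the ungrouped frequency table above.
freq = {100: 1, 99: 1, 96: 2, 93: 1, 91: 1, 90: 2, 89: 3, 88: 7, 87: 1,
        85: 1, 84: 5, 83: 2, 82: 1, 81: 2, 80: 1, 79: 2, 78: 1, 77: 3,
        76: 2, 75: 4, 74: 3, 73: 3, 72: 2, 70: 4, 69: 3, 68: 1, 67: 2, 66: 1}
scores = [x for x, f in freq.items() for _ in range(f)]

assert len(scores) == 62
assert statistics.mode(scores) == 88              # most frequent value
assert statistics.median(scores) == 79.5          # mean of the 31st and 32nd values
assert round(statistics.mean(scores), 1) == 80.3  # frequency-weighted mean

# Cumulative frequency and percentage through each 3-wide class:
cum = 0
for lo in range(65, 99, 3):
    cum += sum(f for x, f in freq.items() if lo <= x <= lo + 2)
    print(f"{lo}-{lo + 2}: cum f = {cum}, cum % = {100 * cum / 62:.2f}")
```

The last loop reproduces the Cum f and Cum % columns of the grouped table, ending at 62 and 100.00.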
PROBABILITY

Probability of occurrence of event A:
  p(A) = (number of outcomes favoring event A) / (total number of outcomes)

SAMPLE SPACE - All possible outcomes of an experiment.

TYPES OF EVENTS
• Exhaustive - two or more events are said to be exhaustive if all possible outcomes are considered. Symbolically, p(A or B or ...) = 1.
• Non-Exhaustive - two or more events are said to be non-exhaustive if they do not exhaust all possible outcomes.
• Mutually Exclusive - events that cannot occur simultaneously: p(A and B) = 0; and p(A or B) = p(A) + p(B). Ex. males, females.
• Non-Mutually Exclusive - events that can occur simultaneously: p(A or B) = p(A) + p(B) − p(A and B). Ex. males, brown eyes.
• Independent - events whose probability is unaffected by occurrence or nonoccurrence of each other: p(A|B) = p(A); p(B|A) = p(B); and p(A and B) = p(A)p(B). Ex. gender and eye color.
• Dependent - events whose probability changes depending upon the occurrence or non-occurrence of each other: p(A|B) differs from p(A); p(B|A) differs from p(B); and p(A and B) = p(A)p(B|A) = p(B)p(A|B). Ex. race and eye color.

JOINT PROBABILITIES - Probability that 2 or more events occur simultaneously.

MARGINAL PROBABILITIES or Unconditional Probabilities - summation of probabilities.

CONDITIONAL PROBABILITIES - Probability of A given the existence of S, written p(A|S).

EXAMPLE - Given the numbers 1 to 9 as observations in a sample space:
• Events mutually exclusive and exhaustive - Example: p(all odd numbers); p(all even numbers)
• Events mutually exclusive but not exhaustive - Example: p(an even number); p(the numbers 7 and 5)
• Events neither mutually exclusive nor exhaustive - Example: p(an even number or a 2)

RANDOM VARIABLES

A mapping or function that assigns one and only one numerical value to each outcome in an experiment.

DISCRETE RANDOM VARIABLES - Involve rules or probability models for assigning or generating only distinct values (not fractional measurements).

BINOMIAL DISTRIBUTION - A model for the sum of a series of n independent trials where each trial results in a 0 (failure) or 1 (success). Ex. coin toss.
  p(s) = (n choose s)·πˢ(1 − π)ⁿ⁻ˢ, where (n choose s) = n!/(s!(n − s)!)
  and p(s) is the probability of s successes in n trials with a constant π probability per trial.
  Binomial mean: μ = nπ
  Binomial variance: σ² = nπ(1 − π)
  As n increases, the Binomial approaches the Normal distribution.

HYPERGEOMETRIC DISTRIBUTION - A model for the sum of a series of n trials where each trial results in a 0 or 1 and is drawn from a small population with N elements split between N₁ successes and N₂ failures. Then the probability of splitting the n trials between x₁ successes and x₂ failures is:
  p(x₁ and x₂) = (N₁ choose x₁)(N₂ choose x₂) / (N choose n)
  Hypergeometric mean: μ = E(x₁) = nN₁/N
  and variance: σ² = n(N₁/N)(N₂/N)·(N − n)/(N − 1)

POISSON DISTRIBUTION - A model for the number of occurrences of an event x = 0, 1, 2, ..., when the probability of occurrence is small but the number of opportunities for the occurrence is large, for x = 0, 1, 2, 3, ... and λ > 0; otherwise p(x) = 0.
  p(x) = λˣe⁻λ / x!
  Poisson mean and variance: λ.

HYPOTHESIS TESTING

LEVEL OF SIGNIFICANCE - A probability value considered rare in the sampling distribution specified under the null hypothesis, where one is willing to acknowledge the operation of chance factors. Common significance levels are 1%, 5%, 10%. Alpha (α) level: the lowest level for which the null hypothesis can be rejected. The significance level determines the critical region.

NULL HYPOTHESIS (H₀) - A statement that specifies hypothesized value(s) for one or more of the population parameters. [Ex. H₀: a coin is unbiased; that is, p = 0.5.]

ALTERNATIVE HYPOTHESIS (H₁) - A statement that specifies that the population parameter is some value other than the one specified under the null hypothesis. [Ex. H₁: a coin is biased; that is, p ≠ 0.5.]

1. NONDIRECTIONAL HYPOTHESIS - an alternative hypothesis (H₁) that states only that the population parameter is different from the one specified under H₀. Ex. H₁: μ ≠ μ₀.
  Two-Tailed Probability Value is employed when the alternative hypothesis is non-directional.
2. DIRECTIONAL HYPOTHESIS - an alternative hypothesis that states the direction in which the population parameter differs from the one specified under H₀. Ex. H₁: μ > μ₀ or H₁: μ < μ₀.
  One-Tailed Probability Value is employed when the alternative hypothesis is directional.

NOTION OF INDIRECT PROOF - Strict interpretation of hypothesis testing reveals that the null hypothesis can never be proved. [Ex. If we toss a coin 200 times and tails comes up 100 times, it is no guarantee that heads will come up exactly half the time in the long run; small discrepancies might exist. A bias can exist even at a small magnitude. We can make the assertion, however, that NO BASIS EXISTS FOR REJECTING THE HYPOTHESIS THAT THE COIN IS UNBIASED. (The null hypothesis is not rejected.) When employing the 0.05 level of significance, reject the null hypothesis when a given result occurs by chance 5% of the time or less.]

TWO TYPES OF ERRORS
• Type I Error (Type α Error) - the rejection of H₀ when it is actually true. The probability of a Type I error is given by α.
• Type II Error (Type β Error) - the acceptance of H₀ when it is actually false. The probability of a Type II error is given by β.
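The binomial model above can be checked numerically: summing the pmf over all outcomes must give 1, and the mean and variance must match nπ and nπ(1 − π). A minimal sketch (n and π chosen for illustration):

```python
from math import comb

n, pi = 10, 0.5                          # 10 independent trials, success probability 0.5

def binom_pmf(s: int) -> float:
    """p(s) = C(n, s) * pi^s * (1 - pi)^(n - s)."""
    return comb(n, s) * pi**s * (1 - pi)**(n - s)

probs = [binom_pmf(s) for s in range(n + 1)]
mean = sum(s * p for s, p in enumerate(probs))
var = sum((s - mean) ** 2 * p for s, p in enumerate(probs))

assert abs(sum(probs) - 1) < 1e-12             # a valid probability model
assert abs(mean - n * pi) < 1e-12              # mu = n*pi = 5.0
assert abs(var - n * pi * (1 - pi)) < 1e-12    # sigma^2 = n*pi*(1 - pi) = 2.5
```

The same check pattern works for the hypergeometric and Poisson pmfs, with their own mean and variance formulas.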

For continuous variables, frequencies are expressed in terms of areas under a curve.

CONTINUOUS RANDOM VARIABLES - Variables that may take on any value along an uninterrupted interval of a number line.

NORMAL DISTRIBUTION - Bell curve; a distribution whose values cluster symmetrically around the mean (also median and mode).
  f(x) = (1 / (σ√(2π))) e^(−(x − μ)²/2σ²)
  where f(x) = frequency at a given value; σ = standard deviation of the distribution; π ≈ 3.1416; e ≈ 2.7183; μ = the mean of the distribution; x = any score in the distribution.

SAMPLING DISTRIBUTION - A theoretical probability distribution of a statistic that would result from drawing all possible samples of a given size from some population.

THE STANDARD ERROR OF THE MEAN - A theoretical standard deviation of sample means of a given sample size, drawn from some specified population.
• When based on a very large, known population, the standard error is:
  σx̄ = σ/√n
• When estimated from a sample drawn from a very large population, the standard error is:
  sx̄ = s/√n
• The dispersion of sample means decreases as sample size is increased.
• (For sample mean x̄) If x₁, x₂, x₃, ..., xn is a simple random sample of n elements from a large (infinite) population with mean mu (μ) and standard deviation σ, then the distribution of x̄ takes on the bell-shaped distribution of a normal random variable as n increases, and the distribution of the ratio
  (x̄ − μ) / (σ/√n)
  approaches the standard normal distribution as n goes to infinity. In practice, a normal approximation is acceptable for samples of 30 or larger.

STANDARD NORMAL DISTRIBUTION - A normal random variable Z that has a mean of 0 and standard deviation of 1.

Z-VALUES - The number of standard deviations a specific observation lies from the mean:
  z = (x − μ)/σ

Percentage Cumulative Distribution under a normal curve for selected z-values:
  z-value:          −3     −2     −1      0     +1     +2     +3
  Percentile Score: 0.13   2.28   15.87  50.00  84.13  97.72  99.87

[Figure: critical region for rejection of H₀ when α = 0.01, two-tailed test — the areas below −2.58 and above +2.58.]
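The percentile row above can be reproduced from the normal cumulative distribution function. A minimal sketch using only the standard library (Φ written via the error function):

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Cumulative area under the standard normal curve up to z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

for z in (-3, -2, -1, 0, 1, 2, 3):
    print(f"z = {z:+d}: percentile {100 * phi(z):.2f}")

assert round(100 * phi(1), 2) == 84.13   # matches the tabled percentile for z = +1
assert round(100 * phi(-2), 2) == 2.28   # matches the tabled percentile for z = -2

# The standard error of the mean shrinks with sample size:
sigma, n = 50, 100
se = sigma / sqrt(n)                     # sigma_xbar = sigma / sqrt(n)
assert se == 5.0
```

The σ = 50, n = 100 values anticipate the worked examples later in the chart, where σ/√n = 5 is used to convert sample means into z-scores.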

UNBIASEDNESS - Property of a reliable estimator being estimated.
• Unbiased Estimate of a Parameter - an estimate that equals on the average the value of the parameter. Ex. the sample mean is an unbiased estimator of the population mean.
• Biased Estimate of a Parameter - an estimate that does not equal on the average the value of the parameter. Ex. the sample variance calculated with n is a biased estimator of the population variance; however, when calculated with n − 1 it is unbiased.

STANDARD ERROR - The standard deviation of the estimator is called the standard error. Ex. the standard error for x̄'s is σx̄ = σ/√n.
  This has to be distinguished from the STANDARD DEVIATION OF THE SAMPLE: the standard error measures the variability in the x̄'s around their expected value E(x̄), while the standard deviation of the sample reflects the variability in the sample around the sample's mean (x̄).

USED WHEN THE STANDARD DEVIATION IS KNOWN - When σ is known, it is possible to describe the form of the distribution of the sample mean as a Z statistic. The sample must be drawn from a normal distribution or have a sample size (n) of at least 30.
  z = (x̄ − μ)/σx̄, where μ = population mean (either known or hypothesized under H₀) and σx̄ = σ/√n.
• Critical Region - the portion of the area under the curve which includes those values of a statistic that lead to the rejection of the null hypothesis.
• The most often used significance levels are 0.01, 0.05, and 0.1. For a one-tailed test using the z-statistic, these correspond to z-values of 2.33, 1.65, and 1.28 respectively. For a two-tailed test, the critical region of 0.01 is split into two equal outer areas marked by z-values of |2.58|.

Example 1. Given a population with μ = 250 and σ = 50, what is the probability of drawing a sample of n = 100 values whose mean (x̄) is at least 255? In this case, z = 1.00. Looking at Table A, the given area for z = 1.00 is 0.3413. To its right is 0.1587 (= 0.5 − 0.3413) or 15.87%. Conclusion: there are approximately 16 chances in 100 of obtaining a sample mean ≥ 255 from this population when n = 100.

Example 2. Assume we do not know the population mean. However, we suspect that it may have been selected from a population with μ = 250 and σ = 50, but we are not sure. The hypothesis to be tested is whether the sample mean was selected from this population. Assume we obtained from a sample (n) of 100 a sample mean of 263. Is it reasonable to assume that this sample was drawn from the suspected population?
1. H₀: μ = 250 (that the actual mean of the population from which the sample is drawn is equal to 250); H₁: μ not equal to 250 (the alternative hypothesis is that it is greater than or less than 250, thus a two-tailed test).
2. The z-statistic will be used because the population σ is known.
3. Assume the significance level (α) to be 0.01. Looking at Table A, we find that the area beyond a z of 2.58 is approximately 0.005. To reject H₀ at the 0.01 level of significance, the absolute value of the obtained z must be equal to or greater than |z₀.₀₁|, or 2.58. Here the value of z corresponding to a sample mean of 263 is 2.60.
CONCLUSION - Since this obtained z falls within the critical region, we may reject H₀ at the 0.01 level of significance.

[Table A. Normal Curve Areas - tabulates the area from the mean to z, in steps of 0.01, from z = 0.00 (.0000) through z = 3.09. Values used in this section: z = 1.00 → .3413; z = 2.33 → .4901; z = 2.58 → .4951.]

USED WHEN THE STANDARD DEVIATION IS UNKNOWN - Use of Student's t. When σ is not known, its value is estimated from sample data.
• t-ratio - the ratio employed in the testing of hypotheses or determining the significance of a difference between means (two-sample case) involving a sample with a t-distribution. The formula is:
  t = (x̄ − μ) / (s/√n), where μ = population mean under H₀.
• Distribution - symmetrical distribution with a mean of zero and a standard deviation that approaches one as degrees of freedom increases (i.e. approaches the Z distribution).
• Assumption and condition required in assuming the t-distribution: samples are drawn from a normally distributed population and σ (population standard deviation) is unknown.
• Homogeneity of Variance - if 2 samples are being compared, the assumption in using the t-ratio is that the variances of the populations from where the samples are drawn are equal.
• Estimated σx̄₁₋x̄₂ (that is, sx̄₁₋x̄₂) is based on the unbiased estimate of the population variance.
• Degrees of Freedom (df) - the number of values that are free to vary after placing certain restrictions on the data.
  Example. The sample (43, 74, 42, 65) has n = 4. The sum is 224 and the mean = 56. Using these 4 numbers and determining deviations from the mean, we'll have 4 deviations, namely (−13, 18, −14, 9), which sum up to zero. Deviations from the mean is one restriction we have imposed, and the natural consequence is that the sum of these deviations should equal zero. For this to happen, we can choose any number, but our freedom to choose is limited to only 3 numbers, because one is restricted by the requirement that the sum of the deviations should equal zero. We use the equality:
  (x₁ − x̄) + (x₂ − x̄) + (x₃ − x̄) + (x₄ − x̄) = 0
  So given a mean of 56, if the first 3 observations are 43, 74, and 42, the last observation has to be 65. This single restriction in this case helps us determine df. The formula is n less the number of restrictions; in this case, it is n − 1 = 4 − 1 = 3 df.
• t-Ratio is a robust test - this means that statistical inferences are likely valid despite fairly large departures from normality in the population distribution. If normality of the population distribution is in doubt, it is wise to increase the sample size.

COMPARISON BETWEEN t AND z DISTRIBUTIONS - Although both distributions are symmetrical about a mean of zero, the t-distribution is more spread out than the normal distribution (z-distribution). Thus a much larger value of t is required to mark off the bounds of the critical region of rejection. As df increases, differences between the z- and t-distributions are reduced. Table A (z) may be used instead of Table B (t) when n > 30. To use either table when n < 30, the sample must be drawn from a normal population.

CONFIDENCE INTERVAL - Interval within which we may consider a hypothesis tenable. Common confidence intervals are 90%, 95%, and 99%. Confidence Limits: limits defining the confidence interval.
  (1 − α)100% confidence interval for μ:
  x̄ − z(α/2)·(σ/√n) ≤ μ ≤ x̄ + z(α/2)·(σ/√n)
  where z(α/2) is the value of the standard normal variable Z that puts α/2 percent in each tail of the distribution. The confidence interval is the complement of the critical regions.
  A t-statistic may be used in place of the z-statistic when σ is unknown and s must be used as an estimate. (But note the caution in that section.)

Example. Given x̄ = 108, s = 15, and n = 26, estimate a 95% confidence interval for the population mean. Since the population variance is unknown, the t-distribution is used. The resulting interval, using a t-value of 2.060 from Table B (df = 25 row, 0.05 two-tailed column), is approximately 102 to 114. Consequently, any hypothesized μ between 102 and 114 is tenable on the basis of this sample. Any hypothesized μ below 102 or above 114 would be rejected at 0.05 significance.

Table B. Critical values of t
  Level of significance, one-tailed test:  0.025   0.01    0.005
  Level of significance, two-tailed test:  0.05    0.02    0.01
  df  1:  12.706  31.821  63.657
  df  2:   4.303   6.965   9.925
  df  3:   3.182   4.541   5.841
  df  4:   2.776   3.747   4.604
  df  5:   2.571   3.365   4.032
  df  6:   2.447   3.143   3.707
  df  7:   2.365   2.998   3.499
  df  8:   2.306   2.896   3.355
  df  9:   2.262   2.821   3.250
  df 10:   2.228   2.764   3.169
  df 11:   2.201   2.718   3.106
  df 12:   2.179   2.681   3.055
  df 13:   2.160   2.650   3.012
  df 14:   2.145   2.624   2.977
  df 15:   2.131   2.602   2.947
  df 16:   2.120   2.583   2.921
  df 17:   2.110   2.567   2.898
  df 18:   2.101   2.552   2.878
  df 19:   2.093   2.539   2.861
  df 20:   2.086   2.528   2.845
  df 21:   2.080   2.518   2.831
  df 22:   2.074   2.508   2.819
  df 23:   2.069   2.500   2.807
  df 24:   2.064   2.492   2.797
  df 25:   2.060   2.485   2.787
  df 26:   2.056   2.479   2.779
  df 27:   2.052   2.473   2.771
  df 28:   2.048   2.467   2.763
  df 29:   2.045   2.462   2.756
  df 30:   2.042   2.457   2.750
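The confidence-interval example above (x̄ = 108, s = 15, n = 26) can be checked directly. A minimal sketch using the tabled t-value:

```python
from math import sqrt

xbar, s, n = 108, 15, 26
t = 2.060                               # Table B, df = n - 1 = 25, 0.05 two-tailed

half_width = t * s / sqrt(n)            # t * (s / sqrt(n)), the estimated margin
lo, hi = xbar - half_width, xbar + half_width

print(f"95% CI: {lo:.1f} to {hi:.1f}")  # approximately 102 to 114
assert round(lo) == 102 and round(hi) == 114
```

Swapping t for the z-value 1.96 would give the (slightly narrower) interval appropriate when σ is known, which is the comparison the t-vs-z discussion above is making.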
SAMPLING DISTRIBUTION OF THE DIFFERENCE BETWEEN MEANS - If a number of pairs of samples were taken from the same population or from two different populations, then:
• The distribution of differences between pairs of sample means tends to be normal (z-distribution).
• The mean of these differences between means, μx̄₁₋x̄₂, is equal to the difference between the population means, that is, μ₁ − μ₂.

Z-DISTRIBUTION: σ₁ and σ₂ are known.
• The standard error of the difference between means:
  σx̄₁₋x̄₂ = √(σ₁²/n₁ + σ₂²/n₂)
• Where (μ₁ − μ₂) represents the hypothesized difference in means, the following statistic can be used for hypothesis tests:
  z = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / σx̄₁₋x̄₂
• When n₁ and n₂ are > 30, substitute s₁ and s₂ for σ₁ and σ₂, respectively.

POOLED t-TEST
• Distribution is normal.
• n < 30.
• σ₁ and σ₂ are not known but assumed equal.
• The hypothesis test may be 2-tailed (= vs. ≠) or 1-tailed: μ₁ ≤ μ₂ and the alternative is μ₁ > μ₂ (or μ₁ ≥ μ₂ and the alternative is μ₁ < μ₂).
• Degrees of freedom (df) = (n₁ − 1) + (n₂ − 1) = n₁ + n₂ − 2.
• Use the following formula for the estimated standard error:
  sx̄₁₋x̄₂ = √[ ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2) · (1/n₁ + 1/n₂) ]
• Determine the critical region for rejection by assigning an acceptable level of significance and looking at the t-table with df = n₁ + n₂ − 2.

STANDARD ERROR OF THE DIFFERENCE between Means for Correlated Groups. The general formula is:
  sx̄₁₋x̄₂ = √(sx̄₁² + sx̄₂² − 2r·sx̄₁·sx̄₂)
  where r is the Pearson correlation.
• By matching samples on a variable correlated with the criterion variable, the magnitude of the standard error of the difference can be reduced.
• The higher the correlation, the greater the reduction in the standard error of the difference.

HETEROGENEITY OF VARIANCES may be determined by using the F-test:
  F = s₁² (larger variance) / s₂² (smaller variance)
• NULL HYPOTHESIS - Variances are equal and their ratio is one.
• ALTERNATIVE HYPOTHESIS - Variances differ and their ratio is not one.
• Look at Table C to determine if the variances are significantly different from each other. Use degrees of freedom from the 2 samples: (n₁ − 1, n₂ − 1).

ANOVA

PURPOSE - Indicates possibility of an overall mean effect of the experimental treatments before investigating a specific hypothesis.
ANOVA - Consists of obtaining independent estimates from population subgroups. It allows for the partition of the sum of squares into known components of variation.
TYPES OF VARIANCES
• Between-Group Variance (BGV) - reflects the magnitude of the difference(s) among the group means.
• Within-Group Variance (WGV) - reflects the dispersion within each treatment group. It is also referred to as the error term.
CALCULATING VARIANCES
  BGV = Σ nᵢ(x̄ᵢ − x̄)² / (k − 1)
  where x̄ᵢ = mean of the ith treatment group and x̄ = mean of all n values across all k treatment groups.
  WGV = (SS₁ + SS₂ + ... + SSk) / (n − k)
  where the SS's are the sums of squares (see Measures of Central Tendency on page 1) of each subgroup's values around the subgroup mean.
USING F-RATIO - F = BGV/WGV.
• Degrees of freedom are k − 1 for the numerator and n − k for the denominator.
• If BGV > WGV, the experimental treatments are responsible for the large differences among group means. Null hypothesis: the group means are estimates of a common population mean.
• Following the F-ratio, when the BGV is large relative to the WGV, the F-ratio will also be large.

[Table C. .05 (top row) and .01 (bottom row) points for the distribution of F, indexed by degrees of freedom for the numerator and denominator - not reproduced here.]

SAMPLING DISTRIBUTION OF THE PROPORTION - In random samples of size n, the sample proportion p fluctuates around the proportion mean π with a proportion variance of π(1 − π)/n and a proportion standard error of √(π(1 − π)/n). As the sample size increases, the sampling distribution of p concentrates more around its target mean. It also gets closer to the normal distribution. In which case:
  z = (p − π) / √(π(1 − π)/n)

CORRELATION

Definition - Correlation refers to the relationship between two variables. The Correlation Coefficient is a measure that expresses the extent to which two variables are related.

PEARSON r METHOD (Product-Moment Correlation Coefficient) - Correlation coefficient employed with interval- or ratio-scaled variables.
  Ex.: Given observations on two variables X and Y, we can compute their corresponding z values: Zx = (x − x̄)/sx and Zy = (y − ȳ)/sy.
• The formulas for the Pearson correlation (r):
  r = Σ(x − x̄)(y − ȳ) / √(SSx·SSy)
  - Use the above formula for large samples.
  r = Σ(Zx·Zy) / n
  - Use this formula (also known as the Mean-Deviation Method of computing the Pearson r) for small samples.

RAW SCORE METHOD - is quicker and can be used in place of the first formula above when the sample values are available:
  r = [ΣXY − (ΣX)(ΣY)/n] / √[ (ΣX² − (ΣX)²/n)(ΣY² − (ΣY)²/n) ]
(To obtain the sum of squares (SS), see Measures of Central Tendency on page 1.)

CHI-SQUARE (χ²)

• Most widely-used non-parametric test.
• The χ² mean = its degrees of freedom.
• The χ² variance = twice its degrees of freedom.
• Can be used to test one or two independent samples.
• The square of a standard normal variable is a chi-square variable.
• Like the t-distribution, it has different distributions depending on the degrees of freedom.

DEGREES OF FREEDOM (d.f.)
• If chi-square tests for the goodness-of-fit to a hypothesized distribution:
  d.f. = g − 1 − m, where
  g = number of groups, or classes, in the frequency distribution;
  m = number of population parameters that must be estimated from sample statistics to test the hypothesis.
• If chi-square tests for homogeneity or contingency:
  d.f. = (rows − 1)(columns − 1)

GOODNESS-OF-FIT TEST - To apply the chi-square distribution in this manner, the critical chi-square value is expressed as:
  χ² = Σ (fo − fe)² / fe, where
  fo = observed frequency of the variable;
  fe = expected frequency (based on the hypothesized population distribution).

TESTS OF CONTINGENCY - Application of chi-square tests to two separate populations to test statistical independence of attributes.

TESTS OF HOMOGENEITY - Application of chi-square tests to two samples to test if they came from populations with like distributions.

RUNS TEST - Tests whether a sequence (to comprise a sample) is random. The following equations are applied:
  μR = 2n₁n₂/(n₁ + n₂) + 1
  σR = √[ 2n₁n₂(2n₁n₂ − n₁ − n₂) / ((n₁ + n₂)²(n₁ + n₂ − 1)) ]
  z = (R − μR) / σR
  where
  R = observed number of runs;
  μR = mean number of runs;
  n₁ = number of outcomes of one type;
  n₂ = number of outcomes of the other type;
  σR = standard deviation of the distribution of the number of runs.
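The raw-score method above can be sketched in a few lines; the paired values here are illustrative, not from the chart:

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]                      # illustrative paired observations
n = len(x)

# Raw-score Pearson r: r = [Sxy - SxSy/n] / sqrt[(Sxx - Sx^2/n)(Syy - Sy^2/n)]
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ssx = sum(a * a for a in x) - sum(x) ** 2 / n
ssy = sum(b * b for b in y) - sum(y) ** 2 / n
r = sxy / sqrt(ssx * ssy)

print(round(r, 2))                       # 0.8 for this data
```

The denominators are exactly the computational sums of squares from the Measures of Variability panel, which is why the two Pearson formulas agree.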

U.S. $4.95 / CAN. $7.50  February 2003

NOTICE TO STUDENT: This QuickStudy® chart covers the basics of Introductory Statistics. Due to its condensed format, however, use it as a Statistics guide and not as a replacement for assigned course work. ©2003 BarCharts, Inc., Boca Raton, FL.

Customer Hotline #1.800.230.9522 — quickstudy.com