
Revision Notes

Binomial Distribution Hypothesis Tester


[Vertical line chart of P(X = r) against r for r = 0 to 20, divided into an acceptance region ('Accept H0') and a rejection region ('Reject H0').]

Bob Francis 2005


Statistics 1

Topic
Classifying data

Data Presentation
Examples

References
MEI Stats 1
Pages 12 and 13: Categorical data; Discrete data; Continuous data. Pages 22 to 23: Grouped data.

Categorical: non-numerical categories, e.g. favourite colours of 30 children, political party voted for by 1000 electors.
Discrete: numerical, taking particular, often integral, values, e.g. number of goals, shoe size.
Continuous: numerical values measured to a given accuracy, e.g. length, weight, time, speed.

Examples
Categorical: Political parties: Conservative, Labour, Liberal Democrats, Greens, etc.
Discrete: Goals scored in consecutive games: 3, 3, 0, 4, 2, 1, 2, 1, 0, 2, 3, 3, 4, 2, 1, 2, 2, 3, 1, 2
Continuous: Heights measured to nearest cm: 181, 178, 160, 182, 166, 169, 174, 159, 180, 177, 177, 182, 173, 174, 161, 177, 185, 166, 166, 186

Frequency Distributions
Categorical: frequencies of various non-numerical categories, e.g. % supporting political parties.
Discrete: frequencies of discrete values, e.g. goals scored by one team in 20 consecutive games.
Continuous: frequencies of continuous values in class intervals with associated boundaries, e.g. no. of students with heights measured to the nearest 5 cm.
Grouped discrete data can be treated as if it were continuous, e.g. distribution of marks in a test.

Categorical: Conservative 23% , Labour 42%, Liberal Democrats 18%, Greens 7%, rest 10%

MEI Stats 1
Pages 17 to 19: Frequency distributions. Pages 24 to 26: Grouped data.

Displaying Frequency Distributions


Categorical: Use a bar chart (or pie chart) with heights (or angles) proportional to frequencies.
Discrete: Use a vertical line chart with heights proportional to frequencies.
Continuous: Use a histogram with equal or unequal class intervals; area of rectangle proportional to frequency; height of rectangle gives frequency density.
Use a cumulative frequency curve to plot cumulative frequencies against upper class boundaries for continuous (or grouped discrete) data. Interpretation: median, IQR, percentiles.

MEI Stats 1
Pages 56 to 58: Bar charts and vertical line charts. Pages 62 to 69: Histograms. Pages 74 to 77: Cumulative frequency curves.

Stem and leaf diagrams


Concise way of displaying discrete or continuous data (measured to a given accuracy) whilst retaining the original information. Data usually sorted in ascending order. Interpretation: 'Shape' of distribution; mode, median and quartiles.

MEI Stats 1
Pages 6 to 8 Stem and leaf diagrams

Box and whisker plots


Simple way of displaying median, inter-quartile range and range for discrete or continuous data. Interpretation: Comparison of two distributions; medians, IQRs and ranges.

MEI Stats 1
Pages 73 and 74 Box & whisker plots

Skewness
A frequency distribution for discrete or continuous data may exhibit symmetry, positive skew or negative skew, according to its 'shape'. The discrete frequency distribution example (goals scored by a football team) is roughly symmetrical. The stem and leaf example (distribution of marks) exhibits negative skewness. The distribution of lengths of telephone calls may well exhibit positive skewness, peaking well to the left of the mid-range.

MEI Stats 1
Pages 5 and 6 Shapes of distributions


Topic

Central Tendency and Dispersion


Examples
Raw Data
Heights measured to nearest cm: 159, 160, 161, 166, 166, 166, 169, 173, 173, 174, 177, 177, 177, 178, 180, 181, 182, 182, 185, 196
Modes = 166 and 177 (i.e. the data set is bimodal)
Midrange = (159 + 196) ÷ 2 = 177.5
Median = (174 + 177) ÷ 2 = 175.5
Mean: x̄ = Σx / n = 3482 / 20 = 174.1
Range = 196 − 159 = 37
Lower quartile Q1 = 166; Upper quartile Q3 = 180.5
Inter-quartile range (IQR) = 180.5 − 166 = 14.5
Sum of squares: Sxx = Σx² − n x̄² = 607886 − 20 × 174.1² = 1669.8
Root mean square deviation: rmsd = √(Sxx / n) = √(1669.8 / 20) = 9.14 (3 s.f.)
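The raw-data calculations above can be reproduced with a short Python sketch (Python is not part of these notes; the variable names are my own):

```python
# Heights measured to the nearest cm (the worked example above)
data = [159, 160, 161, 166, 166, 166, 169, 173, 173, 174,
        177, 177, 177, 178, 180, 181, 182, 182, 185, 196]

n = len(data)
mean = sum(data) / n                            # 3482 / 20 = 174.1
midrange = (min(data) + max(data)) / 2          # 177.5
Sxx = sum(x * x for x in data) - n * mean ** 2  # sum of squares about the mean
rmsd = (Sxx / n) ** 0.5                         # root mean square deviation (divisor n)
s = (Sxx / (n - 1)) ** 0.5                      # standard deviation (divisor n - 1)
```

Running this gives mean 174.1, Sxx 1669.8, rmsd 9.14 and s 9.37 (3 s.f.), matching the worked values.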

References
MEI Stats 1
Pages 13 to 16: Central Tendency. Pages 24 to 27: Grouped data.

Central Tendency [averages]


Mode: most frequently occurring value in a data set; a frequency distribution with a single mode is unimodal; a frequency distribution with two distinct modes is bimodal.
Midrange: (minimum value + maximum value) ÷ 2
Median: middle value when the data are arranged in order; for an even total frequency, average the middle 2 values.
Mean: (total of all data values) ÷ (total frequency): x̄ = Σx / n (raw data) or x̄ = Σxf / Σf (frequency distribution)

Quartiles and Percentiles


Lower (Q1) and upper (Q3) quartiles: values ¼ way and ¾ way through the distribution. Percentile: the rth percentile is the value r/100 of the way through the distribution.

MEI Stats 1
Pages 71 and 72 Quartiles

Standard deviation: s = √(Sxx / (n − 1)) = √(1669.8 / 19) = 9.37 (3 s.f.)

Dispersion [spread]
Range: maximum value − minimum value
Inter-quartile range (IQR): (upper quartile − lower quartile) = Q3 − Q1
Sum of squares: Sxx = Σ(x − x̄)² = Σx² − n x̄² (raw data)

Outliers (a): 174.1 ± 2 × 9.37 gives limits 155.36 and 192.84; the value 196 lies beyond these limits, so there is one outlier.
Outliers (b): 166 − 1.5 × 14.5 = 144.25 and 180.5 + 1.5 × 14.5 = 202.25; no values lie beyond these limits, so there are no outliers.

MEI Stats 1
Pages 31 to 40: Range; Sum of squares; Root mean square deviation; Standard deviation. Page 73: Inter-Quartile Range.

Frequency Distribution
Goals scored by one team in 20 consecutive games:
Goals scored (x) | 0  1  2  3  4
Frequency (f)    | 2  4  7  5  2

Sxx = Σ(x − x̄)²f = Σx²f − n x̄² (frequency dist.)
Mean square deviation: Sxx / n; rmsd = √(Sxx / n)
Variance: Sxx / (n − 1); Standard deviation: s = √(Sxx / (n − 1))

Mode = 2; Midrange = (0 + 4) ÷ 2 = 2; Median = 2 (average of goals scored in 10th and 11th matches)

Using a calculator
Make sure that you can use a scientific or graphical calculator to find the mean [x̄], root mean square deviation rmsd [divisor n] and standard deviation s [divisor n − 1] of a raw data set and of a frequency distribution.

Mean: x̄ = Σxf / Σf = 41 / 20 = 2.05

Lower quartile Q1 = 1; Upper quartile Q3 = 3
Range = 4 − 0 = 4
Inter-quartile range (IQR) = 3 − 1 = 2
Sum of squares: Sxx = Σx²f − n x̄² = 109 − 20 × 2.05² = 24.95
Root mean square deviation: rmsd = √(Sxx / n) = √(24.95 / 20) = 1.12 (3 s.f.)
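The frequency-distribution calculations can be sketched the same way in Python (again, not part of the original notes):

```python
# Goals scored by one team in 20 consecutive games
values = [0, 1, 2, 3, 4]   # goals scored (x)
freqs  = [2, 4, 7, 5, 2]   # frequency (f)

n = sum(freqs)                                        # total frequency = 20
mean = sum(x * f for x, f in zip(values, freqs)) / n  # 41 / 20 = 2.05
Sxx = sum(x * x * f for x, f in zip(values, freqs)) - n * mean ** 2
rmsd = (Sxx / n) ** 0.5       # root mean square deviation (divisor n)
s = (Sxx / (n - 1)) ** 0.5    # standard deviation (divisor n - 1)
```

This reproduces mean 2.05, Sxx 24.95, rmsd 1.12 and s 1.15 (3 s.f.).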

Graphical calculator
Data Analysis for the TI-83 + accompanying notes

Outliers
Can be applied to data which are:
(a) at least 2 standard deviations from the mean, i.e. beyond x̄ ± 2s
(b) at least 1.5 × IQR beyond the nearer quartile, i.e. below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR

MEI Stats 1
Pages 40 and 41 Outliers & s.d. Pages 73 and 74 Outliers & IQR

Standard deviation: s = √(Sxx / (n − 1)) = √(24.95 / 19) = 1.15 (3 s.f.)
Outliers (a): 2.05 ± 2 × 1.15 gives limits −0.25 and 4.35; no values lie beyond these limits, so no outliers.
Outliers (b): 1 − 1.5 × 2 = −2 and 3 + 1.5 × 2 = 6; no values lie beyond these limits, so no outliers.

Coding
If y = ax + b then: ȳ = a x̄ + b and sy = a sx

MEI Stats 1
Pages 43 to 45: Linear coding

For data sets x and y, where y = 5x − 20, given x̄ = 24.8 and sx = 7.3:
ȳ = 5 × 24.8 − 20 = 104 and sy = 5 × 7.3 = 36.5
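The linear coding rules can be checked numerically; this sketch assumes the example values x-bar = 24.8 and sx = 7.3 from the notes:

```python
# Coding y = ax + b: the mean is coded the same way; the spread scales by |a|
a, b = 5, -20
x_mean, sx = 24.8, 7.3

y_mean = a * x_mean + b   # 5 * 24.8 - 20 = 104.0
sy = abs(a) * sx          # 5 * 7.3 = 36.5
```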


Topic
Probability of events

Probability 1
Examples
Experimental Probability
In a statistical experiment a drawing pin is thrown 100 times, landing point-down 37 times. The probability of event A (the drawing pin landing point-down) may be estimated as:
P(A) ≈ 37/100 = 0.37
P(A') = 1 − P(A) = 1 − 0.37 = 0.63

References
MEI Stats 1
Pages 87 to 91: Measuring probability; Experimental and theoretical probability; The complement of an event; Expectation (expected frequency).

Probability describes the likelihood of an event occurring in a statistical experiment. Probability is measured on a scale from 0 (impossible) to 1 (certain).

Theoretical Probability
The theoretical probability of an event A is given by P(A) = n(A) / n(ξ), where A is the set of favourable outcomes and ξ is the set of all possible outcomes.
The experimental probability of an event is: (number of successes) ÷ (number of trials)
The complementary event of A is denoted A' and is defined as the set of possible outcomes not in set A. Hence P(A') = 1 − P(A)
The expectation (expected frequency) of an event is the number of times it is expected to occur in n repetitions of the experiment, and is given by: Expected frequency = n × P(A)
An ordinary pack of cards is shuffled and a card chosen at random. The probability of event A (card chosen is a picture card) is calculated by:
P(A) = 12/52 = 3/13
P(A') = 1 − P(A) = 1 − 3/13 = 10/13
If the experiment is repeated 100 times, then the expectation (expected frequency) of a picture card being chosen
= n × P(A) = 100 × 3/13 = 23.1 (to 3 s.f.)

Sample Space
Two fair dice are thrown and their scores added. The sample space:

+ | 1  2  3  4  5  6
1 | 2  3  4  5  6  7
2 | 3  4  5  6  7  8
3 | 4  5  6  7  8  9
4 | 5  6  7  8  9 10
5 | 6  7  8  9 10 11
6 | 7  8  9 10 11 12

Event A (Total = 7): P(A) = 6/36 = 1/6
Event B (Total > 8): P(B) = 10/36 = 5/18

Sample space
The sample space for an experiment illustrates the set of all possible outcomes. An event is therefore a sub-set of the sample space. Probabilities can be calculated from first principles.

Non-mutually exclusive events


An ordinary pack of cards is shuffled and a card chosen at random.
Event A (card chosen is a picture card): P(A) = 12/52
Event B (card chosen is a heart): P(B) = 13/52
Since P(card is a picture heart) = P(A ∩ B) = 3/52:

Addition rule for probability


For any two events A and B:
P(A or B) = P(A) + P(B) − P(A and B), i.e. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Events A and B are mutually exclusive if they cannot happen simultaneously, i.e. the occurrence of one event excludes the occurrence of the other, so P(A ∩ B) = 0.
Addition rule for mutually exclusive events: P(A or B) = P(A ∪ B) = P(A) + P(B)

MEI Stats 1
Pages 92 to 94: Probability of one event or another. Pages 94 and 95: Mutually exclusive events.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 12/52 + 13/52 − 3/52 = 22/52 = 11/26
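The addition-rule example can be verified with exact fractions (a sketch using Python's fractions module; not part of the original notes):

```python
from fractions import Fraction as F

# A = picture card, B = heart; the 3 picture hearts are counted in both
p_A = F(12, 52)
p_B = F(13, 52)
p_A_and_B = F(3, 52)

p_A_or_B = p_A + p_B - p_A_and_B   # = 11/26
```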

Mutually exclusive events


Two fair dice are thrown and their scores are added.
Event A (Total = 7): P(A) = 6/36 = 1/6
Event B (Total > 8): P(B) = 10/36 = 5/18
Since P(Total = 7 and Total > 8) = P(A ∩ B) = 0:

Multiplication rule for probability


For any two (dependent) events A and B:
P(A and B) = P(A ∩ B) = P(A) × P(B | A)
Events A and B are independent if the occurrence of one has no effect on the occurrence of the other, so P(B | A) = P(B) and P(A | B) = P(A).
Multiplication rule for independent events: P(A and B) = P(A ∩ B) = P(A) × P(B)

MEI Stats 1
Pages 109 to 110 Dependent and independent events

P(A ∪ B) = P(A) + P(B) = 6/36 + 10/36 = 16/36 = 4/9


Topic
Tree diagrams

Probability 2
Examples
Independent events
A child's toy has two parts; 90% of top parts and 75% of bottom parts are perfect. Parts are placed together at random.
Event A (top part is perfect): P(A) = 0.9
Event B (bottom part is perfect): P(B) = 0.75
P(A ∩ B) = P(A) × P(B) = 0.9 × 0.75 = 0.675

References
MEI Stats 1
Pages 98 to 101 The probability of events from two trials

A useful way of illustrating probabilities for both independent and dependent events. Multiply probabilities along the branches (and); Add probabilities at the ends of branches (or).

Independent events:

Dependent events:

Dependent events
A pack of cards is shuffled; two cards are chosen at random without replacement.
Event A (1st card is a picture card): P(A) = 12/52
Event B (2nd card is a picture card): P(B | A) = 11/51
P(A ∩ B) = P(A) × P(B | A) = 12/52 × 11/51 = 11/221

Tree diagrams may have more than two branches at each division and/or more than two sets. Tree diagrams may be asymmetrical.

Conditional Probability
The multiplication law for dependent probabilities may be rearranged to give:
P(B | A) = P(A ∩ B) / P(A) or P(A | B) = P(A ∩ B) / P(B)

P(B) = 12/52 × 11/51 + 40/52 × 12/51 = 3/13

MEI Stats 1
Pages 107 to 113 Conditional probability

If event A logically precedes event B then the right-hand version is useful for calculating posterior conditional probability.

The conditional probability of "A given B" is:
P(A | B) = P(A ∩ B) / P(B) = (11/221) ÷ (3/13) = 11/51
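The conditional-probability example can be checked exactly (a sketch; A = 1st card is a picture card, B = 2nd card is a picture card):

```python
from fractions import Fraction as F

# Two cards drawn without replacement from an ordinary pack
p_A_and_B = F(12, 52) * F(11, 51)                    # = 11/221
p_B = F(12, 52) * F(11, 51) + F(40, 52) * F(12, 51)  # = 3/13
p_A_given_B = p_A_and_B / p_B                        # = 11/51
```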

Combinations and Probability

The number of ways of arranging n distinct objects in order is n!, where n! = n × (n − 1) × … × 3 × 2 × 1 [Special case: 0! = 1]
The number of ways of choosing (or selecting) r from n distinct objects is nCr, where
nCr = n! / (r! (n − r)!), for r = 0, 1, 2, …, n
Suppose that n distinct objects are divided into types S and T, where n(S) = n1 and n(T) = n2, and r objects are selected at random from the n objects. The probability that there are r1 of type S and r2 of type T is:
(n1Cr1 × n2Cr2) / nCr, where r1 + r2 = r and n1 + n2 = n

MEI Stats 1
Pages 139 to 140: Factorials and arrangements. Pages 143 to 146: Combinations; Binomial coefficients. Pages 147 to 149: Calculating probabilities in less simple cases.

Choosing a tiddlywinks team

A college Tiddlywinks Club has 17 members, 7 of whom are girls. A mixed team of 5 is chosen at random.
No. of possible outcomes = 17C5 = 17! / (5! 12!) = 6188
No. of ways of choosing a team with exactly two girls = 7C2 × 10C3 = 7! / (2! 5!) × 10! / (3! 7!) = 2520
Hence probability that the team, chosen at random, contains exactly two girls = 2520 / 6188 = 0.407 (to 3 s.f.)
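The tiddlywinks calculation maps directly onto Python's math.comb (a sketch, not part of the original notes):

```python
from math import comb

# 17 members, 7 girls; a team of 5 chosen at random
total = comb(17, 5)                    # 6188 possible teams
two_girls = comb(7, 2) * comb(10, 3)   # 2520 teams with exactly two girls
p = two_girls / total                  # = 0.407 (3 s.f.)
```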


Topic

Discrete Random Variables


Examples
Definition by formula
X is a discrete random variable given by P(X = r) = k / r, for r = 1, 2, 3, 4
To find the value of k, use ΣP(X = xi) = 1:
k/1 + k/2 + k/3 + k/4 = 1

References
MEI Stats 1
Pages 118 to 124: Definitions; Notation; Vertical line charts; Calculation of probabilities.

Discrete random variables


A discrete random variable X takes values x1, x2, x3, …, xn with probabilities p1, p2, p3, …, pn, where pi = P(X = xi) for i = 1, 2, 3, …, n and Σpi = ΣP(X = xi) = 1.
Illustrate using a vertical line chart.

(25/12) k = 1, so k = 12/25 = 0.48, giving:
P(X = 1) = 0.48, P(X = 2) = 0.24, P(X = 3) = 0.16, P(X = 4) = 0.12
[Vertical line chart omitted.]

Definition by formula: Sometimes it is possible to define the probability function as a formula, as a function of r:
P(X = r) = f(r) for values of r (usually integral)
Often the function f includes a constant, k, which can be found using the property Σpi = 1.
Definition by table: For a small set of values it is often convenient to list the associated probability pi for each xi:

xi        | x1  x2  x3  …  xn−1  xn
P(X = xi) | p1  p2  p3  …  pn−1  pn

Calculation of probabilities: Sometimes you need to be able to calculate the probability of some compound event, given the values from the table or function.
Explanation of probabilities: Often you need to explain how the probability P(X = xk), for some value of k, is derived from first principles.

Expectation and Variance:
E(X) = μ = Σ r P(X = r) = 1 × 0.48 + 2 × 0.24 + 3 × 0.16 + 4 × 0.12 = 1.92
E(X²) = Σ r² P(X = r) = 1² × 0.48 + 2² × 0.24 + 3² × 0.16 + 4² × 0.12 = 4.8
Var(X) = E(X²) − [E(X)]² = 4.8 − 1.92² = 1.1136

Definition by table
In a competition, you have to match 4 inventors with 4 inventions. Assume this is done at random. Let X represent the number of correct matchings. The distribution is given by the table:

r        | 0    1    2    3    4
P(X = r) | 3/8  1/3  1/4  0    1/24

Expectation and Variance:
E(X) = μ = Σ r P(X = r) = 0 × 3/8 + 1 × 1/3 + 2 × 1/4 + 3 × 0 + 4 × 1/24 = 1
E(X²) = Σ r² P(X = r) = 0² × 3/8 + 1² × 1/3 + 2² × 1/4 + 3² × 0 + 4² × 1/24 = 2
Var(X) = E(X²) − [E(X)]² = 2 − 1² = 1

Expectation (mean)
The expectation (or mean) of a discrete random variable is defined by: E(X) = μ = Σ xi P(X = xi) = Σ xi pi

Calculation of probabilities:
If two friends both enter the competition, the probability that both guess the same number of correct matchings
= (3/8)² + (1/3)² + (1/4)² + 0² + (1/24)² = 91/288 ≈ 0.316 (3 s.f.)

MEI Stats 1
Pages 127 to 130 Expectation of a discrete random variable

Variance
The variance of a discrete random variable is defined by: Var(X) = σ² = E([X − μ]²) = Σ (x − μ)² P(X = x)

Explanation of probabilities:
Explanation of why P(X = 2) = 1/4:
Total number of possible matchings = 4! = 24
Exactly 2 correct matchings can occur in 4C2 = 6 ways (the remaining two must then both be wrong)
P(X = 2) = 6/24 = 1/4

MEI Stats 1
Pages 127 to 130 Variance of a discrete random variable

Var(X) = σ² = E(X²) − [E(X)]²
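The E(X) and Var(X) calculations follow the same pattern for any table; this sketch checks the inventors-matching example with exact fractions:

```python
from fractions import Fraction as F

# P(X = r) for the number of correct inventor-invention matchings
dist = {0: F(3, 8), 1: F(1, 3), 2: F(1, 4), 3: F(0), 4: F(1, 24)}
assert sum(dist.values()) == 1                    # probabilities sum to 1

E   = sum(r * p for r, p in dist.items())         # E(X) = 1
E2  = sum(r * r * p for r, p in dist.items())     # E(X^2) = 2
Var = E2 - E ** 2                                 # Var(X) = 1

# probability that two independent entrants guess the same number of matchings
p_same = sum(p * p for p in dist.values())        # = 91/288
```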


Topic

Binomial Distribution and Hypothesis Testing


Examples
Left- and right-handed people
In a national survey, 12% of people are left-handed. In a random sample of 15 people, let X represent the number of left-handed people. Then X ~ B(15, 0.12) and
P(X = r) = 15Cr × 0.12^r × 0.88^(15−r), for r = 0, 1, …, 15
P(X = 3) = 15C3 × 0.12³ × 0.88¹² = 0.170 (3 s.f.)
P(X ≥ 1) = 1 − P(X = 0) = 1 − 0.88¹⁵ = 0.853 (3 s.f.)
Mean (expected) no. of left-handed people = 15 × 0.12 = 1.8
If 50 random samples of 15 people are taken, then the expected frequency of finding 3 left-handed people = 50 × P(X = 3) = 50 × 0.170 = 8.48 (to 3 s.f.)

References

Binomial Distribution B(n, p)


A trial is defined to be a 'success' or 'failure', where P('success') = p and P('failure') = q [= 1 − p]. A random sample consists of n independent trials. Let X represent the number of 'successes' in the random sample. Then X ~ B(n, p) and
P(X = r) = nCr × p^r × q^(n−r), for r = 0, 1, 2, …, n
Mean (expected) number of 'successes' = np
If m random samples of n independent trials are taken, then the expected frequency of r successes is given by m × P(X = r).

MEI Stats 1
Pages 153 to 156: The binomial distribution. Pages 158 to 161: Expectation of B(n, p); Using the binomial distribution.

Cumulative Binomial Probability Tables


Binomial probabilities can be calculated using the cumulative binomial probability tables on pages 34 to 39 of the Students' Handbook. These tables give P(X ≤ x) for n = 1 to 20 and various values of p. The examples opposite use the table for n = 20. [Table extract omitted.]

Throwing a fair die


A fair die is thrown 20 times. Let X represent the number of sixes obtained. Then X ~ B(20, 1/6) and cumulative binomial probability tables may be used:

MEI Stats 1
Pages 174 to 175: Cumulative binomial probability tables

P(X ≤ 5) = 0.8982
P(X = 4) = P(X ≤ 4) − P(X ≤ 3) = 0.7687 − 0.5665 = 0.2022
P(X > 6) = 1 − P(X ≤ 6) = 1 − 0.9629 = 0.0371
P(3 ≤ X ≤ 6) = P(X ≤ 6) − P(X ≤ 2) = 0.9629 − 0.3287 = 0.6342
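The table look-ups can also be computed directly from the binomial formula (a sketch; binom_cdf is my own helper, and direct computation can differ from 4-d.p. tables in the last digit):

```python
from math import comb

def binom_cdf(n, p, x):
    """P(X <= x) for X ~ B(n, p), summed from the binomial formula."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(x + 1))

# A fair die thrown 20 times; X = number of sixes, X ~ B(20, 1/6)
p = 1 / 6
p_le_5 = binom_cdf(20, p, 5)                        # approx 0.8982
p_eq_4 = binom_cdf(20, p, 4) - binom_cdf(20, p, 3)  # approx 0.2022
p_gt_6 = 1 - binom_cdf(20, p, 6)                    # approx 0.0371
```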

Hypothesis Testing
A null hypothesis (H0) is tested against an alternative hypothesis (H1) at a particular significance level. According to given criteria, the null hypothesis is either rejected or not rejected. An hypothesis test can be either 1-tailed or 2-tailed.

One tail test


Chris thinks that his die is biased against producing sixes. In 20 throws of the die he gets just 1 six. Hypothesis test of Chris's claim at the 5% level:
(1) H0: p = 1/6; H1: p < 1/6 (1-tail)
(2) Decide on the significance level: 5%
(3) Data collected: 1 six in 20 trials
(4) Conduct test: P(X ≤ 1) = 0.1304 > 0.05 (5%)
(5) Interpret result: Since P(X ≤ 1) > 5%, there is not enough evidence to reject H0, i.e. accept that Chris's die is not biased against sixes.
Critical value and critical region: Since P(X ≤ 0) = 0.0261 < 0.05 (5%), X = 0 is the critical value and {0} is the critical region.
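The one-tail test can be sketched in Python (the helper name binom_cdf is my own):

```python
from math import comb

def binom_cdf(n, p, x):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(x + 1))

# H0: p = 1/6, H1: p < 1/6 (1-tail), 5% level; observed 1 six in 20 throws
p_value = binom_cdf(20, 1 / 6, 1)   # P(X <= 1), approx 0.1304
reject_H0 = p_value < 0.05          # False: not enough evidence to reject H0

# critical region: P(X <= 0) approx 0.0261 < 5%, so the critical region is {0}
```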

MEI Stats 1
Pages 169 to 173: Defining terms; Hypothesis testing checklist; Choosing the significance level
Pages 177 to 179

Hypothesis testing procedure


(1) Establish null and alternative hypotheses:
H0: p = …; H1: p < … or p > … (1-tail); p ≠ … (2-tail)
(2) Decide on the significance level: s%
(3) Collect data (independent and at random): obtain r successes out of n trials.
(4) Conduct test:
1-tail, H1: p < …: compare P(X ≤ r) with s%
1-tail, H1: p > …: compare P(X ≥ r) with s%
2-tail, H1: p ≠ …: if r < mean (np), compare P(X ≤ r) with ½s%; if r > mean (np), compare P(X ≥ r) with ½s%
(5) Interpret result in terms of the original claim:
1-tail: if P(X ≤ r) or P(X ≥ r) < s%, reject H0
2-tail: if P(X ≤ r) or P(X ≥ r) < ½s%, reject H0

Pages 182 to 184: Critical values and critical regions

Critical value and critical region
The critical value is the least extreme value for which the null hypothesis (H0) is rejected. The critical region is the set of all values for which H0 is rejected.

Two tail test
A survey claims: "15% of the population are left-handed". Hypothesis test of the survey's claim at the 10% level:
(1) H0: p = 0.15; H1: p ≠ 0.15 (2-tail)
(2) Decide on the significance level: 10%
(3) Data collected: 7 left-handers in a random sample of 20
(4) Conduct test: since 7 > mean (20 × 0.15 = 3),
P(X ≥ 7) = 1 − P(X ≤ 6) = 1 − 0.9781 = 0.0219
(5) Interpret result: Since P(X ≥ 7) < 5% (½ of 10%), there is enough evidence to reject H0, i.e. do not accept that 15% of the population are left-handed.
Critical region: Since P(X ≤ 0) = 0.0388 and P(X ≤ 1) = 0.1756, and P(X ≥ 6) = 0.0673 and P(X ≥ 7) = 0.0219, {X: x = 0 or x ≥ 7} is the critical region.
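The two-tail test follows the same pattern (a sketch; binom_cdf is my own helper; the observed count sits in the upper tail since 7 > np = 3):

```python
from math import comb

def binom_cdf(n, p, x):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(x + 1))

# H0: p = 0.15, H1: p != 0.15 (2-tail), 10% level; 7 left-handers in 20
n, p, r = 20, 0.15, 7
upper_tail = 1 - binom_cdf(n, p, r - 1)   # P(X >= 7), approx 0.0219
reject_H0 = upper_tail < 0.10 / 2         # True: reject H0 at the 10% level
```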

1-tail and 2-tail tests; Asymmetrical cases

Excel Spreadsheet
Binomial Distribution, Hypothesis Testing and Critical Regions