
Dr Niall Dodds

Ms Ewa Bieniecka

Semester 2, 2017/18

Contents

1 Data Analysis 3

1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Populations and Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.3 Types of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.4 Data Presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.4.1 Dot Plots, Stem and Leaf Displays . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.4.2 Histograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.5 Data Summaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.5.1 Measures of Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.5.2 Measures of Spread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2 Probability 24

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.2 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.3 Independent Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.4 Discrete and Continuous Random Variables . . . . . . . . . . . . . . . . . . . . . . . . 37

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.2 The Discrete Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.3 The Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.4 Geometric Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

3.5 The Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.6 Joint Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

Fundamentals of Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

4.1.1 Cumulative Distribution Functions . . . . . . . . . . . . . . . . . . . . . . . . . 64

4.1.2 Expected Value, Variance, Median . . . . . . . . . . . . . . . . . . . . . . . . . 67

4.1.3 Combining Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

4.2 The Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

4.2.1 Finding Areas under Other Normal Curves . . . . . . . . . . . . . . . . . . . . 71

4.2.2 Finding Values of z when Probabilities are Given . . . . . . . . . . . . . . . . . 73

4.2.3 Combining Normally Distributed Variables . . . . . . . . . . . . . . . . . . . . 76


5 Sampling 79

5.1 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

5.1.1 1–Tail and 2–Tail Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

5.1.2 Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

5.2 Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

5.3 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

Appendices 93


Chapter 1

Data Analysis

Week 1 Monday

1.1 Introduction

Why study statistics? In a broad sense, statistics can provide some input to answering questions such

as:

• What happens?

Typically we deal with situations in which there is some degree of uncertainty or unpredictability.

Examples:

1. Collect data.

2. Analyse data.

3. Present data.

4. Interpret data.

The hope is that eventually we are able to give unambiguous answers to the questions of interest.

In order to understand the meaning of statistical tests and their results, it is important to grasp the

underlying mathematics. This is a main focus of the present module.


Example 1.1.1

Two dental cleansers, A and B, are tested on teeth specimens. The weight loss in mg over the

experiment is:

A 10.2 11.0 9.6 9.8 9.9 10.5 11.2 9.5 10.1 11.8

B 9.6 8.5 9.0 9.8 10.7 9.0 9.5 9.9

Can we say that one cleanser is definitely more abrasive (gives higher weight loss)? We can use

a box plot to illustrate the experimental results but the answer is not clear cut just by looking

at the plot.

[Box plots of the weight loss in mg for cleansers A and B, on a scale from 8 to 12]

—————————————————

Example 1.1.2

A cautionary example. Things are not always as simple as they seem. One class of such

examples is known as Simpson’s Paradox. Examine the following data regarding surgery for gall

stone removal. Two methods are used, OS (standard surgery) and PN (keyhole surgery).

                          OS                        PN
                  small   large   all       small   large   all
Successful ops     81      192    273        234      55    289
Total ops          87      263    350        270      80    350

Success rates (%):

        small   large   all
OS        93      73     78
PN        87      69     83
all       88      72     80

The results are highly counterintuitive: while the OS method has higher success rate for both

small and large stones, it has a lower overall success rate.

Two effects combine to cause this:


1. The two groups (large/small) are very different in size for the two methods, and

2. Stone size has a large effect on the success rate.

This example illustrates a reason why we must proceed with great care in order to arrive at

accurate conclusions from statistical data.
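The arithmetic behind the paradox is quick to check in code. A minimal sketch (counts taken from the table above; the function name is just illustrative):

```python
# Success counts and totals for gall stone removal, from the table above:
# (successes, total operations) per method and stone size.
data = {
    "OS": {"small": (81, 87), "large": (192, 263)},
    "PN": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    """Success rate as a percentage, rounded to the nearest whole number."""
    return round(100 * successes / total)

for method, groups in data.items():
    (ss, st), (ls, lt) = groups["small"], groups["large"]
    print(method, rate(ss, st), rate(ls, lt), rate(ss + ls, st + lt))
# OS wins within each stone size, yet PN wins overall.
```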

—————————————————

1.2 Populations and Sampling

An important distinction must be made between populations and samples.

Definition

A population is the whole collection of individuals in which we are interested (e.g. people, objects,

flowers). A sample is a subset of the population from which we collect information/data.

We frequently want to know information about an entire population (e.g. voting intentions of everyone

living in Scotland; age of all people in the world). When such information is available (e.g. in a census)

the only problem is then in presenting and interpreting the data in an appropriate way. However,

studying the full population is usually impractical (e.g. due to cost or time limitations) and so to

learn about the population we instead seek to get data from a sample of the population. We must

then try to make sure that the sample gives a good representation of the population in which we are

interested.

Examples

• General election voting intention. Population: voting intentions of all people on the electoral roll in the UK. Sample: voting intentions of 1000 people chosen at random from the UK population.

• Mass of stars in the Universe. Population: all stars in the visible Universe. Sample: 10,000 observed stars.

• A fair dice? Population: results of an infinite number of rolls of the dice in question. Sample: results from 100 rolls of the dice.

• How long a make of lightbulb lasts. Population: lifetime of all lightbulbs ever made. Sample: lifetime of 50 test bulbs.

A key question is: what will data from the sample tell you about the population (which is what you are really interested in)?


How a sample is selected from a population is important and many different objective sampling

methods exist (e.g. simple random, systematic, stratified, quota, cluster, panel). The methods allow

us to develop analytical tools to make inferences about the underlying population. Broad general rules include the following:

• Each member of the population should have the same (or a known) chance of selection (not

simply volunteers).

• Selection should have no element of subjective choice: use a ‘random’ mechanism, e.g. spin a

coin, roll a fair dice, draw cards from a deck, use a computer random number generator, etc.

(Note: here random does not mean the same as haphazard.)

• No outcomes should be favoured or disadvantaged (e.g. if you want to know about spending

habits of adults, carrying out a survey in a shopping centre would not give a good sample).

1.3 Types of Data

We will encounter various different types of data in this module. Different types of data are best dealt

with and displayed in different ways, as we will discuss during the following section.

• Qualitative or categorical data is non-numerical data, about some quality or attribute, e.g. gender,

colour of eyes, blood type, shape of box. The data may be

– Ordinal data: about quantities that have a natural ordering, a rank e.g. place in a race,

job title etc. (Note that rank tells us nothing about distance between ranked items even

though a number might be involved.)

– Nominal data: about quantities with no natural ordering e.g. favourite food, type of car.

• Quantitative or numerical data is associated with measurements and counts. The data may be

– Discrete, i.e. separate and distinct (e.g. number of goals in a hockey match). This is

frequency or count data on how many individuals or items fall into a given category (e.g.

number of students with grades A, B and C in a particular module).

– Continuous i.e. able to take on any value, often within some range (e.g. height of a person).

This is Metric data obtained from measurement, e.g. time to complete a race, weight of

sheep, speed of light, etc.

1.4 Data Presentation

The purpose of displaying data in different ways is to try to provide some insight into certain characteristics of the data. The most appropriate way to present or summarise data depends on the type of data that have been collected.

• Qualitative/Categorical data: nominal and ordinal. Here summaries and displays are chiefly by

means of bar charts and pie charts.


• Quantitative/Numerical data: counts or measurements. The best way to display this type of

data depends on the size of the data set:

– Medium to large sets: frequency table; histogram (bar chart); cumulative frequency (or %)

plot.

1.4.1 Dot Plots, Stem and Leaf Displays

We illustrate these displays with an example. Consider the following data set listing the number of second hand cars less than 5 years old that are for sale in n = 45 selected car showrooms.

20 47 55 39 32 36 85 17 62 44 64

105 57 76 48 18 31 71 50 33 29 73

48 27 24 117 86 17 32 64 12 50 6

29 20 51 51 161 73 13 25 45 37 68

26

Example 1.4.1

To construct a dot plot, we draw an axis which extends from the minimum value up to the

maximum value. Now work through the data, putting dots in the appropriate places above the

axis to represent data values. If a value is repeated, build up the dots.

[Dot plot of the car data on an axis from 0 to 150 (number of cars)]

Note that the symbol you choose is arbitrary (you could also use filled or open circles, or another symbol of your choice).

The dental cleansers example shows another use of dot plots, illustrating their use for more than

one data set.


For a stem and leaf plot the data is split into two parts, the first digit (or digits) which is

called the stem, and the last digit (or digits) which is called the leaf.

For our car data we choose the tens as the stem (e.g. for data in the 30s) and the final digit as

the leaf (e.g 7 for the number 37).

To construct a stem and leaf plot:

• Examine the data to identify the stems and leaves. Write the stems in a column.

• Go through the data one value at a time, writing the leaf for each data value beside its

corresponding stem.

• Rearrange the leaves for each stem into ascending order.

 0 | 6
 1 | 23778
 2 | 00456799
 3 | 1223679
 4 | 45788
 5 | 001157
 6 | 2448
 7 | 1336
 8 | 56
 9 |
10 | 5
11 | 7
12 |
13 |
14 |
15 |
16 | 1

—————————————————
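The three construction steps can also be automated. A sketch in Python (stems taken as the integer part after dividing by 10, as in the display above):

```python
from collections import defaultdict

cars = [20, 47, 55, 39, 32, 36, 85, 17, 62, 44, 64,
        105, 57, 76, 48, 18, 31, 71, 50, 33, 29, 73,
        48, 27, 24, 117, 86, 17, 32, 64, 12, 50, 6,
        29, 20, 51, 51, 161, 73, 13, 25, 45, 37, 68, 26]

# Step 1: split each value into a stem (tens) and a leaf (final digit).
stems = defaultdict(list)
for x in cars:
    stems[x // 10].append(x % 10)

# Steps 2 and 3: write the leaves beside each stem, in ascending order.
for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in sorted(stems[stem]))
    print(f"{stem:2d} | {leaves}")
```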

There is a subtlety associated with constructing the stem and leaf diagram regarding the choice of the number of stems to take. (A similar issue is associated with intervals in a histogram, as discussed below.)

The purpose of making such plots is to show at a glance how the data values are spread out or

distributed, e.g.

• Highly concentrated or thinly spread (e.g. the cars in the showroom are concentrated between about 10 and 50 but thinly spread above about 90).


– Number of peaks: unimodal (one peak), bimodal (two peaks), etc.

– Symmetric distribution, or asymmetric distribution with, for example, a long upper tail (or

positive skew).

– Typical or ‘average’ values (this will be discussed in more detail later in this section).

1.4.2 Histograms

Histograms are similar in concept to stem and leaf displays, but are more useful for large data sets of quantitative/numerical data. A histogram groups the data as for a stem and leaf diagram but

records only the frequency in each group rather than all the leaf digits. That is, it takes a sequence of

bins and counts the number of observations in each bin. A bar is drawn on each bin where the height

relates to the proportion of observations in each bin. The important point to note is that the area of each bar is proportional to the number of observations in its bin.

So, if the bins are all of equal width then the height can be a simple frequency. Otherwise it must be

a frequency density. Sometimes the densities are scaled so that the overall area of all the bars is equal

to 1. A histogram may be drawn for both continuous and discrete data.

Note that bar charts are similar to histograms but appropriate for qualitative/categorical data (e.g.

number of students in Dundee coming from particular countries). For a bar chart the width of each

bar has no meaning.

Example 1.4.2

Consider the following data showing the weight at birth of 50 infants with the condition SIRDS

(severe idiopathic respiratory disease):

1.050 1.175 1.230 1.310 1.500

1.600 1.720 1.750 1.770 2.275

2.500 1.030 1.100 1.185 1.225

1.260 1.295 1.300 1.550 1.820

1.890 1.940 2.200 2.270 2.440

2.560 2.760 1.130 1.575 1.680

1.760 1.930 2.015 2.090 2.600

2.700 2.950 3.160 3.400 3.640

2.830 1.410 1.715 1.720 2.040

2.200 2.400 2.550 2.570 3.005

A common intermediate step is to construct a grouped frequency table. Start by grouping data

in, for example, 0.2kg intervals (or ‘cells’ or ‘bins’). For continuous data, such as this, one has to

decide on an endpoint convention, i.e. what to do about data that lies exactly between intervals.


For this table we have chosen to go up (i.e. to put an infant with weight 2.2000kg in the interval

2.2 − 2.4kg). For discrete data this endpoint problem does not exist.

Our grouped frequency table is:

Birth weight (kg) Frequency

1.0-1.2 6

1.2-1.4 6

1.4-1.6 4

1.6-1.8 8

1.8-2.0 4

2.0-2.2 3

2.2-2.4 4

2.4-2.6 6

2.6-2.8 3

2.8-3.0 2

3.0-3.2 2

3.2-3.4 0

3.4-3.6 1

3.6-3.8 1
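These counts can be reproduced with a short script. A sketch (plain Python; each interval is treated as closed below and open above, which implements the 'go up' endpoint convention described earlier):

```python
weights = [1.050, 1.175, 1.230, 1.310, 1.500, 1.600, 1.720, 1.750, 1.770,
           2.275, 2.500, 1.030, 1.100, 1.185, 1.225, 1.260, 1.295, 1.300,
           1.550, 1.820, 1.890, 1.940, 2.200, 2.270, 2.440, 2.560, 2.760,
           1.130, 1.575, 1.680, 1.760, 1.930, 2.015, 2.090, 2.600, 2.700,
           2.950, 3.160, 3.400, 3.640, 2.830, 1.410, 1.715, 1.720, 2.040,
           2.200, 2.400, 2.550, 2.570, 3.005]

# Count values in [lo, lo + 0.2) for the 14 intervals 1.0-1.2, ..., 3.6-3.8.
freq = {}
for k in range(14):
    lo = round(1.0 + 0.2 * k, 1)
    hi = round(lo + 0.2, 1)
    freq[(lo, hi)] = sum(1 for w in weights if lo <= w < hi)
```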

We then draw bars, one for each interval identified, with height corresponding to the frequency

density. For the grouping of the table above our histogram is:

[Histogram of the birth weights: frequency (0 to 8) against weight (kg), with equal-width 0.2 kg bins from 1.0 to 3.8]

If we make a different frequency grouping then we must adjust the histogram accordingly. For

example:


Birth weight (kg) Frequency

1.0-1.2 6

1.2-1.4 6

1.4-1.6 4

1.6-1.8 8

1.8-2.0 4

2.0-2.2 3

2.2-2.4 4

2.4-2.6 6

2.6-3.2 7

3.2-3.8 2

[Histogram of the birth weights on a density scale (0 to 0.8) against weight (kg), with unequal-width bins ending in 2.6-3.2 and 3.2-3.8]

Note that this second case uses a density scale on the y-axis. How are the heights of the bars

calculated here? Each represents the proportion of occurrences. For weights 1.0-1.2kg there are

6 infants recorded in the total count of 50. That is, a 6/50 = 0.12 proportion. The width of the

strip is 0.2 so we need a bar of height z where 0.2z = 0.12 ⇒ z = 0.6, and similarly for the other

strips.
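The same height calculation can be applied to every bin at once. A sketch, with the edges and frequencies copied from the regrouped table above:

```python
# (lower edge, upper edge, frequency) for the unequal-width grouping above.
bins = [(1.0, 1.2, 6), (1.2, 1.4, 6), (1.4, 1.6, 4), (1.6, 1.8, 8),
        (1.8, 2.0, 4), (2.0, 2.2, 3), (2.2, 2.4, 4), (2.4, 2.6, 6),
        (2.6, 3.2, 7), (3.2, 3.8, 2)]
n = sum(f for _, _, f in bins)  # 50 infants in total

# Each height z solves width * z = proportion, so z = (f / n) / width.
heights = [round((f / n) / (hi - lo), 3) for lo, hi, f in bins]
```

With this scaling the bar areas sum to 1, since each bar's area is simply f/n.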

—————————————————

Histograms are most often constructed using a package such as R or Excel which can do most of the

work for you. It is possible to customise the plots in such packages, as briefly discussed above. One

important thing to consider is the size of intervals.

Choice of intervals

How many intervals should we choose?

• Not too many: too many intervals give many bars, each with small width. Some intervals can be empty and overall the histogram looks noisy/choppy.

• Not too few: too few intervals give large bars, each averaging out lots of data, so the shape of the distribution is lost.


Even with a choice of intervals that avoids these two extremes, any particular choice of interval size

and endpoint method will lead to slightly different histogram shapes. In general, small data sets are

more sensitive to these choices so one should bear the size in mind when looking at plots.
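One way to see this sensitivity is to bin the same data with very different interval widths. A sketch using the car showroom data from earlier (plain Python; the helper name is illustrative):

```python
cars = [20, 47, 55, 39, 32, 36, 85, 17, 62, 44, 64,
        105, 57, 76, 48, 18, 31, 71, 50, 33, 29, 73,
        48, 27, 24, 117, 86, 17, 32, 64, 12, 50, 6,
        29, 20, 51, 51, 161, 73, 13, 25, 45, 37, 68, 26]

def bin_counts(data, width):
    """Count observations in each interval [k * width, (k + 1) * width)."""
    counts = {}
    for x in data:
        k = x // width
        counts[k] = counts.get(k, 0) + 1
    return counts

few = bin_counts(cars, 80)   # too few bins: the shape is averaged away
many = bin_counts(cars, 2)   # too many bins: most bars hold only 1 or 2 values
```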

Example 1.4.3

As a further example, consider the following data giving the response time in minutes for n = 220

letters:

21.0 17.0 43.0 2.2 12.0 2.0 10.0 17.0 34.0 1.0 27.0 15.0

20.0 5.0 6.0 65.0 22.0 45.0 31.0 7.0 40.7 5.0 17.0 10.0

37.0 67.0 6.0 3.0 63.0 37.0 25.0 60.0 44.0 12.0 13.0 5.5

11.0 25.0 3.0 5.0 25.0 7.0 18.0 22.0 2.0 6.0 4.0 1.0

130.0 40.5 30.5 12.0 20.5 15.0 2.0 1.0 30.0 25.0 22.0 40.0

10.0 5.0 6.0 19.0 15.0 10.0 21.0 65.0 5.2 11.0 6.0 31.0

7.0 45.0 11.0 13.0 30.0 10.0 5.0 40.0 13.0 17.0 1.0 7.0

4.0 9.0 92.0 7.0 19.0 25.0 21.0 63.0 116.0 7.0 64.0 30.0

20.0 12.0 1.0 14.0 7.0 6.75 32.0 30.0 32.0 22.0 10.0 2.0

40.0 5.0 37.0 11.0 21.0 35.0 20.0 30.0 45.0 2.25 29.0 5.5

12.5 39.0 8.0 95.0 1.0 1.0 14.0 8.0 26.0 45.0 14.0 17.0

16.0 36.0 21.0 169.0 7.0 15.0 11.0 11.0 42.0 1.0 17.0 70.0

5.0 20.0 1.0 6.0 4.0 28.0 7.0 25.0 20.0 21.0 12.4 9.0

22.0 5.0 4.0 0.3 0.1 2.8 4.6 5.9 2.5 2.4 1.0 17.0

52.0 27.0 30.0 10.25 1.0 1.0 3.0 2.0 2.0 12.0 107.0 2.0

3.0 17.0 3.0 2.0 11.0 41.0 5.5 32.0 17.0 17.0 11.0 17.0

75.0 15.0 10.5 25.0 28.0 49.0 8.0 4.5 6.4 70.5 0.25 6.0

12.0 5.0 38.0 6.0 5.0 2.0 11.0 22.0 82.0 35.0 37.0 140.0

41.0 38.0 30.0 1.0

[Histogram of the response times: frequency (0 to 80) against time (minutes)]


Here we have a very asymmetric distribution with a long upper tail. The peak in the distribution

is in the first interval, 0 − 10 minutes.

From the plot it is hard to say what a typical value for the response time is. This leads us to

consider data summaries in the next section. A data summary for this letter time data is given

by:

n      mean    st. dev.   min   QL    M      QU     max
220    21.94   25.11      0.1   5.7   14.0   30.0   169

—————————————————

1.5 Data Summaries

Diagrams such as histograms are useful summaries of data sets. Often it is also possible or helpful to

give more simplified summaries of the data, talking about its location (including centre value) and its

spread. Various methods to summarise these features have been developed and we discuss these here.

For location, we mostly focus on giving an average or typical level or value, two measures being the median M and the mean x̄. We can also use the lower and upper quartiles (which divide the ordered data into four equal chunks).

To quantify the amount of spread (or variation) in data we can use the interquartile range (the distance between the lower and upper quartiles) or the standard deviation, s.

1.5.1 Measures of Location

The median, M

By definition, the median splits the ordered data set into two groups of equal size, so

• 50% of values ≥ M

• 50% of values ≤ M

(i.e. M is the 50% point). If n denotes the number of entries in the data set then the median M is

given by:


• n odd: the ((n + 1)/2)th (ordered) data value.

• n even: halfway between the (n/2)th and the (n/2 + 1)th (ordered) data values.

Example 1.5.1

• Data set A (dental cleansers): n = 10, M = (10.1 + 10.2)/2 = 10.15

• Data set B: n = 8, M = (9.5 + 9.6)/2 = 9.55

—————————————————

Example 1.5.2

Car showroom data. We have n = 45. The 23rd value in order (either ascending or descending)

will correspond to the Median, M . In fact, M = 45 (read off stem and leaf).

—————————————————

Quartiles are useful for assessing symmetry. The quartiles are the lower quartile, QL , the median, M ,

and the upper quartile, QU . They split the (ordered) data into four groups of equal size:

25% of data values ≤ QL and 75% of data values ≥ QL

75% of data values ≤ QU and 25% of data values ≥ QU

One can also make generalisations of quartiles to quantiles if specified levels are given (e.g. the 1st

7-quantile is the point where 1/7 of the data is below that value and 6/7 above; the 4th 7-quantile

has 4/7 below and 3/7 above).


For discrete distributions, there is no universal agreement on selecting the quartile values. In this

course, for simplicity, we will pick one of the conventions. What other reasonable conventions can you

think of? Which conventions can you find in Excel?

To find quartiles:

• If n divisible by 4:

QL : halfway between the (n/4)th and the (n/4 + 1)th values in ascending order,

QU : halfway between the (n/4)th and the (n/4 + 1)th values in descending order.

• If n is not divisible by 4:

Let k be the integer part of n/4 (i.e. if n/4 = 19 then k = 19 or if n/4 = 16.7 then k = 16 etc.).

Then define:

QL : the (k + 1)th value in ascending order,

QU : the (k + 1)th value in descending order.
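The convention above translates directly into code. A sketch (the function name is illustrative):

```python
def quartiles(data):
    """Lower and upper quartiles, using the convention stated above."""
    xs = sorted(data)
    n = len(xs)
    if n % 4 == 0:
        # Halfway between the (n/4)th and (n/4 + 1)th values from each end.
        q_low = (xs[n // 4 - 1] + xs[n // 4]) / 2
        q_up = (xs[-(n // 4)] + xs[-(n // 4) - 1]) / 2
    else:
        k = n // 4                 # integer part of n/4
        q_low = xs[k]              # (k + 1)th value in ascending order
        q_up = xs[-(k + 1)]        # (k + 1)th value in descending order
    return q_low, q_up
```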

Example 1.5.3

1. First we consider a case where n is divisible by 4.

If the data set was {1, 6, 5, 10, 19, 54, 78, 105, 1, 3, 55, 78, 45, 56, 33, 1} then we could sort it as

{1, 1, 1, 3, 5, 6, 10, 19, 33, 45, 54, 55, 56, 78, 78, 105}. There are n = 16 pieces of data so the lower

quartile is halfway between the 4th and 5th entry, i.e. has value QL = 4. The upper quartile is

halfway between the 12th and 13th entry, i.e. QU = 55.5

2. Next we consider a case where n is not divisible by 4.

If the data set was {44, 65, 66, 68, 73, 75, 78, 79, 81, 84} then, since the data is already sorted, we

note that there are 10 entries. Since 10/4 = 2.5 has integer part 2 we take the 3rd entry as the

lower quartile, i.e. QL = 66 and the 3rd from last (the 8th) entry as QU = 79.

—————————————————

The five number summary is made up of the quartiles along with the two extreme values, EL , EU ,

which can be used to give a brief, but informative numerical summary of the data.

A box plot is a graphical device for displaying the 5 number summary. This is a box about the lower

and upper quartiles with a line marking the median. Then whiskers (straight lines) extend to the

minimum and maximum values. The plot provides an indication of symmetry in the data.

Definition 1.5.1 (Outlier) Sometimes the whiskers are shortened so that they have a maximum length of 1.5 times the box length (i.e. 1.5 times the interquartile range). The remaining data is then


marked with points to distinguish it as an outlier. [So, the interquartile range, as discussed below, is

QU − QL . You find 1.5 times this value, call that z, say. Then you mark outliers as points that are

greater than QU + z or less than QL − z. The whisker ends at the points that are not outliers.]

Example 1.5.4

Car showroom data: n = 45 ⇒ k = 11

The 12th value in ascending order gives QL = 27.

The 12th value in descending order gives QU = 64.

Also, EL = 6 and EU = 161, giving as our five number summary {6, 27, 45, 64, 161}

Box plot:

[Box plot of the car showroom data on an axis from 0 to 150 (number of cars), drawn with whiskers to the extremes or, alternatively, with the outlier marked separately]

—————————————————
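The five number summary and the outlier rule can both be checked in a few lines. A sketch for the car showroom data (quartiles follow the course convention for n not divisible by 4):

```python
cars = [20, 47, 55, 39, 32, 36, 85, 17, 62, 44, 64,
        105, 57, 76, 48, 18, 31, 71, 50, 33, 29, 73,
        48, 27, 24, 117, 86, 17, 32, 64, 12, 50, 6,
        29, 20, 51, 51, 161, 73, 13, 25, 45, 37, 68, 26]

xs = sorted(cars)
n = len(xs)               # 45, so k = 11
k = n // 4
summary = (xs[0],         # E_L: minimum
           xs[k],         # Q_L: the 12th value in ascending order
           xs[n // 2],    # M:   the 23rd of the 45 ordered values
           xs[-(k + 1)],  # Q_U: the 12th value in descending order
           xs[-1])        # E_U: maximum

# Whisker rule: points beyond 1.5 * IQR from the box are marked as outliers.
iqr = summary[3] - summary[1]
outliers = [x for x in xs
            if x < summary[1] - 1.5 * iqr or x > summary[3] + 1.5 * iqr]
```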

Week 1 Tuesday

The mean, x̄

x̄ = (sum of values) / (number of values) = (x1 + x2 + · · · + xn) / n = (1/n) Σ_{i=1}^{n} xi

This is the quantity that is often referred to as the ‘average’ in everyday life.

The mean, x̄, is the balance point for the set of data values:


(Imagine data points as point masses and imagine that the axis can rotate. Then x̄ is located at the

point where there is exact balance: i.e. no rotation.)

Example 1.5.5

A 10.2 11.0 9.6 9.8 9.9 10.5 11.2 9.5 10.1 11.8 Total = 103.6

B 9.6 8.5 9.0 9.8 10.7 9.0 9.5 9.9 Total = 76.0

To find the mean, x̄, we have to divide the total by the number of data points (in each case):

      n    Σ xi    x̄       M
A    10   103.6   10.36   10.15
B     8    76.0    9.50    9.55

Here the mean can be compared with the median, M . They are not the same as each other

(although similar in this example). We discuss the difference between the two in the next section.

—————————————————

For data in a frequency table, with fj values in the cell with midpoint value yj , j = 1, . . . k , the mean

is given by

x̄ = (1/n) Σ_{j=1}^{k} fj yj        (1.1)

i.e. multiply the number of entries in each class/interval by the midpoint value in the class (this will

give a different answer to the mean of the original data since some of the original information is lost

by creating the frequency table).


Mean vs. Median:

The mean and median are different measures of the centre of data and so, in general, will give different

values. Sometimes the values can be very different.

• When data are symmetric the mean and the median are roughly the same, M ≈ x̄. The approximation gets better the more symmetric the data is.

• When there is a long upper tail, which we call positive skewness, then M < x̄:

• When there is a long lower tail, which we call negative skewness, then M > x̄.

A property to remember about the mean is that it is not resistant to outliers (i.e. values which do

not seem to fit in with the others), whereas the median is. We illustrate this with an example.


Example 1.5.6

Consider the following weights in pounds of the crew in the 1992 Cambridge vs. Oxford boat race. Each boat contains 8 rowers and 1 cox.

188.5 183 194.5 185 214 203.5 186 178.5 109 186 184.5 204 184.5 195.5 202.5 174 183

109.5

The entries 109 and 109.5 are the cox weights.

[Histogram of the crew weights: frequency (0 to 8) against weight in lb]

The data has two outliers, skewing the data to the left. Including these outliers, the mean weight is x̄ = 181.4 and the median is M = 185.5. The mean is lowered by the outliers.

If the weight of rowers alone is considered then x̄ = 190.4 while M = 186.0. The median is still

very similar – it is not very affected by outliers.

—————————————————
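The numbers in this example are easy to reproduce. A sketch (weights as listed above; separating out the coxes by a simple weight threshold is an assumption made just for this illustration):

```python
weights = [188.5, 183, 194.5, 185, 214, 203.5, 186, 178.5, 109,
           186, 184.5, 204, 184.5, 195.5, 202.5, 174, 183, 109.5]

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    ys = sorted(xs)
    n = len(ys)
    # n even: halfway between the two middle values; n odd: middle value.
    return (ys[n // 2 - 1] + ys[n // 2]) / 2 if n % 2 == 0 else ys[n // 2]

rowers = [w for w in weights if w > 150]   # drop the two cox weights

crew_mean, crew_median = round(mean(weights), 1), median(weights)
rower_mean, rower_median = round(mean(rowers), 1), median(rowers)
```

Including the coxes moves the mean from 190.4 down to 181.4, while the median only moves from 186.0 to 185.5.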

If the measure of location is to give the value of a typical individual, the median would be preferred.

However, if an estimate of the population total is needed, you would use N x̄, where N is the population

size.

Example 1.5.7

Salary distribution, e.g. in a large company:

• To describe the salary of a typical employee we would use the median, M, since salary distributions typically have a long upper tail.

• But to get an idea of the total wage bill we would rather know x̄, so that the total wage bill would be N x̄

—————————————————


1.5.2 Measures of Spread

As well as information on the average location of data, it is often useful to know about the way the

data spreads out about its average. Several measures of this spread have been developed which give

an indication of the degree of variation within a data set.

The range: EU − EL

Problems: depends on extremes (i.e. very volatile); depends on the number of samples n (for a given situation, the more cases you have, the more likely you are to get an outlier); ignores the centre and spread of the data.

Hence only used in special circumstances, e.g. small n.

The interquartile range: QU − QL

A simple concept, used when the median M is used as a measure of location. It gives the distance

over which the central 50% of the data values is spread.

The standard deviation, s

Used when the mean x̄ is used as a measure of location. It gives a measure of the average distance

of data values from x̄. We can use knowledge of the mean to construct a measure of spread from the

mean called the standard deviation.

(If you are finding the standard deviation of a data set this is usually the sample standard deviation

– as distinct from the standard deviation of the population (usually denoted σ). Indeed for a data

set where you calculate the mean, you are (usually) calculating the sample mean. The mean of the

overall population may be different. We will discuss these differences later in the course.)

Let di = xi − x̄ be the deviation of the observation xi from the mean x̄ (for i = 1, 2, . . . n), i.e. the

difference of each observation from the mean value. This difference may be positive or negative

depending on whether the observation is greater or less than the mean. The magnitude (absolute

value) of the difference di is denoted |di | and is the distance of the observation xi from the mean x̄.

Σ_{i=1}^{n} (xi − x̄) = Σ_{i=1}^{n} di = 0

(see tutorial sheet; this is really the definition of the mean). Because these deviations sum to zero we

can’t use their sum as a measure of spread. However, it’s clear that the sizes of all the di terms tell

us something about spread.

Consider instead di² = (xi − x̄)², the squared deviations (or, equivalently, the squared distances). Their sum will be non-zero and so we can use it as a measure of spread. The average of these squared deviations is known as the (sample) variance:

s² = Σ_{i=1}^{n} (xi − x̄)² / (n − 1)


The square root of the variance is known as the standard deviation:

s = √[ Σ_{i=1}^{n} (xi − x̄)² / (n − 1) ]

which is used more often as it has the units of x, whereas s² has units of x². (Compare it with the root mean square.)

*Non-examinable material*

Two questions naturally arise when looking at this definition:

1. Why square di and then take the square root? An alternative measure is the mean absolute deviation, Σ_{i=1}^{n} |di| / (n − 1). It is used much less often and has less theory developed for its use, but may be better in some circumstances (particularly when extreme outlying values are present in the data).

2. Why n − 1 in the denominator and not n? The natural choice for the denominator would seem

to be the total number of pieces of data, n.

• In fact, if we want to know the spread of data for a whole population then we use the population standard deviation,

σ = √[ Σ_{i=1}^{n} (xi − x̄)² / n ]

In practice, however, we are usually dealing with a small sample of data from the total population (and trying to infer things about the whole population from that – inferential statistics). It has been shown that when dealing with a small- or medium-sized sample, the

above formula usually under-estimates the spread of data of the whole population. So the

correction of replacing n by n − 1 is applied and can be shown to give a better estimate in

most cases. There is a very complicated formula which can be used, but in practice this

simple adjustment is usually used instead.

• Deviations around the mean sum to zero. Hence we have total of (n−1) pieces of information

or degrees of freedom.

Calculations

Some useful facts to know when calculating s:

Σ_{i=1}^{n} (xi − x̄)² = ( Σ_{i=1}^{n} xi² ) − n x̄²        (1.2)

s² = (1/(n − 1)) Σ_{j=1}^{k} fj (yj − ȳ)²        (1.3)


i.e. act as though all entries in cell j take value yj (the midpoint value). Alternatively

Σ_{j=1}^{k} fj (yj − ȳ)² = Σ_{j=1}^{k} fj yj² − n ȳ²        (1.4)

Example 1.5.8

Calculate the variance and standard deviation for the dental cleanser example, cleanser B.

For this data set there are n = 8 observations and the mean is x̄ = 9.5.

Data (loss in mg) 9.6 8.5 9.0 9.8 10.7 9.0 9.5 9.9

Deviations di = (xi − x̄) 0.1 -1.0 -0.5 0.3 1.2 -0.5 0.0 0.4

Σ_{i=1}^{8} di² = Σ_{i=1}^{8} (xi − x̄)² = 0.01 + 1 + 0.25 + 0.09 + 1.44 + 0.25 + 0.0 + 0.16 = 3.20

Variance:

s² = Σ_{i=1}^{8} (xi − x̄)² / (n − 1) = 3.20 / (8 − 1) = 0.4571

Standard deviation:

s = √0.4571 = 0.68

—————————————————
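The hand calculation above can be verified directly. A sketch:

```python
data_b = [9.6, 8.5, 9.0, 9.8, 10.7, 9.0, 9.5, 9.9]   # cleanser B, loss in mg
n = len(data_b)
x_bar = sum(data_b) / n                               # mean, 9.5

# Sum the squared deviations, then divide by n - 1 for the sample variance.
ss = sum((x - x_bar) ** 2 for x in data_b)            # 3.20
variance = ss / (n - 1)                               # 0.4571...
std_dev = variance ** 0.5                             # 0.68 to 2 d.p.
```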

What does the (sample) standard deviation s tell us about the spread of the data – how far numbers

in a data set are away from their average?

Typically, most entries will be somewhere around one standard deviation from the mean. Not many

entries will be more than two or three standard deviations away from the mean.

• 60% – 75% of data values lie within one standard deviation of the mean x̄ ± s.

• > 95% of data values lie within two standard deviations of the mean x̄ ± 2s.

• almost all data values lie within three standard deviations of the mean x̄ ± 3s.

For data that is normally distributed we can be more precise – this will be discussed later in the

module.


We also consider an example for calculating the standard deviation from data in a grouped frequency

table:

Example 1.5.9

Using the data from the birth weight frequency table below, calculate s.

Birth weight (kg) Frequency

1.0-1.4 12

1.4-1.8 12

1.8-2.2 7

2.2-2.6 10

2.6-3.2 7

3.2-3.8 2

The mean is

x̄ = ( Σ_{j=1}^{6} fj yj ) / n = (1.2 × 12 + 1.6 × 12 + 2.0 × 7 + 2.4 × 10 + 2.9 × 7 + 3.5 × 2) / 50 = 98.9 / 50 = 1.978.

We calculate

Σ_{j=1}^{6} fj yj² = 12 × (1.2)² + 12 × (1.6)² + 7 × (2.0)² + 10 × (2.4)² + 7 × (2.9)² + 2 × (3.5)² = 216.97

s² = (1/(n − 1)) ( Σ_{j=1}^{k} fj yj² − n ȳ² ) = (216.97 − 50 × 1.978²) / 49 = 0.4356

s = √0.4356 = 0.66 kg.

This is an approximation using the frequency table grouping. Since we also have the full data

on birth weights we can calculate the (sample) mean and standard deviation.

Appropriate calculations would give us x̄ = 1.97kg and s = 0.66kg. These are well approximated

by the frequency grouping.

—————————————————
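The grouped approximation can be reproduced in Python using the shortcut formula (1.4):

```python
# Grouped-data mean and standard deviation for the birth-weight table
# (Example 1.5.9): s^2 = (sum f*y^2 - n*ybar^2) / (n - 1).
midpoints = [1.2, 1.6, 2.0, 2.4, 2.9, 3.5]   # cell midpoints y_j
freqs     = [12, 12, 7, 10, 7, 2]            # cell frequencies f_j

n = sum(freqs)
mean = sum(f * y for f, y in zip(freqs, midpoints)) / n
sum_fy2 = sum(f * y * y for f, y in zip(freqs, midpoints))
var = (sum_fy2 - n * mean ** 2) / (n - 1)
s = var ** 0.5

print(n, mean, round(s, 2))
```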


Chapter 2

Probability

Week 2 Monday

2.1 Introduction

Probability is given by a number between 0 and 1. It gives a measure of how likely it is for an event

to occur.

Example 2.1.1

Roll a die.

• Probability of 1 is 1/6.

• Probability of an even number is 1/2.

• Probability of 7 is 0.

• Probability of ‘1, 2, 3, 4, 5 or 6’ is 1.

—————————————————

Notation 2.1.1 We use letters A, B, C, . . . for events and write P(A), P(B), P(C), . . . for the probabilities of these events occurring. We write A̅ for the event that A does not occur, and P(A̅), P(B̅), P(C̅), . . . for the probabilities of events A, B, C, . . . not occurring. These are examples of complementary events.

The events A and A̅ are called complementary since exactly one of either A or A̅ must occur. Hence:

P(A) + P(A̅) = 1,  i.e.  P(A̅) = 1 − P(A).

Notation 2.1.2 Let A and B be any events. Then we denote the event of A and B both happening as:

A and B = A ∩ B

Example 2.1.2

The students who are studying Law or Engineering at a particular university are classified by

gender to give the following data:


Female Male TOTAL

Law 468 520 988

Engineering 674 1152 1826

TOTAL 1142 1672 2814

If a student is selected at random from all the Law or Engineering students, what is the proba-

bility that the student selected is

(i) an engineering student?

(ii) female?

(iii) a female engineering student?

(iv) not a female engineering student?

(v) either female or a male engineering student?

(vi) either female or an engineering student?

————————————————————

Let E denote the event ‘engineering student selected’. Let F denote the event ‘female student

selected’.

(i) P(E) = 1826/2814 ≈ 0.649

(ii) P(F) = 1142/2814 ≈ 0.406

(iii) P(E ∩ F) = P(E and F) = 674/2814 ≈ 0.240

(iv) P(not (E ∩ F)) = 1 − P(E ∩ F) = 1 − 674/2814 ≈ 0.760

Parts (v) and (vi) are completed below, once we have a formula for P(A ∪ B).

—————————————————
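The whole example reduces to arithmetic on the table's counts; a small Python sketch makes the structure explicit:

```python
# Probabilities from the Law/Engineering contingency table (Example 2.1.2).
law = {"F": 468, "M": 520}
eng = {"F": 674, "M": 1152}
total = sum(law.values()) + sum(eng.values())     # 2814 students

p_E = sum(eng.values()) / total                   # (i)  engineering
p_F = (law["F"] + eng["F"]) / total               # (ii) female
p_E_and_F = eng["F"] / total                      # (iii) female engineer
p_not_E_and_F = 1 - p_E_and_F                     # (iv) complement
p_F_or_E = p_F + p_E - p_E_and_F                  # (vi) inclusion-exclusion

print(round(p_E, 3), round(p_F, 3), round(p_E_and_F, 3),
      round(p_not_E_and_F, 3), round(p_F_or_E, 3))
```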

Example 2.1.3

The turnaround times (in minutes) of a sample of 500 trucks are recorded in the following frequency table:

Time (min)   Frequency     Time (min)   Frequency
5–10         1             40–45        100
10–15        3             45–50        38
15–20        6             50–55        16
20–25        9             55–60        5
25–30        42            60–65        2
30–35        107           65–70        1
35–40        170

What is the probability that a randomly selected truck has a turnaround time between 15 and

60 minutes?

—————————————

Denote by A the event that the turnaround time t satisfies 15 ≤ t ≤ 60 min. Then:

P(A̅) = (1 + 3 + 2 + 1)/500 = 7/500 = 0.014

P(A) = 1 − P(A̅) ≈ 1 − 0.014 = 0.986

—————————————————

Example 2.1.2 (continued)

If F occurs then (F ∩ E) cannot occur. Events F and (F ∩ E) are mutually exclusive. But if F occurs

then E can also occur, i.e. F and E are not mutually exclusive.

———————————-

Venn Diagram

Parts of a rectangle correspond to events. All possible events and combinations of events are shown.

The two example diagrams (not reproduced here) are equivalent.


Venn diagrams can be useful for finding formulas connecting different probabilities. Denote the probability of the event ‘A or B occurring’ by P(A ∪ B). If A and B are mutually exclusive, then the probability P(A ∪ B) is given by P(A) + P(B); note that in this case P(A ∩ B) = 0. If A and B are not mutually exclusive, then P(A ∪ B) is given by P(A) + P(B) − P(A ∩ B).

In general, we have:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

(v) P(F ∪ (F̅ ∩ E)) = P(F) + P(F̅ ∩ E) = (1142 + 1152)/2814 ≈ 0.815

(vi) P(F ∪ E) = P(E) + P(F) − P(F ∩ E) = (1142 + 1826 − 674)/2814 ≈ 0.815

De Morgan’s Laws

The event complementary to ‘A or B’ (A ∪ B) is ‘neither A nor B’. This is the same as ‘not A and not B’, which can be denoted as A̅ ∩ B̅.

Hence:

not (A ∪ B) = A̅ ∩ B̅

1 − P(A ∪ B) = P(A̅ ∩ B̅)

Thus, as events A and B were arbitrary, for all events C and D we have:

not (C ∩ D) = C̅ ∪ D̅

1 − P(C ∩ D) = P(C̅ ∪ D̅)


These are called De Morgan’s Laws.

Example 2.1.4

Faults in the production of bolts are classified as either faulty heads or faulty threads.

15% of all bolts produced have faulty heads.

20% of all bolts produced have faulty threads.

10% of all bolts produced have both types of fault.

If a bolt is selected at random, what is the probability that it has

(i) at least 1 fault?

(ii) no fault?

(iii) a good thread but a faulty head?

——————————————–

Let H denote ‘faulty head’ and T denote ‘faulty thread’. Then:

P(H) = 0.15

P(T) = 0.2

P(H ∩ T) = 0.1

(i) P(H ∪ T) = P(H) + P(T) − P(H ∩ T) = 0.15 + 0.2 − 0.1 = 0.25

(ii) P(H̅ ∩ T̅) = 1 − P(H ∪ T) = 1 − 0.25 = 0.75

(iii) Consider the following Venn diagram: [figure omitted]

Therefore:

P(H ∩ T̅) = P(H) − P(H ∩ T) = 0.15 − 0.1 = 0.05

—————————————————

Week 2 Tuesday


Counting (Permutations)

In this section we will use factorial expressions, e.g. ‘five factorial’ 5! = 5 × 4 × 3 × 2 × 1, ‘six factorial’

6! = 6 × 5 × 4 × 3 × 2 × 1, etc.

Example 2.1.5

How many ways are there to arrange four letters A, B, C and D e.g. ABDC, ACDB, etc.?

—————————————–

For the first position, there are 4 different letters to choose from. For each letter chosen for position 1, there are 3 letters available for position 2. This gives 4 × 3 combinations for the first two positions. Then, there are 2 possibilities for position 3, and only one for position 4. This gives a total of 4! = 4 × 3 × 2 × 1 = 24 different arrangements.

—————————————————

Example 2.1.6

—————————————————

Example 2.1.7

How many distinct strings can be made by arranging the five letters A, A, A, B, B?

—————————

There are 5! ways of arranging the letters if we made a distinction between all of them. This would however count separately the 3! ways of arranging the A's and the 2! ways of arranging the B's, which in practice give us the same arrangements. Hence, the actual number of distinct strings is given by:

5!/(3! × 2!) = (5 choose 3) = (5 choose 2) = 10

—————————————————

We call the expression (n choose k) a binomial coefficient, or simply ‘n choose k’. It describes the number of combinations (number of unordered sets) of size k chosen from a set of n distinct objects. We can think of the last example as choosing 3 positions out of 5 for the letters A.

In general:

(n choose k) = n! / (k! × (n − k)!)

Example 2.1.8

How many ways are there to choose 3 letters from a set of 5 distinct letters?

——————————–


We can first think of all possible arrangements of the five letters, e.g. ABCDE, ACEBD (5! of them), where the first three letters are the ones we want to choose. As in our final combination the order of the letters will not matter, we should not count separately the permutations of the chosen letters (3!) and of the non-chosen letters (2!). Therefore we need to divide the initial number by 3! × 2!. Hence:

5!/(3! × 2!) = (5 × 4 × 3 × 2 × 1)/((3 × 2 × 1) × (2 × 1)) = 10

—————————————————
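In Python the binomial coefficient is available directly as `math.comb`, so the count above can be checked without writing out the factorials:

```python
# n choose k via the standard library, checked against 5!/(3! * 2!).
from math import comb, factorial

by_hand = factorial(5) // (factorial(3) * factorial(2))

print(by_hand, comb(5, 3), comb(5, 2))
```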

Further Examples

Example 2.1.9

According to a recent University of Dundee Prospectus, there were 11,951 students registered

at the university in the preceding year. A breakdown of these students is given below where

they are categorised by both study level (undergraduate (u/g) or postgraduate (p/g)) and study

status (full time (FT), part time (PT) or distance learning (DL)).

FT PT DL total

u/g 7415 971 400 8786

p/g 655 1136 1374 3165

total 8070 2107 1774 11951

If you select one of these students at random, what are the chances you get

(i) a p/g student, (ii) a PT student, (iii) a PT p/g student, (iv) a PT student or a

p/g student?

(v) Suppose now you select a p/g student at random; what now are the chances that you select

a PT student?

————————————-

(i) P(p/g) = 3165/11951 ≈ 0.26

(ii) P(PT) = 2107/11951 ≈ 0.18

(iii) P(p/g ∩ PT) = 1136/11951 ≈ 0.095

(iv) P(p/g ∪ PT) = (3165 + 971)/11951 ≈ 0.35

(v) P(PT | p/g) = 1136/3165 ≈ 0.36

—————————————————

The notation used in (v) is called ‘conditional probability of PT given p/g’. The expression P(PT | p/g) gives us the probability that a PT student is selected from the group of p/g students. We have:

P(PT | p/g) = P(PT ∩ p/g) / P(p/g)


Example 2.1.10

You roll three dice. What are the chances that at least two of the scores are the same or are

consecutive (e.g. (1,1,1), (1,2,1), (4,2,5), (1,5,4), (5,6,2) or (2,3,2))?

—————————-

Let us call our desirable event A. The complementary event A̅ can be described as: all numbers are different and no two numbers are consecutive. All possible combinations (unordered) are: {1, 3, 5}, {1, 3, 6}, {1, 4, 6}, {2, 4, 6}. The numbers in each of these sets can be arranged in order in 3! = 6 different ways. This means that there are 6 × 4 = 24 permutations in A̅. The total number of outcomes of 3 dice rolls is 6 × 6 × 6 = 216. Therefore:

P(A̅) = 24/216  =⇒  P(A) = 1 − P(A̅) = 1 − 24/216 ≈ 0.89

—————————————————
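Since there are only 216 ordered outcomes, the count can be verified by brute-force enumeration:

```python
# Brute-force check of Example 2.1.10: among all ordered rolls of three
# dice, count those where two scores are equal or two are consecutive.
from itertools import product

def qualifies(roll):
    # some pair of scores is equal or differs by exactly 1
    return any(abs(a - b) <= 1 for i, a in enumerate(roll)
               for b in roll[i + 1:])

outcomes = list(product(range(1, 7), repeat=3))
good = sum(qualifies(r) for r in outcomes)

print(good, len(outcomes))
```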

Example 2.1.11

Cars sold in the UK in 2011 and 2012 were classified by colour as shown in the following table.

(Numbers of cars in each category are given in thousands)

2011 2012

Red 212 183

Blue 443 467

Yellow 19 20

Green 135 122

Black 270 467

White 77 81

Silver 481 589

Other 288 101

TOTAL 1925 2030

(a) What is the probability that a car randomly selected from all the 2011 and 2012 cars is

silver?

(b) What is the probability that a car randomly selected from all the 2012 cars is silver?

(c) If a car is randomly selected from all the 2011 and 2012 cars, what is the probability that it

is a 2012 car?

(d) If a car that is randomly selected from all the 2011 and 2012 cars is silver, what is the

probability that it is a 2012 car?

————————————–

S: a silver car is chosen

2011: a car from the 2011 cars is chosen

2012: a car from the 2012 cars is chosen


(a) P(S | 2011 ∪ 2012) = (481 + 589)/(1925 + 2030) ≈ 0.27

(b) P(S | 2012) = 589/2030 ≈ 0.29

(c) P(2012 | 2011 ∪ 2012) = 2030/(1925 + 2030) ≈ 0.51

(d) P(2012 | S) = 589/(481 + 589) ≈ 0.55

—————————————————

Example 2.1.12

In a university class of mathematics students, it is found that 38% studied physics at school and

45% studied German at school and that 26% studied neither.

(a) What percentage of the students studied at least one of physics and German?

(b) What is the probability that a randomly selected student studied German, but not physics?

—————————-

P : students who studied physics

G: students who studied German

We have:

P(P) = 0.38

P(G) = 0.45

P(P̅ ∩ G̅) = 0.26

(a) P(P ∪ G) = 1 − P(P̅ ∩ G̅) = 1 − 0.26 = 0.74

Answer: The percentage of students who studied at least one of physics and German is 74%.

(b) P(G ∩ P̅) = P(G) − P(G ∩ P) = P(G ∪ P) − P(P) = 0.74 − 0.38 = 0.36 [see Venn diagram].

         P      P̅      TOTAL
G        0.09   0.36   0.45
G̅        0.29   0.26   0.55
TOTAL    0.38   0.62   1

Notice that using a table like the one above greatly simplifies calculations.

Answer: The probability that a randomly selected student studied German but not physics is

0.36.

—————————————————

Week 3 Monday


2.2 Conditional Probability

Lecture Example 2.1.9 (v) is an example of conditional probability. The condition is that the student is a p/g student. Let A represent ‘PT student’ and let B represent ‘p/g student’. The question asked for the probability that A is true, given that B is true. We write this as P(A | B), read as ‘A given B’. In the solution we used the general result:

P(A | B) = P(A ∩ B) / P(B)

Example 2.2.1

A poker player is observed to place high bets 25% of the time, but only has a good hand and

bets high 8% of the time. Given that the player has placed a high bet, what is the probability

that they have a good hand?

——————————

Let us denote the events as follows:

A: good hand

B: high bet

Then:

P(B) = 0.25

P(A ∩ B) = 0.08

P(A | B) =?

We calculate:

P(A | B) = P(A ∩ B) / P(B) = 0.08/0.25 = 0.32

Answer: The probability that the player has a good hand, given that they placed a high bet, is 0.32.

—————————————————

Example 2.2.2

In a survey of recent stock market share prices it has been noted that 45% of shares which fell in

price last quarter, rose this quarter. Also, 32% of all shares fell last quarter. What percentage

of stocks, fell last quarter and then rose this quarter?

——————————-

Let us denote the events as follows:

A: fell in price last quarter

B: rose in price this quarter


Then:

P(B | A) = 0.45

P(A) = 0.32

P(A ∩ B) =?

We calculate:

P(A ∩ B) = P(B | A) × P(A) = 0.45 × 0.32 = 0.144

Answer: Approximately 14% of shares fell last quarter and then rose this quarter.

—————————————————

Example 2.2.3

A test for a chemical correctly detects that the chemical is present only 98% of the time. The test never gives a false positive result. A series of tests gives the result that the chemical is present in 65% of tissue samples. Estimate the proportion of tissue samples that contain the chemical.

——————————

Let us denote the events as follows:

A: chemical present

B: test positive

Then:

P(B | A) = 0.98

P(B | A̅) = 0

P(B) = 0.65

P(A) = ?

We calculate:

0 = P(B | A̅) = P(B ∩ A̅)/P(A̅)  =⇒  P(B ∩ A̅) = 0

0.65 = P(B) = P(B ∩ A) + P(B ∩ A̅) = P(B ∩ A)

P(B | A) = P(B ∩ A)/P(A)  =⇒  P(A) = P(B ∩ A)/P(B | A) = 0.65/0.98 ≈ 0.66

—————————————————


Bayes’ Theorem

Note that:

P(A | B) = P(A ∩ B)/P(B)  =⇒  P(A ∩ B) = P(A | B) × P(B)        (2.1)

Also:

P(B | A) = P(B ∩ A)/P(A)        (2.2)

Substituting (2.1) into (2.2) gives Bayes’ Theorem:

P(B | A) = P(A | B) × P(B) / P(A)

Example 2.2.4

It rains on 3 out of 10 days. Forecasters predict rain for the following day half of the time. Given

that forecasters are correct for 85% of days when it does rain, and are correct for 65% of days

when it does not, calculate the probability that:

(a) it will rain given that the forecaster has said it will,

(b) it will rain given that the forecaster has said it will not.

———————————–

Let us denote the events as follows:

R: rain

F : forecast rain

Then:

P(R) = 0.3

P(F) = 0.5

P(F | R) = 0.85

P(F̅ | R̅) = 0.65

a) P(R | F) = P(F | R) × P(R) / P(F) = (0.85 × 0.3)/0.5 = 0.51

b) P(R | F̅) = P(F̅ | R) × P(R) / P(F̅) = ((1 − P(F | R)) × P(R)) / (1 − P(F)) = (0.15 × 0.3)/0.5 = 0.09

Answer: a) There is a 0.51 probability that it will rain if the forecaster said it would, and b) there is a 0.09 probability that it will rain if the forecaster said it would not.

—————————————————
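The two parts are one-line applications of Bayes' Theorem:

```python
# Bayes' Theorem for the rain-forecast data of Example 2.2.4.
p_rain = 0.3          # P(R)
p_forecast = 0.5      # P(F)
p_f_given_r = 0.85    # P(F | R)

# (a) P(R | F) = P(F | R) P(R) / P(F)
p_r_given_f = p_f_given_r * p_rain / p_forecast

# (b) P(R | not F) = P(not F | R) P(R) / P(not F)
p_r_given_not_f = (1 - p_f_given_r) * p_rain / (1 - p_forecast)

print(p_r_given_f, p_r_given_not_f)
```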

Example 2.2.5


A test for a medical condition that 1 in 20 people have correctly diagnoses the condition in 4/5

of those with it. The probability that an individual who gets a positive result actually has the

condition is 0.3. What is the probability that the test gives a positive result?

——————————–

Let us denote the events as follows:

C: condition present

F : test result positive

Then:

P(C) = 1/20 = 0.05

P(F | C) = 4/5 = 0.8

P(C | F) = 0.3

P(F) = ?

Rearranging Bayes’ Theorem, we calculate:

P(F) = P(F | C) × P(C) / P(C | F) = (0.8 × 0.05)/0.3 ≈ 0.13

Answer: The probability that the test gives a positive result is approximately 0.13.

—————————————————

Week 3 Tuesday

2.3 Independent Events

Events A and B are independent if there is the same chance of A occurring whether or not B occurs, i.e.

P(A | B) = P(A | B̅) = P(A)

Recall that P(A | B) = P(A ∩ B)/P(B). Hence, for independent events A and B,

P(A ∩ B) = P(A) × P(B)

Example 2.3.1

Toss two fair coins, A and B, independently; the probability of heads for each is 0.5.

Then P(Ahead ∩ Bhead) = 0.5 × 0.5 = 0.25 = P(Ahead) × P(Bhead).

Hence the events Ahead and Bhead are independent.


—————————————————

Example 2.3.2

There is a queue of cars at a set of traffic lights. Given that 10% of all cars are red, what is the

probability that

(a) The first car is red?

(b) The first and second cars are both red?

(c) Exactly one of the first 2 cars is red?

(d) Neither of the first 2 cars are red?

(e) At least one of the first 10 cars is red?

—————————–

(a) P(R1) = 0.1

(b) P(R1 ∩ R2) = 0.1 × 0.1 = 0.01

(c) P((R1 ∩ R̅2) ∪ (R̅1 ∩ R2)) = [as these events are mutually exclusive]

= P(R1 ∩ R̅2) + P(R̅1 ∩ R2) = [as these events are independent]

= P(R1) × P(R̅2) + P(R̅1) × P(R2) = 0.1 × 0.9 + 0.9 × 0.1 = 0.18

(d) P(R̅1 ∩ R̅2) = 0.9 × 0.9 = 0.81

(e) P(at least one red car out of 10) = 1 − P(no red cars out of 10) = 1 − 0.9¹⁰ ≈ 1 − 0.349 = 0.651

—————————————————
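Because the cars' colours are independent, every part of the example follows from products of 0.1 and 0.9:

```python
# Example 2.3.2 via the independence rule: each car is red with p = 0.1.
p = 0.1

p_first = p                                   # (a)
p_both = p * p                                # (b)
p_exactly_one = p * (1 - p) + (1 - p) * p     # (c) red/not + not/red
p_neither = (1 - p) ** 2                      # (d)
p_at_least_one_of_10 = 1 - (1 - p) ** 10      # (e) complement of "no reds"

print(p_exactly_one, p_neither, round(p_at_least_one_of_10, 3))
```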

2.4 Discrete and Continuous Random Variables

A random variable which takes only a finite or countable number of values is referred to as discrete. A random variable which can take any value in an interval of R is called continuous.

Example 2.4.1

DISCRETE:

• Cars passing through the traffic light. How many cars pass before first red car? Possible

values: {1, 2, 3, . . .}.

CONTINUOUS:

• Growing bacteria. How long before the colony reaches 1cm in diameter? Possible values:

t ≥ 0.

—————————————————

Example 2.4.2


The probability that an electronic component fails within t hours is given by

P(t) = 1 − (1 + t/100) · exp(−t/100),  t > 0.

(a) Calculate the probability that the lifetime of a randomly selected component is less than

100 hours.

(b) Calculate the probability that the lifetime of a randomly selected component is greater than

200 hours.

(c) Three components are selected at random. Calculate the probability that at least 1 compo-

nent lasts more than 100 hours.

———————————–

(a) P(t < 100) = 1 − (1 + 100/100) × exp(−100/100) = 1 − 2/e ≈ 1 − 0.7358 = 0.2642

(b) P(t > 200) = 1 − P(t < 200) = (1 + 200/100) × exp(−200/100) = 3/e² ≈ 0.406

(c) P(at least 1 out of 3 lasts > 100) = 1 − P(all 3 components last < 100) = 1 − (P(t < 100))³ ≈ 0.9815

—————————————————
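The three parts can be evaluated numerically from the failure-time formula:

```python
# Example 2.4.2: failure-time distribution P(t) = 1 - (1 + t/100) e^(-t/100).
from math import exp

def p_fail_within(t):
    # probability the component fails within t hours
    return 1 - (1 + t / 100) * exp(-t / 100)

a = p_fail_within(100)            # (a) lifetime < 100 h
b = 1 - p_fail_within(200)        # (b) lifetime > 200 h
c = 1 - p_fail_within(100) ** 3   # (c) at least 1 of 3 lasts > 100 h

print(round(a, 4), round(b, 3), round(c, 4))
```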

Example 2.4.3

Patients referred to an orthopaedic clinic suffering from chronic lower back pain may be suffering

from condition A or condition B. Condition A is cured by painkillers and rest whereas condition

B is cured by painkillers and physiotherapy. A and B are quite difficult to distinguish from

X-rays, and consequently only 70% of patients who have A are correctly diagnosed as having A,

whilst 80% of patients who have B are correctly diagnosed as having B. Of those referred to the

clinic, 40% actually have A and the remaining 60% actually have B.

(a) What proportion of all the patients are given the correct treatment?

(b) What proportion of those being given physiotherapy should have been recommended to rest

instead?

(c) Are the events of having condition A and being diagnosed with condition A independent?

—————————————-

Let us denote the events as follows:

A: have condition A

D: diagnosed with A

A̅: have condition B

D̅: diagnosed with B

Then:

P(D | A) = 0.7

P(D̅ | A̅) = 0.8

P(A) = 0.4

P(A̅) = 0.6

(a)

P(correct treatment) = P(D ∩ A) + P(D̅ ∩ A̅) = P(D | A) × P(A) + P(D̅ | A̅) × P(A̅) =

= 0.7 × 0.4 + 0.8 × 0.6 = 0.28 + 0.48 = 0.76

(b) Note that we now know that P(A ∩ D) = 0.28 and P(A̅ ∩ D̅) = 0.48.

P(A | D̅) = P(A ∩ D̅) / P(D̅) = P(A ∩ D̅) / (P(D̅ ∩ A) + P(D̅ ∩ A̅))

= (P(A) − P(A ∩ D)) / ((P(A) − P(A ∩ D)) + P(A̅ ∩ D̅)) = (0.4 − 0.28) / ((0.4 − 0.28) + 0.48) = 0.12/0.6 = 0.2

Answer: Two in ten patients who were given physiotherapy should have been recommended to rest instead.

(c) P(D) = 1 − P(D̅) = 1 − 0.6 = 0.4

P(D) × P(A) = 0.4 × 0.4 = 0.16

P(A ∩ D) = 0.28 ≠ 0.16 = P(D) × P(A)

Answer: The events of having condition A and being diagnosed with condition A are not independent.

—————————————————
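Parts (a) and (b) are an application of the law of total probability, which the following sketch makes explicit:

```python
# Example 2.4.3: law of total probability for the two-condition clinic.
p_A = 0.4                     # has condition A (rest); complement has B
p_diagA_given_A = 0.7         # correctly diagnosed A
p_diagB_given_B = 0.8         # correctly diagnosed B

# (a) correctly treated: (A and diagnosed A) or (B and diagnosed B)
p_correct = p_diagA_given_A * p_A + p_diagB_given_B * (1 - p_A)

# (b) given physiotherapy (diagnosed B) but actually has A
p_A_and_diagB = p_A - p_diagA_given_A * p_A
p_diagB = p_A_and_diagB + p_diagB_given_B * (1 - p_A)
p_A_given_diagB = p_A_and_diagB / p_diagB

print(p_correct, p_A_given_diagB)
```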

Week 4 Monday


Chapter 3

3.1 Introduction

We are interested in the probability of a variable X taking any given value x. The set of probabilities

for all possible values is called the probability distribution for X.

Example 3.1.1

Roll 2 dice. Let X be the sum of the two values. What is the probability distribution for X?

—————————

Table of possible events:

Die no. 2

1 2 3 4 5 6

1 (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)

2 (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)

Die no. 1 3 (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)

4 (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)

5 (5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)

6 (6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

Die no. 2

X 1 2 3 4 5 6

1 2 3 4 5 6 7

2 3 4 5 6 7 8

Die no. 1 3 4 5 6 7 8 9

4 5 6 7 8 9 10

5 6 7 8 9 10 11

6 7 8 9 10 11 12


x          1    2     3     4     5     6     7     8     9     10    11    12
P(X = x)   0   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

—————————————————

For any discrete probability distribution the probabilities sum to 1:

Σₓ p(x) = 1

Expected Value of X

µ = µX = E(X) = Σₓ x p(x)

(µ, µX and E(X) are alternative notations for the same quantity.)

Example 3.1.2

Roll 2 dice. Let X be the sum of the two values. We find the mean value for X:

µ = 2 × 1/36 + 3 × 2/36 + 4 × 3/36 + 5 × 4/36 + 6 × 5/36 + 7 × 6/36 +

+ 8 × 5/36 + 9 × 4/36 + 10 × 3/36 + 11 × 2/36 + 12 × 1/36 =

= 14 × (1 + 2 + 3 + 4 + 5 + 3)/36 = (14 × 18)/36 = 252/36 = 7

—————————————————

X frequency

x1 f1

x2 f2

.. ..

. .

xn fn

We calculate the mean value for X as follows:

µ = Σᵢ₌₁ⁿ xᵢ p(xᵢ) = Σᵢ₌₁ⁿ xᵢ × fᵢ/(Σᵢ₌₁ⁿ fᵢ) = (Σᵢ₌₁ⁿ fᵢ xᵢ)/(Σᵢ₌₁ⁿ fᵢ)

Example 3.1.3

Roll 2 dice. Let X be the sum of the two values. What is the expected value of X 2 ?

————————-

Define a new variable Y = X 2 . The probability distribution for Y is:

x          2    3    4    5    6    7    8    9    10   11   12
y = x²     4    9    16   25   36   49   64   81   100  121  144
P(Y = y)  1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36

E(Y) = E(X²) = Σᵧ y p(y) = 4 × 1/36 + 9 × 2/36 + 16 × 3/36 + 25 × 4/36 + 36 × 5/36 + 49 × 6/36 +

+ 64 × 5/36 + 81 × 4/36 + 100 × 3/36 + 121 × 2/36 + 144 × 1/36 =

= (4 + 18 + . . . + 242 + 144)/36 ≈ 54.83

Note that:

E(X) = 7

[E(X)]² = 49

E(X²) ≈ 54.83

—————————————————

In general

[E(X)]² ≠ E(X²).

If Y = f(X) then:

E(Y) = E(f(X)) = Σᵧ y p(y) = Σₓ f(x) p(x)

Variance:

Var(X) = E((X − µ)²) = Σₓ (x − µ)² p(x)

Standard Deviation:

σ = √Var(X)

The standard deviation gives the root-mean-square distance of the value of X from its mean µ.

Example 3.1.4

Roll 2 dice. Let X be the sum of the two values. What is the variance and standard deviation

of X?

————————–

Recall that µX = 7.

x          2    3    4    5    6    7    8    9    10   11   12
(x − µ)   −5   −4   −3   −2   −1    0    1    2    3    4    5
(x − µ)²  25   16    9    4    1    0    1    4    9   16   25
P(X = x) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36

Var(X) = E((X − µ)²) = Σₓ (x − µ)² p(x) = 25 × 1/36 + 16 × 2/36 + 9 × 3/36 +

+ 4 × 4/36 + 1 × 5/36 + 0 × 6/36 + 1 × 5/36 + 4 × 4/36 + 9 × 3/36 + 16 × 2/36 + 25 × 1/36 =

= (25 + 32 + 27 + 16 + 5 + 0 + 5 + 16 + 27 + 32 + 25)/36 ≈ 5.83.

σ = √Var(X) ≈ √5.83 ≈ 2.42.

—————————————————

Var(X) = E(X²) − [E(X)]²

Example 3.1.5

Roll 2 dice. Let X be the sum of the two values. What is the variance of X?

——————–

Using previous results:

Var(X) = E(X²) − [E(X)]² ≈ 54.83 − 49 = 5.83

Note that this gives the same answer as the previous method.


—————————————————
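Examples 3.1.2 to 3.1.5 can be verified by building the distribution of X by enumeration:

```python
# Distribution of the sum X of two dice: enumerate all 36 ordered rolls,
# then compute E(X), E(X^2) and Var(X) = E(X^2) - [E(X)]^2.
from itertools import product
from collections import Counter

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: c / 36 for x, c in counts.items()}           # P(X = x)

mean = sum(x * p for x, p in pmf.items())              # E(X)
mean_sq = sum(x * x * p for x, p in pmf.items())       # E(X^2)
var = mean_sq - mean ** 2

print(mean, round(mean_sq, 2), round(var, 2))
```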

*Non-examinable material*

Proof of Var(X) = E(X²) − [E(X)]².

Var(X) = Σₓ (x − µ)² p(x)

= Σₓ (x² − 2xµ + µ²) p(x)

= Σₓ x² p(x) − 2µ Σₓ x p(x) + µ² Σₓ p(x)

= E(X²) − 2µ E(X) + µ²

= E(X²) − 2 E(X) E(X) + [E(X)]²

= E(X²) − [E(X)]²

Week 4 Tuesday

Linear Function of X

Let X be a random variable and let a and b be constants. Consider Y = g(X) = a + bX. Then:

E(Y ) = a + bE(X)

V ar(Y ) = b2 V ar(X)

Linear functions are the only functions g for which E(g(X)) = g(E(X)) holds in general.

For the standard deviation of Y we have:

σY = √Var(Y) = √(b² Var(X)) = |b| · √Var(X) = |b| · σX

Summary:

E(a + bX) = a + b E(X)

Var(a + bX) = b² Var(X)

σY = |b| · σX

*Non-examinable material*

Proof of E(a + bX) = a + b E(X).

E(a + bX) = Σₓ (a + bx) p(x) = a Σₓ p(x) + b Σₓ x p(x) = a + b E(X)

Proof of Var(a + bX) = b² Var(X).

Var(a + bX) = E((a + bX − (a + bµX))²)

= E((bX − bµX)²)

= E((b(X − µX))²)

= E(b²(X − µX)²)

= b² E((X − µX)²)

= b² Var(X)

3.2 The Uniform Distribution

When all n possible values of a variable X have the same probability, we say that X has the (discrete) uniform distribution:

P(X = k) = 1/n  for k ∈ {1, 2, 3, . . . , n}

Example 3.2.1

Roll a die and let X be the outcome of the roll. Then, X has the uniform distribution:

P(1) = 1/6,  P(2) = 1/6,  . . . ,  P(6) = 1/6

—————————————————

Suppose that X takes values x ∈ {1, 2, 3, . . . , n} with the uniform distribution. Then:

E(X) = (n + 1)/2

Var(X) = (n² − 1)/12

*Non-examinable material*

µ = Σₖ₌₁ⁿ k p(k) = Σₖ₌₁ⁿ k × (1/n) = (1/n) Σₖ₌₁ⁿ k = (1/n) × n(n + 1)/2 = (n + 1)/2

σ² = E(X²) − [E(X)]²

= Σₖ₌₁ⁿ k² p(k) − ((n + 1)/2)²

= (1/n) Σₖ₌₁ⁿ k² − ((n + 1)/2)²

= (1/n) × n(n + 1)(2n + 1)/6 − ((n + 1)/2)²

= (n² − 1)/12
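For a die (n = 6) the formulas give E(X) = 3.5 and Var(X) = 35/12; direct summation agrees:

```python
# Discrete uniform distribution on {1, ..., n}: check the closed-form
# mean (n+1)/2 and variance (n^2 - 1)/12 by direct summation for n = 6.
n = 6
values = range(1, n + 1)

mean = sum(values) / n
var = sum(k * k for k in values) / n - mean ** 2

print(mean, var, (n + 1) / 2, (n * n - 1) / 12)
```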

3.3 The Binomial Distribution

Example 3.3.1

If 20% of all cars are silver and 3 random cars form a queue at a set of traffic lights, what is the

probability that

(a) none of the cars are silver?

(b) exactly 1 of the cars are silver?

(c) exactly 2 of the cars are silver?

(d) all 3 of the cars are silver?


——————————————–

Writing S for a silver car and x for a non-silver car, the possible queues of each type are:

Silver cars   Arrangements      Probability
0             xxx               (3 choose 0) × 0.8³ = 0.512
1             Sxx, xSx, xxS     (3 choose 1) × 0.2 × 0.8² = 0.384
2             SSx, SxS, xSS     (3 choose 2) × 0.2² × 0.8 = 0.096
3             SSS               (3 choose 3) × 0.2³ = 0.008

—————————————————

Suppose that there are 2 possible outcomes (e.g. success/failure) of a single event, with probability p of success and (1 − p) of failure. Suppose also that the repeated events are independent. Then if there are n events, the probability of exactly x of these events having ‘successful’ outcomes is given by the binomial distribution:

P(X = x) = (n choose x) · pˣ · (1 − p)ⁿ⁻ˣ

Notation:

X ∼ Bi(n, p)

For Example 3.3.1, the number of silver cars satisfies X ∼ Bi(3, 0.2).

Example 3.3.2

If 20% of all cars are silver and 5 random cars form a queue at a set of traffic lights, what is the

probability distribution of variable X denoting number of silver cars in the queue?

—————————————

P(X = 0) = (5 choose 0) · 0.2⁰ · 0.8⁵ = 0.8⁵ ≈ 0.33

P(X = 1) = (5 choose 1) · 0.2¹ · 0.8⁴ ≈ 0.41

P(X = 2) = (5 choose 2) · 0.2² · 0.8³ ≈ 0.20

P(X = 3) = (5 choose 3) · 0.2³ · 0.8² ≈ 0.05

P(X = 4) = (5 choose 4) · 0.2⁴ · 0.8¹ ≈ 0.006

P(X = 5) = (5 choose 5) · 0.2⁵ · 0.8⁰ ≈ 0.0003


—————————————————
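The same probabilities follow from the binomial formula evaluated in Python:

```python
# Binomial pmf P(X = x) = C(n, x) p^x (1-p)^(n-x), checked against
# Example 3.3.2 (n = 5 cars, p = 0.2 chance each is silver).
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

probs = [binom_pmf(x, 5, 0.2) for x in range(6)]

print([round(q, 4) for q in probs], sum(probs))
```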

An alternative to calculating these probabilities using the formula is to use binomial distribution tables (see Appendix A). Note that tables only exist for some values of n and some values of p.

Example 3.3.3

Assume that the probability of a randomly selected person having a birthday in January is 1/12.

In a class of 20 students, what is the probability that

(a) exactly 3 of the students were born in January?

(b) exactly 2 of the students were born in January?

(c) no more than 3 of the students were born in January?

(d) at least 4 of the students were born in January?

—————————-

Let X denote the number of students born in January. Then X ∼ Bi(20, 1/12). Note that p = 1/12 ≈ 0.08, so we use the p = 0.08 column. Using tables:

(a) P(X = 3) ≈ 0.1414

(b) P(X = 2) = 0.2711

(c) P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) ≈

≈ 0.1887 + 0.3282 + 0.2711 + 0.1414 = 0.9294

(d) P(X ≥ 4) = 1 − P(X ≤ 3) ≈ 1 − 0.9294 = 0.0706

—————————————————

Suppose that X ∼ Bi(n, p). Then:

E(X) = np

V ar(X) = np(1 − p)


This means that if we perform an event n times, and the probability of success in each such event is p, then on average we get np successes. While intuitive, note that this statement requires a proof, which we omit here.

3.4 The Geometric Distribution

Suppose that there are only 2 possible outcomes of an event, with probabilities p (success) and 1 − p (failure), and that the event is repeated independently. Let X denote the number of the trial on which the first success occurs. Then X follows the geometric distribution Ge(p):

P(X = x) = (1 − p)ˣ⁻¹ p

Example 3.4.1

Toss a coin repeatedly; the probability of heads on each toss is 1/2. Let X be the number of the toss on which the first heads appears. What are the probabilities of X being 1, 2, 3?

————————

We observe:

X ∼ Ge(1/2)

Therefore:

P(X = 1) = (1 − 1/2)⁰ × (1/2) = 1/2        (heads on the first toss)

P(X = 2) = (1 − 1/2)¹ × (1/2) = 1/4        (tails, then heads)

P(X = 3) = (1 − 1/2)² × (1/2) = 1/8        (tails, tails, then heads)

—————————————————

Example 3.4.2

Consider a queue of cars. The probability that any given car is silver is 0.2. Let X be the

number of the first silver car in the queue. What are the probabilities: P(X = 1), P(X = 2),

P(X > 3), P(X ≥ 7)?

—————————-

P(X = 1) = 0.2

P(X = 2) = (1 − 0.2) × 0.2 = 0.16

P(X > 3) = P(first 3 cars are not silver) = 0.8³ = 0.512

P(X ≥ 7) = P(first 6 cars are not silver) = 0.8⁶ ≈ 0.26

—————————————————

Suppose that X ∼ Ge(p). Then:

E(X) = 1/p

Var(X) = (1 − p)/p²

Note that this formula for the expected value means that if the probability of success is 1/3, then it would take us on average 3 trials to get a success, which is very intuitive. Again, we omit the proof.
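The expectation formula can be checked numerically by truncating the series Σ x (1 − p)^(x−1) p, here for the silver-car probability p = 0.2:

```python
# Geometric distribution Ge(p): P(X = x) = (1-p)^(x-1) p. Check E(X) = 1/p
# for p = 0.2 by summing the series up to a large cutoff.
p = 0.2

def geom_pmf(x, p):
    return (1 - p) ** (x - 1) * p

# terms beyond x = 500 are negligible for p = 0.2
mean = sum(x * geom_pmf(x, p) for x in range(1, 500))

print(round(mean, 6))
```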

Week 5 Monday

3.5 The Poisson Distribution

Events occur at points in time/space. The average rate of events occurring is λ events per unit of time/space. Let X be the number of events in an interval of fixed length T, and let µ = λT be the average number of events in such an interval. Then:

P(X = x) = µˣ e^(−µ) / x!

with mean E(X) = µ and variance σ² = µ. Notation:

X ∼ Po(µ)

We use the formula or the probability tables (which can be found in Appendix B) to find the probabilities.

Example 3.5.1

If X ∼ Po(1.9), then:

P(X = 2) = 0.2700

P(X = 3) = 0.1710

P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − (P(X = 0) + P(X = 1) + P(X = 2))

= 1 − (0.1496 + 0.2842 + 0.2700) = 0.2962

—————————————————

Example 3.5.2


Assume that the number of cars travelling along a minor road has a Poisson distribution with

mean 4 vehicles per hour. Determine the probability that

(a) exactly 4 vehicles pass in a given hour,

(b) at least 2 vehicles pass in a given 30 minute interval,

(c) no cars pass in at least one out of the next 4 hours.

————————————-

(a) Let X(T ) be the number of vehicles passing in a time interval of length T hours. Our

λ = 4, where λ gives us average number of cars passing every hour. We are interested in

the average corresponding to our chosen time interval, which coincidentally is an hour. Hence,

µ = λ × T = 4 × 1 = 4.

Then, X(1) ∼ Po(4).

Therefore, we are looking for the probability:

P(X = 4) = 0.1954.

(b) This time, our time interval is T = 0.5, and hence µ = 4 × 0.5 = 2 and X(0.5) ∼ Po(2).

Therefore, we are looking for the probability:

P(X ≥ 2) = 1 − P(X = 0) − P(X = 1) = 1 − 0.1353 − 0.2707 = 0.5940

(c) We treat each of the four hours as a separate event. Moreover, we treat these hours inde-

pendently of each other. Hence, we first observe that:

P(no cars in at least 1 out of 4 hours) = 1 − P(at least 1 car passes in each of the 4 hours).

We want to calculate the probability that for any given hour there is at least one car passing.

Hence we compute µ = 4 × 1 = 4 and denote X ∼ Po(4).

We calculate:

P(at least one car passes in a given hour) = 1 − P(X = 0) = 1 − 0.0183 = 0.9817.

Thus we find:

P(at least 1 car passes in each of the 4 hours) = (P(at least one car passes in a given hour))⁴ = 0.9817⁴ ≈ 0.9288.


P(no cars in at least 1 out of 4 hours) = 1 − P(at least 1 car passes in each of the 4 hours)

= 1 − 0.9288 = 0.0712.

—————————————————
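All three parts of the example follow from the Poisson formula with µ = λT:

```python
# Example 3.5.2: Poisson counts of cars with rate lambda = 4 per hour.
from math import exp, factorial

def pois_pmf(x, mu):
    return mu ** x * exp(-mu) / factorial(x)

a = pois_pmf(4, 4)                          # (a) exactly 4 in 1 hour
b = 1 - pois_pmf(0, 2) - pois_pmf(1, 2)     # (b) at least 2 in 30 min
p_hour = 1 - pois_pmf(0, 4)                 # at least 1 car in an hour
c = 1 - p_hour ** 4                         # (c) some car-free hour in 4

print(round(a, 4), round(b, 4), round(c, 4))
```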

The Poisson distribution gives a good approximation to the binomial distribution if n is large and p is small. Then, taking

µ = np

we have Po(np) ≈ Bi(n, p).

Example 3.5.3

Let X ∼ Bi(50, 0.05). Then:

P(X = 0) = (50 choose 0) × (1 − 0.05)⁵⁰ ≈ 0.0769

P(X = 1) = (50 choose 1) × 0.05 × (1 − 0.05)⁴⁹ ≈ 0.2025

Now we will try to approximate this binomial distribution by a Poisson distribution as follows: µ = np = 50 × 0.05 = 2.5. Consider X′ ∼ Po(2.5). By using tables for the Poisson distribution:

P(X′ = 0) = 0.0821

P(X′ = 1) = 0.2052.

We observe that the values are not precisely the same but they are close. In the language of

limits, if you fix the product np and take the limit of a given binomial probability for n tending

to infinity, you will obtain an expression for a probability given by the formula for the Poisson

distribution with mean µ = np. Note that this also explains why the variance for Poisson

distribution is given by µ – compare it with the variance for the binomial distribution.

—————————————————
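The comparison in the example can be computed directly rather than read from tables:

```python
# Poisson approximation to the binomial: Bi(50, 0.05) vs Po(np) = Po(2.5).
from math import comb, exp, factorial

n, p = 50, 0.05
mu = n * p

binom = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(2)]
pois = [mu ** x * exp(-mu) / factorial(x) for x in range(2)]

print([round(q, 4) for q in binom])    # binomial P(X = 0), P(X = 1)
print([round(q, 4) for q in pois])     # Poisson approximations
```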


Week 5 Tuesday

3.6 Covariance and Correlation

Consider objects of some type which have two associated random variables X and Y. We might be interested in:

a. functions of X and Y,

A linear combination of two random variables X and Y is an expression of the form αX + βY, where α and β are real constants, i.e. α, β ∈ R. A linear combination of X and Y satisfies:

E(αX + βY) = α E(X) + β E(Y).

The covariance of X and Y is

Cov(X, Y) = E(XY) − E(X) · E(Y),  where  E(XY) = Σₓ Σᵧ x · y · P(X = x and Y = y).

A related quantity is the (Pearson Product-Moment) correlation coefficient:

ρX,Y = Cov(X, Y) / (σX · σY),

Interpretation:

1. Sign:

a) ρX,Y = 0 means that there is no linear relationship between X and Y ; we say that there is

no correlation; note that we cannot conclude if X and Y are independent or not;

b) ρX,Y > 0 means that generally large X corresponds to large Y , and small X corresponds to

small Y ; we say that the correlation is positive;

c) ρX,Y < 0 means that generally large X corresponds to small Y and vice versa; we say that

the correlation is negative.

2. Magnitude:

a) The Pearson correlation coefficient always takes values in the interval between -1 and 1:

−1 ≤ ρX,Y ≤ 1.


b) We can verbally describe the strength of the correlation:

i) 0.00-0.19 very weak,

ii) 0.20-0.39 weak,

iii) 0.40-0.59 moderate,

iv) 0.60-0.79 strong,

v) 0.80-1.0 very strong.

• If X and Y are independent, then Cov(X, Y) = 0 and ρX,Y = 0 (but ρX,Y = 0 does not imply independence; see 1a above).

Example 3.6.1

The final grade Z (given as a percentage) for a module is calculated using the continuous as-

sessment mark X (given as a percentage) and the exam mark Y (given as a percentage) in the

following way:

Z = 0.4X + 0.6Y.

The joint frequencies of X and Y for the 50 students in the class were as follows:

                  X
          40   50   60   70   80   90
     30    1    1    1    0    0    0
     40    0    3    2    0    1    0
Y    50    0    1    5   10    2    2
     60    0    2    6    4    1    2
     70    0    0    0    1    4    1

We also want to find out whether there is correlation between X and Y , namely whether generally

high continuous assessment mark corresponds to high exam mark.

————————————-

First, we observe that the numbers in the table sum to 50. Hence, in order to obtain probabilities, we need to divide each frequency by 50. The new table is called the joint probability mass function.

                  X                          Marginal distribution
         40    50    60    70    80    90    of Y
   30   0.02  0.02  0.02   0     0     0     0.06
   40    0    0.06  0.04   0    0.02   0     0.12
Y  50    0    0.02  0.1   0.2   0.04  0.04   0.4
   60    0    0.04  0.12  0.08  0.02  0.04   0.3
   70    0     0     0    0.02  0.08  0.02   0.12

Marginal distribution of X:
        0.02  0.14  0.28  0.3   0.16  0.1    (total 1)

The entry in column x and row y corresponds to the probability P(X = x and Y = y). The marginal distributions of X and Y allow us to calculate the following:

E(X) = Σ x · p(x) = 40·0.02 + 50·0.14 + 60·0.28 + 70·0.3 + 80·0.16 + 90·0.1 = 67.4

E(Y ) = . . . = 53

E(Z) = E(0.4X + 0.6Y ) = 0.4·E(X) + 0.6·E(Y ) = 0.4·67.4 + 0.6·53 = 58.76

We can also calculate conditional distributions, for example the distribution of X when Y = 50. It is given by row 3 of the joint probability mass table, rescaled to sum to 1. As the sum of the values in this row is 0.4 = 2/5, this corresponds to dividing all probabilities by 2/5, which is the same as multiplying them by 5/2 = 2.5:

x                    40   50    60    70   80   90
P(X = x | Y = 50)     0   0.05  0.25  0.5  0.1  0.1

Then we can compute:

E(X | Y = 50) = Σ x · P(X = x | Y = 50) = 40·0 + 50·0.05 + 60·0.25 + 70·0.5 + 80·0.1 + 90·0.1 = 69.5

Alternatively, noting that the numbers in the original frequency table need to be rescaled by (1/50)·(5/2) = 1/20, we can calculate E(X | Y = 50) directly from it:

E(X | Y = 50) = (1/20)·(40·0 + 50·1 + 60·5 + 70·10 + 80·2 + 90·2) = 1390/20 = 69.5

In order to calculate the correlation coefficient, let us first compute the covariance of X and Y . Hence we start from E(XY ):

E(XY ) = Σ_x Σ_y x·y·P(X = x and Y = y) = (40·30·0.02) + (50·30·0.02) + . . . = 3632

Cov(X, Y ) = E(XY ) − E(X)·E(Y ) = 3632 − 67.4·53 = 59.8

σX = √(E(X²) − [E(X)]²) = √((40²·0.02 + 50²·0.14 + . . .) − 67.4²) = 12.30

σY = . . . = 10.25

ρX,Y = Cov(X, Y )/(σX·σY) = 59.8/(12.30·10.25) = 0.47

As ρX,Y > 0 we conclude that there is a positive correlation between continuous assessment mark and exam mark, namely the better the continuous assessment mark, the better the exam mark. As |ρX,Y| = 0.47, the correlation can be described as moderate.

—————————————————
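As a cross-check on the arithmetic above, the whole calculation can be reproduced from the frequency table in a few lines of Python (a checking sketch, not part of the notes; the variable names are illustrative):

```python
# Cross-check of Example 3.6.1: joint frequency table copied from the notes.
xs = [40, 50, 60, 70, 80, 90]          # X marks (columns)
ys = [30, 40, 50, 60, 70]              # Y marks (rows)
freq = [
    [1, 1, 1, 0, 0, 0],
    [0, 3, 2, 0, 1, 0],
    [0, 1, 5, 10, 2, 2],
    [0, 2, 6, 4, 1, 2],
    [0, 0, 0, 1, 4, 1],
]
n = sum(sum(row) for row in freq)                       # 50 students
p = [[f / n for f in row] for row in freq]              # joint p.m.f.

EX = sum(x * pxy for row in p for x, pxy in zip(xs, row))
EY = sum(y * sum(row) for y, row in zip(ys, p))         # via marginal of Y
EXY = sum(x * y * pxy for y, row in zip(ys, p) for x, pxy in zip(xs, row))
EX2 = sum(x * x * pxy for row in p for x, pxy in zip(xs, row))
EY2 = sum(y * y * sum(row) for y, row in zip(ys, p))

cov = EXY - EX * EY                                     # covariance
rho = cov / (((EX2 - EX**2) ** 0.5) * ((EY2 - EY**2) ** 0.5))
```

Running this reproduces E(X) = 67.4, E(Y ) = 53 and ρ ≈ 0.47, which is a quick way to catch slips in the hand computation.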

Week 6 Tuesday


Chapter 4

Fundamentals of Integration

Powers of x

Positive powers:

xⁿ = x · x · . . . · x · x   (n times)

Negative powers:

x⁻ⁿ = 1/xⁿ

Example 4.0.1

1/x² = x⁻²

5/x³ = 5x⁻³

1/x = x⁻¹

—————————————————

Fractional powers:

(ⁿ√x)ᵐ = x^(m/n)

Example 4.0.2

√x = x^(1/2)

⁵√x = x^(1/5)

1/√x = 1/x^(1/2) = x^(−1/2)

x^(2/3) = ³√(x²) = (³√x)²

x^(3/2) = √(x³) = (√x)³

1/³√x = 1/x^(1/3) = x^(−1/3)

—————————————————

1. Consider a function which is a power of x, with general expression x^s. Note that s can be negative, and it can also be a fraction. The number s cannot equal −1 though (i.e. we are not considering 1/x). We want to calculate the area under the graph of the function x^s.

2. a) For each term x^s, add 1 to the power (to get the power s + 1) and divide the whole expression by the new power (divide by s + 1). So the procedure results in:

x^(s+1)/(s + 1).

b) For a constant term c, multiply by x (i.e. if we want to calculate the area under the graph of c, we first obtain cx).

3. Substitute the values of x at each endpoint of the region whose area we want to compute into the results from (2), and subtract the resultant values.
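The three-step recipe above can be sketched as a small helper; this is an illustrative sketch (the function names are not from the notes):

```python
def power_antiderivative(s):
    """Step 2(a): antiderivative of x**s for s != -1 -- raise the
    power by one and divide by the new power."""
    if s == -1:
        raise ValueError("the rule does not cover s = -1 (i.e. 1/x)")
    return lambda x: x ** (s + 1) / (s + 1)

def area_under_power(s, a, b, coeff=1.0):
    """Area under coeff * x**s between x = a and x = b (step 3:
    evaluate the antiderivative at both endpoints and subtract)."""
    F = power_antiderivative(s)
    return coeff * (F(b) - F(a))
```

For example, `area_under_power(1, 0, 1, coeff=2)` reproduces the area 1 under y = 2x between x = 0 and x = 1.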

Example 4.0.3

y = 2x:   Area = ∫_0^1 2x dx = [2·x²/2]_0^1 = 2·(1²/2) − 2·(0²/2) = 1

—————————————————————————-


y = 5:   Area = ∫_1^3 5 dx = [5x]_1^3 = 5·3 − 5·1 = 15 − 5 = 10

—————————————————————————-

y = x³ + x² + 2:

Area = ∫_1^2 (x³ + x² + 2) dx = [x⁴/4 + x³/3 + 2x]_1^2
     = (2⁴/4 + 2³/3 + 2·2) − (1⁴/4 + 1³/3 + 2·1) = (4 + 8/3 + 4) − (1/4 + 1/3 + 2) = 97/12

—————————————————————————-

y = √x + 1 = x^(1/2) + 1:

Area = ∫_1^4 (x^(1/2) + 1) dx = [(2/3)·x^(3/2) + x]_1^4 = [(2/3)·(√x)³ + x]_1^4
     = ((2/3)·(√4)³ + 4) − ((2/3)·(√1)³ + 1) = (16/3 + 4) − (2/3 + 1) = 23/3

—————————————————————————-

y = (1/2)x for 0 ≤ x ≤ 2, and y = −x² + 4x − 3 for 2 < x   (the two pieces agree at x = 2)

Area = ∫_0^3 y dx = ∫_0^2 y dx + ∫_2^3 y dx = ∫_0^2 (1/2)x dx + ∫_2^3 (−x² + 4x − 3) dx = . . .


—————————————————————————-

y = 1/x² = x⁻²

Area = ∫_1^∞ x⁻² dx = [x⁻¹/(−1)]_1^∞ = [−1/x]_1^∞ = lim_{x→∞} (−1/x) − (−1/1) = 0 + 1 = 1

What is the area of the region between x = 1 and x = 2? What about between x = 2 and x = 3?

—————————————————————————–

Sometimes a function does not look like a sum of powers of x at first glance. Here are examples which cannot be integrated directly, but can be expanded in order to be integrated as usual.

Further examples:

1. y = (x − 3)² = x² − 6x + 9

2. y = (1 + 2x³)/(3x³) = 1/(3x³) + 2x³/(3x³) = (1/3)x⁻³ + 2/3

—————————————————


Week 7 Monday

4.1 Introduction

A continuous random variable (CRV) is a random variable that can take any real value within a specified range. The distribution of probabilities for a CRV can be expressed using a graph, where areas under the graph represent probabilities. For example:

P(2 < X < 3) = ∫_2^3 f(x) dx

Since the probabilities of all possible outcomes always sum to 1, the total area under the graph is equal to 1. A function f(x) which gives probabilities for X in this way is called a probability density function (p.d.f.).

Example 4.1.1

For a particular bus route, buses run every 10 minutes. Let T be the random variable for waiting

time for the next bus. T can take only value between 0 and 10. The waiting time can be modelled

using a uniform distribution between T = 0 and T = 10. The p.d.f. is of the form:

f(t) = k for 0 ≤ t ≤ 10, and f(t) = 0 otherwise.

Since the total area under the graph must equal 1:

10k = 1 =⇒ k = 0.1


f(t) = 0.1 for 0 ≤ t ≤ 10, and f(t) = 0 otherwise.

Then for example to find the probability of the waiting time between T = 2 and T = 5, calculate

the area:

—————————————————

Example 4.1.2

The time X measured in seconds between the consecutive vehicles passing a fixed point on a

motorway is modelled as a continuous random variable with p.d.f:

f(x) = (40 − x)/800 for 0 ≤ x ≤ 40, and f(x) = 0 otherwise.

The probability that a vehicle passes after 10 but before 25 seconds is:

P(10 < X < 25) = ∫_10^25 (40 − x)/800 dx = (1/800)·[40x − x²/2]_10^25
= (1/800)·((40·25 − 625/2) − (40·10 − 100/2)) = (687.5 − 350)/800 = 0.42

—————————————————
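The probability above can be checked numerically by approximating the area under the p.d.f. with a midpoint Riemann sum; a minimal sketch (names illustrative):

```python
def prob_interval(pdf, a, b, n=100_000):
    """Midpoint-rule approximation of P(a < X < b) = integral of the pdf."""
    h = (b - a) / n
    return sum(pdf(a + (i + 0.5) * h) for i in range(n)) * h

# p.d.f. from Example 4.1.2 (time between vehicles)
f = lambda x: (40 - x) / 800 if 0 <= x <= 40 else 0.0

total = prob_interval(f, 0, 40)    # whole area under the graph: should be 1
p = prob_interval(f, 10, 25)       # exact value is 337.5/800 = 0.421875
```

This confirms both that f is a valid p.d.f. (total area 1) and the value P(10 < X < 25) ≈ 0.42.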


For continuous random variables the probability of the random variable taking any particular value

a ∈ R is 0.

P(X = a) = 0

Also:

This does not mean that X = a is impossible. Instead, it is because the probability in the distribution is spread so thinly (over infinitely many values) that positive probabilities can only be seen on intervals. We can interpret f(x) as relating to the probability of getting a value close to x.

• The cumulative distribution function (c.d.f.) of a random variable X is the function F(x) defined by:

F(x) = P(X ≤ x) = ∫_{−∞}^x f(t) dt

• The value of F(x) must always lie between 0 and 1, since F gives probabilities. Moreover:

F(x) → 1 as x → ∞
F(x) → 0 as x → −∞

• P(c ≤ X ≤ d) = P(X ≤ d) − P(X ≤ c) = F(d) − F(c)


Example 4.1.3

The p.d.f. is given by:

f(t) = 0.1 for 0 ≤ t ≤ 10, and f(t) = 0 otherwise.

To find the c.d.f.:

For x ≤ 0:   F(x) = ∫_{−∞}^x f(t) dt = ∫_{−∞}^x 0 dt = 0

For 0 ≤ x ≤ 10:   F(x) = ∫_{−∞}^0 0 dt + ∫_0^x 0.1 dt = 0 + [0.1t]_0^x = 0.1x

For x ≥ 10:   F(x) = ∫_{−∞}^0 0 dt + ∫_0^10 0.1 dt + ∫_10^x 0 dt = 0 + 0.1·10 + 0 = 1

Hence:

F(x) = 0 for x ≤ 0;   F(x) = 0.1x for 0 ≤ x ≤ 10;   F(x) = 1 for x ≥ 10.

—————————————————


Week 7 Tuesday

Example 4.1.4

(Vehicles on the motorway Example 4.1.2 continued.) Recall the p.d.f. for this situation:

p.d.f.: f(x) = (40 − x)/800 for 0 ≤ x ≤ 40, and f(x) = 0 otherwise.

c.d.f.:

For x ≤ 0:   F(x) = 0

For 0 ≤ x ≤ 40:   F(x) = F(0) + ∫_0^x (40 − t)/800 dt = 0 + (1/800)·[40t − t²/2]_0^x = (1/800)·(40x − x²/2)

For x ≥ 40:   F(x) = F(40) + ∫_40^x 0 dt = (1/800)·(40² − 40²/2) + 0 = (1/800)·(40²/2) = 1

Hence:

c.d.f.: F(x) = 0 for x ≤ 0;   F(x) = (1/800)·(40x − x²/2) for 0 ≤ x ≤ 40;   F(x) = 1 for x ≥ 40.

—————————————————

Example 4.1.5

p.d.f.: f(x) = (3/4)·(3 − x)(x − 1) for 1 ≤ x ≤ 3, and f(x) = 0 otherwise.

For 1 ≤ x ≤ 3:

∫_1^x (3/4)(3 − t)(t − 1) dt = (3/4)·∫_1^x (−t² + 4t − 3) dt = (3/4)·[−t³/3 + 2t² − 3t]_1^x
= (3/4)·(−x³/3 + 2x² − 3x) − (3/4)·(−1/3 + 2 − 3) = −x³/4 + 3x²/2 − 9x/4 + 1

Thus:

c.d.f.: F(x) = 0 for x ≤ 1;   F(x) = −x³/4 + 3x²/2 − 9x/4 + 1 for 1 ≤ x ≤ 3;   F(x) = 1 for x ≥ 3.

P(X ≤ 2) = F(2) = −2 + 6 − 9/2 + 1 = 0.5

—————————————————

Let X be a continuous random variable with p.d.f. f(x). Then the expected value of X is defined to be:

E(X) = ∫_{−∞}^∞ x · f(x) dx.

The variance is Var(X) = E(X²) − [E(X)]², where:

E(X²) = ∫_{−∞}^∞ x² · f(x) dx.

The median M of X is the value satisfying:

F(M) = ∫_{−∞}^M f(x) dx = 1/2.

Note that this is consistent with the original description of the median for samples. How would we define the lower and upper quartiles of the random variable X?

Example 4.1.6

A continuous random variable X has pdf given by

f(x) = 2x for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.

Find the cdf. Find the expected value E(X), variance Var(X), standard deviation σX , and the

median M of X.

———————————————-

For 0 ≤ x ≤ 1:   ∫_0^x 2t dt = [t²]_0^x = x² − 0 = x²

c.d.f.: F(x) = 0 for x ≤ 0;   F(x) = x² for 0 ≤ x ≤ 1;   F(x) = 1 for x ≥ 1.


E(X) = ∫_{−∞}^∞ x·f(x) dx = ∫_0^1 x·2x dx = ∫_0^1 2x² dx = [2x³/3]_0^1 = 2/3

E(X²) = ∫_{−∞}^∞ x²·f(x) dx = ∫_0^1 x²·2x dx = ∫_0^1 2x³ dx = [2x⁴/4]_0^1 = 1/2

Var(X) = E(X²) − [E(X)]² = 1/2 − (2/3)² = 1/18

σX = √Var(X) = 1/(3√2)

F(M) = 1/2 =⇒ M² = 1/2 =⇒ M = √0.5 ≈ 0.71

—————————————————
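The quantities above can be recovered numerically, which is a useful sanity check when the antiderivatives get messy; a sketch using a midpoint rule for the integrals and bisection for the median (names illustrative):

```python
def integrate(g, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of g from a to b."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 2 * x          # p.d.f. of Example 4.1.6 on [0, 1]

EX  = integrate(lambda x: x * f(x), 0, 1)       # expect 2/3
EX2 = integrate(lambda x: x * x * f(x), 0, 1)   # expect 1/2
var = EX2 - EX ** 2                             # expect 1/18

# Median: solve F(M) = 1/2 by bisection on the c.d.f.
lo, hi = 0.0, 1.0
for _ in range(50):
    mid = (lo + hi) / 2
    if integrate(f, 0, mid) < 0.5:
        lo = mid
    else:
        hi = mid
median = (lo + hi) / 2                          # expect sqrt(0.5) ≈ 0.71
```

The same two helpers work for any p.d.f. on a bounded interval, not just this one.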

Consider n random variables X₁, X₂, . . . , Xₙ. Then:

E(X₁ + X₂ + . . . + Xₙ) = E(X₁) + E(X₂) + . . . + E(Xₙ).

If the random variables X₁, X₂, . . . , Xₙ are independent, we also have:

Var(X₁ + X₂ + . . . + Xₙ) = Var(X₁) + Var(X₂) + . . . + Var(Xₙ).

Example 4.1.7

In a large company the wage per hour of employees is modelled as a continuous random variable

X with probability distribution given by

f(x) = 2500/x⁵ for x ≥ 5, and f(x) = 0 for x < 5.

(a) Find the expected hourly wage.


(b) The annual wage is given by the random variable Y . Assuming all employees are paid for

1920 hours work a year, find the expected annual wage.

(c) The same company gives an annual bonus that is modelled as a continuous random variable

Z with probability distribution given by

g(z) = 1/1000 for 1000 ≤ z ≤ 2000, and g(z) = 0 otherwise.

Find the expected annual bonus and the expected total money in a year for an employee at the

company.

——————————————–

(a)

E(X) = ∫_5^∞ x·(2500/x⁵) dx = ∫_5^∞ 2500·x⁻⁴ dx = 2500·[x⁻³/(−3)]_5^∞ = (−2500/3)·(0 − 1/5³) = 2500/(3·125) = 20/3

(b)

Y = 1920X =⇒ E(Y ) = E(1920X) = 1920·E(X) = 1920·(20/3) = 12800

(c)

E(Z) = ∫_1000^2000 z·(1/1000) dz = (1/1000)·[z²/2]_1000^2000 = (1/1000)·(2000²/2 − 1000²/2) = 2000 − 500 = 1500

The expected total money in a year is E(Y + Z) = E(Y ) + E(Z) = 12800 + 1500 = 14300.

—————————————————

Week 8 Monday

The normal distribution gives a symmetrical bell shaped curve that can be fitted to data.


The normal distribution then gives a statistical model (a p.d.f.), although it is not an appropriate model for every set of data.

The normal distribution only depends on the mean µ and standard deviation σ for the data. The

mean µ gives the position of the maximum of the curve and the standard deviation σ gives a measure

of how spread out the data is.

The formula for the normal curve is too complicated to integrate directly. Instead, tables give values of the c.d.f. of the Standard Normal Distribution, the normal distribution with µ = 0 and σ = 1. Recall that, being a c.d.f., the tables give areas, and so the corresponding probabilities, up to a given z.

Example 4.2.1

Example with z = −0.73: find −0.7 in the left-hand column of the table and 3 in the top row. The required area is the entry in the row of −0.7 and the column of 3, i.e.:

P(Z < −0.73) = 0.2327

—————————————————

Example 4.2.2

A random variable Z has a normal distribution with mean 0 and standard deviation 1. What is

the probability that

(a) Z takes a value less than 1/2,

(b) Z takes a value greater than 2.63.

(c) Z takes a value with |Z| > 1.

————————————————————————————————

(a) P(Z < 0.5) = 0.6915

(b) P(Z > 2.63) = 1 − P(Z < 2.63) = 1 − 0.9957 = 0.0043

(c) P(|Z| > 1) = P(Z > 1) + P(Z < −1) = 2 · 0.1587 = 0.3174

—————————————————

Let X have a normal distribution with mean µX and standard deviation σX. We write this as:

X ∼ N(µX, σX²).

Then:

Z = (X − µX)/σX

has a standard normal distribution with mean µZ = 0 and standard deviation σZ = 1. Note that by

convention we use the letter Z to denote a random variable which has standard normal distribution.

Example 4.2.3

The time to complete a project is, from previous experience, estimated to be normally distributed

with a mean of 45 weeks and a standard deviation of 5 weeks. Find the probability that the

project will:

(a) be completed within 43 weeks,

(b) be completed within 49 weeks,

(c) take longer than 52 weeks,

(d) take between 41 and 47 weeks.

————————————————————————————————

Let X ∼ N(µ = 45, σ² = 5²) model the number of weeks necessary to complete a project.

(a) P(X < 43) = P((X − µ)/σ < (43 − 45)/5) = P(Z < −0.4) = 0.3446

(b) P(X < 49) = P((X − µ)/σ < (49 − 45)/5) = P(Z < 0.8) = 0.7881

(c) P(52 < X) = P((52 − 45)/5 < (X − µ)/σ) = P(1.4 < Z) = 1 − P(Z < 1.4) = 1 − 0.9192 = 0.0808

Equivalently, by symmetry: P(1.4 < Z) = P(Z < −1.4) = 0.0808

(d) P(41 < X < 47) = P((41 − 45)/5 < (X − µ)/σ < (47 − 45)/5) = P(−0.8 < Z < 0.4)
= P(Z < 0.4) − P(Z < −0.8) = 0.6554 − 0.2119 = 0.4435

—————————————————
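Instead of tables, the standard normal c.d.f. can be evaluated with the error function; a sketch reproducing parts (a), (c) and (d) of Example 4.2.3 (the helper `phi` is illustrative, not from the notes):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal c.d.f., expressed through the error function,
    so no table look-up is needed."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 45.0, 5.0   # project duration, Example 4.2.3

p_a = phi((43 - mu) / sigma)                           # P(X < 43)
p_c = 1.0 - phi((52 - mu) / sigma)                     # P(X > 52)
p_d = phi((47 - mu) / sigma) - phi((41 - mu) / sigma)  # P(41 < X < 47)
```

The results agree with the table values 0.3446, 0.0808 and 0.4435 to three decimal places.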

Example 4.2.4

The thickness of manufactured metal plates (intended to be 20mm) is normally distributed with

µ = 20mm and σ = 0.04mm.

(a) What proportion of plates can be expected to be at most 20.10mm thick?

(b) What proportion of plates can be expected to be more than 19.95mm thick?

(c) What proportion of plates can be expected to be within 0.05mm of the target thickness of

20mm?

(d) What value would one have to set as tolerance limits 20 ± c so that the percentage of plates

outside the tolerance limit is 5%?

—————————————————————————————————————————-

Let X ∼ N(20, 0.04²) model the thickness of the manufactured plates.

(a) P(X < 20.10) = P(Z < (20.10 − 20)/0.04) = P(Z < 2.5) = 0.99379

(b) P(19.95 < X) = P((19.95 − 20)/0.04 < Z) = P(−1.25 < Z) = P(Z < 1.25) = 0.8944

(c) P(19.95 < X < 20.05) = P((19.95 − 20)/0.04 < Z < (20.05 − 20)/0.04) = P(−1.25 < Z < 1.25)
= P(Z < 1.25) − P(Z < −1.25) = 0.8944 − 0.1056 = 0.7888

(d) TBC.

—————————————————

Week 8 Tuesday

Either use the percentage-points table of the normal distribution, or use the main table in reverse.

Example 4.2.5


Let Z be a random variable with standard normal distribution. Find z ∗ such that:

(a)

P(Z < z ∗ ) = 0.7

(b)

P(Z < z ∗ ) = 0.15

(c)

P(Z < z ∗ ) = 0.56

————————————————————————————————

(a) In the percentage-points table, the row with 0.7 in the left-hand column gives the value of z∗. Thus:

z∗ = 0.5244

(b) Since 0.15 < 0.5, z∗ must be negative. By symmetry, P(Z < z∗) = 0.15 means P(Z < −z∗) = 0.85, so let z₁ be such that:

P(Z < z₁) = 0.85.

Hence:

z₁ = 1.0364

z∗ = −z₁ = −1.0364

(c)

We observe that 0.56 is not in the left-hand column of the percentage-points table, so we use the main table instead. The closest entry to 0.56 is 0.5596. Therefore:

z ∗ = 0.15

(Later on we will see that we could be even more exact, namely z ∗ = 0.151.)


—————————————————

Example 4.2.6

(This is continuation of Example 4.2.4.) The thickness of manufactured metal plates (intended

to be 20mm) is normally distributed with µ = 20mm and σ = 0.04mm.

(d) What value would one have to set as tolerance limits 20 ± c so that the percentage of plates

outside the tolerance limit is 5%?

—————————————————————————————————-

(d) We need to find c such that P(20 − c < X < 20 + c) = 0.95.

P(20 − c < X < 20 + c) = P(−c/0.04 < Z < c/0.04) = P(−25c < Z < 25c) = P(Z < 25c) − P(Z < −25c)
= P(Z < 25c) − (1 − P(Z < 25c)) = 2·P(Z < 25c) − 1

Hence:

2·P(Z < 25c) − 1 = 0.95 =⇒ P(Z < 25c) = 0.975 =⇒ 25c = 1.96 =⇒ c = 0.0784.

So the tolerance limits should be set at 20 ± 0.0784 mm.

—————————————————


Values of z with 3 Decimal Places

For the third decimal place, use the proportional-parts columns and add (or, for negative z, subtract) the correction.

Example 4.2.7

(a)

P(Z < 1.123) = 0.8686 + 0.0006 = 0.8692

(b)

P(Z < 0.659) = 0.7422 + 0.0029 = 0.7451

(c)

P(Z < −0.326) = 0.3745 − 0.0022 = 0.3723

—————————————————

If X is normally distributed with mean µX and variance σX², then we write:

X ∼ N(µX, σX²).

If X ∼ N(µX, σX²) and Y ∼ N(µY, σY²) are independent, then:

X + Y ∼ N(µX + µY, σX² + σY²),

and

X − Y ∼ N(µX − µY, σX² + σY²).

Notice that we always add the variances, even when we subtract the means. Notice also that this theorem not only tells us what the mean and variance of X + Y and X − Y are, but also that they are normally distributed.

Example 4.2.8

A machined rod has to pass through a drilled hole in a component. The machining process

produces rods with a diameter that is normally distributed with a mean of 6mm and a standard deviation of 0.1mm. The drilling process gives holes with mean diameter 6.1mm and SD 0.05mm.

For what proportion of rod-component pairs will the rod be too big to pass through the hole?

—————————————————————————————————————————-

Let the rod's diameter be modelled by X ∼ N(6, 0.1²) and the hole's diameter by Y ∼ N(6.1, 0.05²).

We need to find the probability that, for a randomly selected rod and a randomly selected hole, the hole's diameter is smaller than the rod's diameter, namely P(Y < X) = P(0 < X − Y ).

X − Y ∼ N(6 − 6.1, 0.1² + 0.05²) = N(−0.1, 0.0125), with standard deviation √0.0125 ≈ 0.112.

Therefore:

P(0 < X − Y ) = P((0 − (−0.1))/0.112 < Z) = P(0.893 < Z) = P(Z < −0.893) = 0.1867 − 0.0008 = 0.1859.

Hence, 0.1859 is the proportion of the rod-component pairs for which the rod is too big to pass.

—————————————————
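A sketch reproducing this calculation with the error function instead of tables (small differences from the table-interpolation value are expected):

```python
from math import erf, sqrt

def phi(z):
    # standard normal c.d.f. via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# X ~ N(6, 0.1^2) rod diameter, Y ~ N(6.1, 0.05^2) hole diameter
mu_d = 6.0 - 6.1                  # mean of X - Y
sd_d = sqrt(0.1**2 + 0.05**2)     # variances add, even for a difference

p_too_big = 1.0 - phi((0.0 - mu_d) / sd_d)   # P(X - Y > 0)
```

This gives about 0.186, in line with the table-based answer.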

Example 4.2.9

Consider 1 litre and 2 litre bottles of milk. Let X be the random variable for the volume of a 1l

bottle (in ml), with the volume of each bottle being independent and X ∼ N(1006, 3²). Similarly for 2l bottles, let the random variable Y denote the volume of milk in ml, being independent with Y ∼ N(2008, 4²).

(a) Assume you buy two 1 litre bottles. What is the probability of getting more than 2 litres in

total?

(b) Now assume you buy one 2l and two 1l bottles. What is the probability there is more milk

in the 2l bottle than in the 1l bottles?

(c) Now assume you buy one 2l and one 1l bottle. What is the probability that there is more

milk in the 2l bottle than twice the contents of the 1l bottle?

—————————————————————————————————————————-

(a) Let X₁ ∼ N(1006, 3²) denote the first picked bottle and X₂ ∼ N(1006, 3²) denote the second picked bottle. Now we want to know:

P(2000 < X₁ + X₂)

The random variable X₁ + X₂ has mean µX + µX = 2012 and variance σX² + σX² = 3² + 3² = 18, and hence standard deviation √18:

X₁ + X₂ ∼ N(2012, 18)

Therefore:

P(2000 < X₁ + X₂) = P((2000 − 2012)/√18 < Z) = P(−2.828 < Z) = P(Z < 2.828) = 0.99760 + 0.00006 = 0.99766

(b) Let Y ∼ N(2008, 4²) represent a randomly picked 2l bottle, and let X₁ + X₂ ∼ N(2012, 18) be the sum of two randomly picked 1l bottles, as in (a). Then we are interested in P(0 < Y − (X₁ + X₂)).

The random variable Y − (X₁ + X₂) is normally distributed with mean µY − 2·µX = 2008 − 2012 = −4 and variance σY² + 2·σX² = 4² + 18 = 34. Hence:

Y − (X₁ + X₂) ∼ N(−4, 34)

Thus:

P(0 < Y − (X₁ + X₂)) = P((0 − (−4))/√34 < Z) = P(0.686 < Z) = P(Z < −0.686) = 0.2483 − 0.0019 = 0.2464

(c) This question is different from (b) as instead of two independently selected 1l bottles we now

need to consider double the weight of one bottle. Let X ∼ N (1006, 32 ) represent the randomly

picked 1l bottle and Y ∼ N (2008, 42 ) represent a randomly picked 2l bottle. We are looking to

find:

P(2X < Y ) = P(2X − Y < 0).

First consider the random variable 2X. It is a multiple of a random variable, and hence E(2X) = 2·E(X) and Var(2X) = 2²·Var(X). A multiple of a normally distributed variable is also a normally distributed variable. Hence:

2X ∼ N(2·µX, 2²·σX²) = N(2·1006, 2²·3²) = N(2012, 6²).

Notice that this is different from the distribution of X₁ + X₂; in particular, this one has a bigger variance. Now:

2X − Y ∼ N(2012 − 2008, 4² + 6²) = N(4, 52).

Hence:

P(2X − Y < 0) = P(Z < (0 − 4)/√52) = P(Z < −0.555) = 0.2912 − 0.0017 = 0.2895.

—————————————————

Week 9 Monday


Chapter 5

Sampling

A sample is a smaller subset of the parent population, with mean x̄ and variance s² (Roman letters). x̄ and s² are often used to estimate the population mean µ and variance σ².

A sample can be used to test whether or not a suspected fact is true.

Example 5.1.1

Let X be a random variable representing a result of throwing a die. We want to find out whether

the die is biased – in particular if P(X = 6) > 1/6.

The NULL HYPOTHESIS is the expected outcome, denoted by H0. (Here H0: P(X = 6) = 1/6.)

The ALTERNATIVE HYPOTHESIS is H1. (Here H1: P(X = 6) > 1/6.)

—————————————————

A sample is unlikely to match the null hypothesis exactly. However, given a null hypothesis, if the

probability of the sample taking a certain range of values is below a certain small level then we deduce

that H0 is false.

This small level of probability is called a SIGNIFICANCE LEVEL, usually given as a percentage.

Significance level denotes the probability of rejecting the null hypothesis, given that it is true.

Method (the steps are illustrated in the following example):

Example 5.1.2

Testing whether the die is biased, i.e. whether P(X = 6) > 1/6.

1. Choose 1% significance level (probability 0.01).

2. Roll die 20 times. Results:

5 1 6 5 2 1 2 1 4 4 6 6 6 1 6 4 6 4 2 6

3. H0: P(X = 6) = 1/6.

H1: P(X = 6) > 1/6.

4. For the sample we get 7 sixes out of 20 rolls. Given H0, let Y ∼ Bi(20, 1/6) model the predicted behaviour (number of sixes out of 20 throws) and consider the probability given by the binomial distribution: P(Y ≥ 7) = 0.0371.

Note that hypothesis testing considers results (for the sample) that are at least as extreme

as the observed results.

5. 0.0371 > 0.01. Therefore accept H0 and conclude that the die is not biased.

—————————————————

Example 5.1.3

An average of 50% of people1 taking the UK driving test pass it. A complaint is made that a

particular examiner, Mr Smith, is too lenient on candidates. Unknown to Mr Smith his work

is observed and it is found that he passes 16 out of 20 tested candidates. Are the complaints

justified at the 5% significance level?

———————————————————-

Let p=probability that Mr Smith passes a candidate.

H0 : p=0.5

H1 : p > 0.5

Consider binomial distribution Bi(n = 20, p = 0.5).

P(At least 16 out of 20 passed) = 0.0059

0.0059 < 0.05

Therefore reject H0 . We have a reason to consider that Mr Smith is too lenient.

————————————————————–

In this example, 16 passes out of 20 led us to reject H0. What is the least number of passes out of 20 that would lead us to reject H0?

————————————————————–

Use X ∼ Bi(n = 20, p = 0.5).

P(X ≥ 14) = P(X ≥ 15) + P(X = 14) = 0.0207 + 0.0370 = 0.0577 > 0.05

P(X ≥ 15) = 0.0207 < 0.05

Hence the least number of passes out of 20 that would lead us to reject H0 is 15.

¹Actually, in 2012–13 it was 47.4%.

—————————————————
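The binomial tail probabilities used in the two tests above can be computed directly; a sketch (the helper name is illustrative):

```python
from math import comb

def binom_tail_ge(n, p, k):
    """P(X >= k) for X ~ Bi(n, p) -- the 'at least as extreme'
    probability used in the hypothesis tests above."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_die = binom_tail_ge(20, 1/6, 7)    # Example 5.1.2: P(Y >= 7)
p_test = binom_tail_ge(20, 0.5, 16)  # Example 5.1.3: P(X >= 16)
```

This reproduces P(Y ≥ 7) ≈ 0.0371 for the die and P(X ≥ 16) ≈ 0.0059 for the driving-test example.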

The previous two examples were 1–tail tests where we were only interested in one direction of bias.

(Too many passes, not too few passes). The following is an example of a 2–tail test.

Example 5.1.4

Testing if a coin is biased.

Let p be probability of heads.

H0 : p=0.5

H1: p ≠ 0.5

Experiment: toss coin 20 times. Obtain 4 heads and 16 tails.

We want to find the probability that an event at least this extreme happens under H0. Using X ∼ Bi(n = 20, p = 0.5):

P(X ≤ 4) + P(X ≥ 16) = 2 · P(X ≥ 16) = 2 · 0.0059 = 0.0118

0.0118 > 0.01, therefore at the 1% significance level we conclude that the coin is not biased.

0.0118 < 0.05, therefore at the 5% significance level we conclude that the coin is biased.

—————————————————

Week 9 Tuesday

Example 5.1.5

On a long-forgotten island, now rediscovered, 15 birds were found. The explorer wonders whether

they might be the same bird as described in legend where 1/4 are female and 3/4 male. How

many females are required in the 15 birds if you are to conclude they are not the same species

at the 10% level?

————————————

Let p be probability of female.

H0 : p = 0.25. This hypothesis states that it is the same species.

H1: p ≠ 0.25. This hypothesis states that it is not the same species.

We observe that in such a situation a 2-tail test is appropriate, as our alternative hypothesis does not state whether a higher or a lower ratio is expected. Therefore, we will be looking for so-called critical values, namely values a and b such that:

P(X ≤ a) ≤ 0.10/2 = 0.05

P(X ≥ b) ≤ 0.10/2 = 0.05

If in our sample we get values less or equal to a or at least b, it would lead us to the rejection of

the null hypothesis. Note that the significance level 10% splits for the 2–tail test, so that each

tail gets 5%.

Consider X ∼ Bi(15, 0.25), modelling the number of females in a sample of 15 under the assumption of H0.

P(X ≤ 0) = 0.0134 < 0.05

P(X ≤ 1) = 0.0802 > 0.05

P(X ≥ 7) = 0.0566 > 0.05

P(X ≥ 8) = 0.0173 < 0.05

Hence, our lower critical value is a = 0 and our upper critical value is b = 8.

Therefore, if X = 0 or X ≥ 8, we reject H0 and conclude that it cannot be the same species.

—————————————————
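The search for critical values can be automated, which also guards against off-by-one slips when reading cumulative tables; a sketch under the convention stated above (P(X ≤ a) and P(X ≥ b) each at most the per-tail level; the function names are illustrative):

```python
from math import comb

def pmf(n, p, k):
    # binomial probability P(X = k) for X ~ Bi(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def critical_values(n, p, tail_prob):
    """Largest a with P(X <= a) <= tail_prob and smallest b with
    P(X >= b) <= tail_prob. Returns None for a tail that cannot
    reach the required level at all."""
    a, cum = None, 0.0
    for k in range(n + 1):          # walk up the lower tail
        cum += pmf(n, p, k)
        if cum <= tail_prob:
            a = k
        else:
            break
    b, cum = None, 0.0
    for k in range(n, -1, -1):      # walk down the upper tail
        cum += pmf(n, p, k)
        if cum <= tail_prob:
            b = k
        else:
            break
    return a, b

a, b = critical_values(15, 0.25, 0.05)   # Example 5.1.5
```

For Example 5.1.5 this returns (0, 8), i.e. reject H0 when X = 0 or X ≥ 8.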

Compare the application of the method for symmetrical and asymmetrical cases of binomial distribu-

tion.

Theorem 5.1.1 (Central Limit Theorem (CLT)) If we have a parent population with mean µ and variance σ², and we take a sample of size n from this population, then the distribution of the sample mean X̄ is approximately N(µ, σ²/n) if n is sufficiently large (usually n ≥ 30). The number n can be smaller if the parent population itself follows a normal distribution, in which case the result holds exactly for every n.

Example 5.1.6

Suppose that bags of sugar are packed by a machine to have a mean weight fixed by the operator

but a fixed standard deviation of 9.5 g. The distribution of weights is known to be Normal. The

mean weight of bags for today’s batch should be 500 g but the operator thinks that its level has

been fixed as too low. She takes a random sample of 16 bags and finds the total weight of the

bags is 7874 g. Carry out a suitable hypothesis test at the 1% significance level.

————————————

µ : mean weight of a bag produced by the machine

H0 : µ = 500

H1 : µ < 500

Here 1–tail test is appropriate as the operator is concerned only in case when the level is fixed

for too low.

By the Central Limit Theorem, under the assumption of H0, the mean of the sample follows a normal distribution: X̄ ∼ N(µ, σ²/n) = N(500, 9.5²/16). Our observed sample mean is x̄ = 7874/16 = 492.125. Hence we want to find:

P(X̄ ≤ 492.125) = P(Z ≤ (492.125 − 500)/(9.5/4)) = P(Z ≤ −3.31) = 0.000466 < 0.01

Hence we reject H0 at the 1% significance level and conclude that the machine has been set too low.

—————————————————
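The z-test above can be sketched as follows (the helper names are illustrative; the p-value differs slightly from the table value because z is not rounded):

```python
from math import erf, sqrt

def phi(z):
    # standard normal c.d.f. via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def z_test_lower(xbar, mu0, sigma, n):
    """p-value for a 1-tail test of H0: mu = mu0 against H1: mu < mu0.
    By the CLT the sample mean is N(mu0, sigma^2 / n) under H0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return phi(z)

p = z_test_lower(7874 / 16, 500, 9.5, 16)   # Example 5.1.6
reject = p < 0.01
```

Here p < 0.001, so H0 is rejected at the 1% significance level, matching the table-based calculation.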

Example 5.1.7

A random variable X is Normally distributed with known standard deviation 15.0. The mean

is thought to be µ = 100. A sample of n = 25 data points are taken and the calculated mean is

104.7. Test the hypothesis that the mean is different from 100, at the 5% significance level.

———————————–

H0: µ = 100

H1: µ ≠ 100

The sample mean follows a normal distribution by the CLT: X̄ ∼ N(µ, σ²/n) = N(100, 15²/25).

In this case the 2-tail test is appropriate. Let us calculate the probability of a sample mean at least as extreme as 104.7:

2 · P(X̄ ≥ 104.7) = 2 · P(Z ≥ (104.7 − 100)/(15/5)) = 2 · P(Z ≥ 1.57) = P(Z ≤ −1.57) + P(Z ≥ 1.57) = 0.1164 > 0.05

Therefore, we accept the null hypothesis that the mean µ is not different from 100.

—————————————————

If we do not know the population standard deviation σ, we use the sample to estimate it. We need a

large n (size of the sample) to do this, typically n ≥ 50.

Example 5.1.8

A manufacturer claims that a certain type of battery has a mean lifetime of 80 minutes. A researcher thinks this is an overestimate and decides to test a random sample of 74 batteries, recording the lifetime x minutes of each battery. The lifetimes are known to be Normally distributed. He decides to carry out a hypothesis test at a 5% significance level.

The results of the test are:

n = 74,   Σx = 5877.4,   Σx² = 489,442.8.

——————————————————-

H0 : µ = 80

H1 : µ < 80

Because the alternative hypothesis states that the mean is strictly smaller than claimed, a 1-tail test is appropriate.

x̄ = Σx/n = 5877.4/74 = 79.42

s = √((Σx² − n·x̄²)/(n − 1)) = √((489,442.8 − 74·(79.42)²)/73) = 17.61

We will use the sample standard deviation s to estimate the population standard deviation σ:

X̄ ∼ N(µ, σ²/n) = N(80, 17.61²/74)

Therefore:

P(X̄ ≤ 79.42) = P(Z ≤ (79.42 − 80)/(17.61/√74)) = P(Z ≤ −0.283) = 0.39 > 0.05.

Hence we accept the null hypothesis and conclude that there is no evidence that the mean battery life is below 80 minutes.

—————————————————

Week 10 Tuesday

Given a sample of size n with mean x̄, a p% confidence interval is an interval for which there is a p% chance that µ for the parent population lies in that interval:

( x̄ − kσ/√n , x̄ + kσ/√n )

where:

• x̄ is the sample mean,
• σ is the population standard deviation,
• n is the sample size,
• k is determined by the confidence level (from the percentage-points table).

Example 5.2.1

Confidence level 90%: we need P(−k < Z < k) = 0.90, i.e. P(Z < k) = 0.95, which gives k = 1.6449.

—————————————————

Confidence level    k
90%                 1.6449
95%                 1.96
99%                 2.58

Example 5.2.2

In a large town the distribution of incomes per family has a known standard deviation of £17000. A random sample of 400 families was taken and the sample mean income was found to be £21500. Calculate a 95% confidence interval for the mean income per family in the town.

———————————————————————-

σ = 17000
n = 400
x̄ = 21500

The 95% confidence level corresponds to k = 1.96.

( x̄ − kσ/√n , x̄ + kσ/√n ) = ( 21500 − (1.96)(17000)/√400 , 21500 + (1.96)(17000)/√400 ) = (19834, 23166)

—————————————————
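The interval formula translates directly into code; a sketch checking Example 5.2.2 (the function name is illustrative):

```python
from math import sqrt

def confidence_interval(xbar, sigma, n, k):
    """Returns (xbar - k*sigma/sqrt(n), xbar + k*sigma/sqrt(n))."""
    half = k * sigma / sqrt(n)
    return xbar - half, xbar + half

lo, hi = confidence_interval(21500, 17000, 400, 1.96)   # Example 5.2.2
```

This reproduces the interval (19834, 23166); the same helper covers the later examples by swapping in the appropriate k.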

Example 5.2.3

A test has been designed to produce Normally distributed scores. The scores have a scale

from 0 to 100 and a known standard deviation of 25. A random sample of 10 people took the

test and had average score x̄ = 47.7. Find a 99% confidence interval and a 90% confidence

interval for the mean score of all people taking this test.

———————————————————————-

σ = 25
n = 10
x̄ = 47.7

The 99% confidence level corresponds to k₁ = 2.58; the 90% confidence level corresponds to k₂ = 1.6449.

( x̄ − k₁σ/√n , x̄ + k₁σ/√n ) = ( 47.7 − (2.58)(25)/√10 , 47.7 + (2.58)(25)/√10 ) = (27.30, 68.10)

( x̄ − k₂σ/√n , x̄ + k₂σ/√n ) = ( 47.7 − (1.6449)(25)/√10 , 47.7 + (1.6449)(25)/√10 ) = (34.70, 60.70)

—————————————————

Example 5.2.4

A group of workers earned an average of $546 per week in September 1996. Assume that this mean is based on a random sample of 1000 workers and that the standard deviation for that sample was $75. Find a 99% confidence interval for the mean.

———————————————————————-

n = 1000
x̄ = 546
s = 75

The 99% confidence level corresponds to k = 2.58.

Since n is large, we may assume σ = s. The 99% confidence interval is given by:

( x̄ − kσ/√n , x̄ + kσ/√n ) = ( 546 − (2.58)(75)/√1000 , 546 + (2.58)(75)/√1000 ) = (539.9, 552.1)

—————————————————

When σ is unknown and the sample is small, the value of k is instead taken from the Student t table: use the column for the γ% confidence level and the row ν = n − 1 (the degrees of freedom).

Example 5.2.5

A test is designed to produce scores that are Normally distributed and have a value between

0 and 100. A group of 15 students take the test and attain scores given by

64 89 44 76 63 81 58 69 55 93 53 33 68 60 63.

Use the data to construct a 95% confidence interval for the mean score of all people taking

this test.

———————————————————————-

n = 15
x̄ = (64 + 89 + . . . + 60 + 63)/15 = 64.6
s = √((Σxᵢ² − n·x̄²)/(n − 1)) = 15.96
ν = n − 1 = 14

The γ = 95% confidence level with ν = 14 corresponds to k = 2.1448.

The 95% confidence interval is given by:

( x̄ − ks/√n , x̄ + ks/√n ) = ( 64.6 − (2.1448)(15.96)/√15 , 64.6 + (2.1448)(15.96)/√15 ) = (55.76, 73.44)

—————————————————

Example 5.2.6

To find out the cardiac demands of heavy snow shovelling, 10 healthy men of the same age participated in snow-removal tests. (They shovelled snow for 10 minutes at a time, with

10–15 minute rest periods. Their heart rate was measured at 2 minute intervals.) Their

mean heart rate in beats per minute was x̄ = 175 and the standard deviation was s = 15.


Find a 90% confidence interval for the population mean (the mean heart rate for healthy men of this age when shovelling snow).

Source: Franklin, B.A. et al. (1995), Cardiac demands of heavy snow shovelling, J. Ameri-

can Medical Association 273 [11].

———————————————————————-

n=10

x = 175

s = 15

ν =n−1=9

γ = 90% confidence level with ν = 9 corresponds to k = 1.8331.

The 90% confidence interval is given by:

( x̄ − ks/√n , x̄ + ks/√n ) = ( 175 − (1.8331)(15)/√10 , 175 + (1.8331)(15)/√10 ) = (166.31, 183.69)

—————————————————

Week 11 Monday

Linear regression involves finding the best mathematical model for a relationship between 2 variables

when the relationship is assumed to be linear, and when all that is known is a selection of data points

(xi , yi ) where i ∈ {1, 2, . . . , n} for the two variables x and y.


This type of model is generally used in situations where there is an element of ‘randomness’ in the data. Therefore, for a given value of x,

y = β̂₀ + β̂₁x

will not give a certain value of y corresponding to x; instead, it gives an expected value of y.

For a given set of data the line which we will choose is the one that minimises the expression:

Σᵢ₌₁ⁿ ( yᵢ − (β̂₀ + β̂₁xᵢ) )².

Formulas:

Sxx = Σxᵢ² − (Σxᵢ)²/n

Sxy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n

β̂₁ = Sxy/Sxx

β̂₀ = ȳ − β̂₁x̄
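These formulas translate directly into code; a small Python sketch (the function name `least_squares` is my own):

```python
def least_squares(xs, ys):
    """Fit y = b0 + b1*x using Sxx and Sxy as defined above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b1 = sxy / sxx            # slope
    b0 = sum_y / n - b1 * sum_x / n  # intercept: y-bar - b1 * x-bar
    return b0, b1

# The data of Example 5.3.1 below:
b0, b1 = least_squares([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])
print(round(b0, 1), round(b1, 1))  # -0.1 0.7
```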

Example 5.3.1

Suppose an experiment involving five subjects is conducted to determine the relationship between

the percentage of a certain drug in the bloodstream and the length of time it takes to react to

a stimulus. The results are shown in the table below.

x (%)   y (seconds)
  1          1
  2          1
  3          2
  4          2
  5          4

(The number of measurements and the measurements themselves are unrealistically simple in order to avoid excessive arithmetic and to concentrate instead on the process in this introductory example.)

Determine the straight line

y = β̂0 + β̂1 x.

————————————————————————

Σxᵢ = 1 + 2 + 3 + 4 + 5 = 15

Σyᵢ = 1 + 1 + 2 + 2 + 4 = 10

x̄ = Σxᵢ/n = 15/5 = 3

ȳ = Σyᵢ/n = 10/5 = 2

Σxᵢ² = 1² + 2² + 3² + 4² + 5² = 55

Σxᵢyᵢ = 1·1 + 2·1 + 3·2 + 4·2 + 5·4 = 37

Sxx = Σxᵢ² − (Σxᵢ)²/n = 55 − 15²/5 = 10

Sxy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n = 37 − (15·10)/5 = 7

β̂₁ = Sxy/Sxx = 7/10 = 0.7

β̂₀ = ȳ − β̂₁x̄ = 2 − 0.7·3 = −0.1

y = −0.1 + 0.7x.

—————————————————

Example 5.3.2

Due primarily to the price controls of the cartel of crude oil suppliers (OPEC), the price of crude oil rose dramatically from the mid 1970s to the early 1980s. As a result, motorists were confronted with a similar upward spiral of petrol prices. The data in the table are typical prices for a gallon of regular petrol and a barrel of crude oil for the indicated years.

YEAR   PETROL y (cents/gallon)   CRUDE OIL x (USD/barrel)
1973          39                        3.89
1975          57                        7.67
1976          59                        8.19
1977          62                        8.57
1978          63                        9.00
1979          86                       12.64
1980         119                       21.59
1981         133                       31.77
1982         122                       28.52
1983         116                       26.19
1984         113                       25.88
1985         112                       24.09
1986          86                       12.51
1987          90                       15.41

Given that

Σxᵢ = 235.92,  Σyᵢ = 1257,  Σxᵢ² = 5074.0898,  Σyᵢ² = 124459,  Σxᵢyᵢ = 24654.87,

(a) Use the data to calculate the least squares line that describes the relationship between the

price of a gallon of petrol and the price of a barrel of crude oil.

(b) If the price of crude oil fell to $8 a barrel, to what level would you expect the price of petrol to fall?

——————————————————-

(a)

x̄ = Σxᵢ/n = 235.92/14 = 16.85

ȳ = Σyᵢ/n = 1257/14 = 89.79

Sxx = Σxᵢ² − (Σxᵢ)²/n = 5074.0898 − 235.92²/14 = 1098.5

Sxy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n = 24654.87 − (235.92·1257)/14 = 3472.6

β̂₁ = Sxy/Sxx = 3472.6/1098.5 = 3.161

β̂₀ = ȳ − β̂₁x̄ = 89.79 − 3.161·16.85 = 36.52

y = 36.52 + 3.161x.

(b)

Substituting x = 8 into the fitted line gives y = 36.52 + 3.161·8 = 61.81. Hence, we would expect petrol to cost 61.81 cents per gallon.

—————————————————
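The fit in part (a) can be reproduced from the summary totals alone; a Python sketch under that assumption (small differences from the hand calculation come from the intermediate rounding used in the notes):

```python
# Summary totals given in Example 5.3.2
n = 14
sum_x, sum_y = 235.92, 1257
sum_x2, sum_xy = 5074.0898, 24654.87

sxx = sum_x2 - sum_x ** 2 / n      # ~ 1098.5
sxy = sum_xy - sum_x * sum_y / n   # ~ 3472.6
b1 = sxy / sxx                     # ~ 3.161
b0 = sum_y / n - b1 * sum_x / n    # ~ 36.51 (36.52 in the notes after rounding)

# Part (b): predicted petrol price when crude oil is $8/barrel
print(round(b0 + b1 * 8, 1))  # 61.8, agreeing with the hand value 61.81 up to rounding
```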

The straight-line model only provides information about E(Y | X). We now estimate the variance in the y-values.

Note that SSE stands for ‘sum of squared errors’.

Syy = Σyᵢ² − (Σyᵢ)²/n

SSE = Syy − β̂₁·Sxy

s² = SSE/(n − 2)

The quantity s² gives an estimate of σ², where σ² is the variance of the difference of actual y-values from predicted y-values.


Measure of the Usefulness of a Model

The coefficient of determination r² is given by:

r² = 1 − SSE/Syy.

It gives the proportion of the total sample variability that is explained by the linear model.
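As a numerical check, Syy, SSE, s², and r² can be computed in Python for the data of Example 5.3.1, reusing the Sxx/Sxy formulas from the regression section:

```python
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sxx = sum(x * x for x in xs) - sum_x ** 2 / n
sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
syy = sum(y * y for y in ys) - sum_y ** 2 / n

b1 = sxy / sxx            # slope of the fitted line
sse = syy - b1 * sxy      # sum of squared errors
s2 = sse / (n - 2)        # estimate of sigma^2
r2 = 1 - sse / syy        # coefficient of determination

print(round(sse, 1), round(r2, 2))  # 1.1 0.82
```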

Example 5.3.3

(Example 5.3.1 continued.)

Σyᵢ² = 1 + 1 + 4 + 4 + 16 = 26

Syy = Σyᵢ² − (Σyᵢ)²/n = 26 − 10²/5 = 6

SSE = Syy − β̂₁·Sxy = 6 − 0.7·7 = 1.1

s² = SSE/(n − 2) = 1.1/3 ≈ 0.37

s = √(1.1/3) ≈ 0.61

Thus, s ≈ 0.61 is the typical error that we can expect in the predicted values of the model.

r² = 1 − SSE/Syy = 1 − 1.1/6 ≈ 0.82

This means that 82% of the variation in the y-values of the data is accounted for by the model.

—————————————————

Example 5.3.4

(Example 5.3.2 continued.)

Syy = Σyᵢ² − (Σyᵢ)²/n = 124459 − 1257²/14 = 11598

SSE = Syy − β̂₁·Sxy = 11598 − (3.161)(3472.6) = 621.1

s² = SSE/(n − 2) = 621.1/12 = 51.76

s = √51.76 ≈ 7.2

Thus, s ≈ 7.2 is the typical error that we can expect in the predicted values of the model.

r² = 1 − SSE/Syy = 1 − 621.1/11598 = 0.946

This means that 94.6% of the variation in the y-values of the data is accounted for by the model.

—————————————————


Appendices

Appendix A

Appendix B

Appendix C
