
CL2603: Engineering Computing and Statistics

Dr B M Parker

Brunel University London

Term 2 2021/2022

Dr B M Parker (Brunel) CL2603 1


Table of Contents

Logistics
Assessment
Resources



Main Aims
To develop knowledge and skills in applied mathematics and statistical
concepts useful for solving engineering problems. This includes the ability
to develop simple mathematical models that represent experimental data
sets, estimate relevant model parameters and assess model performance, as
well as solving differential equations describing chemical engineering
phenomena.

This course will be split into two sections:


Statistics : Dr Ben Parker
Numerical Methods : Dr Mike Warby



Learning Outcomes

This Modular Block provides opportunities for students to demonstrate
knowledge and understanding (K), cognitive (thinking) skills (C), and other
skills and attributes (S) in the following areas:
Demonstrate knowledge and critical understanding of the analysis,
manipulation and interpretation of data using statistical techniques
Develop simple mathematical models to represent experimental data
sets and estimate relevant model parameters
Identify suitable regression models and assess their performance in
explaining and predicting experimental data
Use software to perform statistical analysis



Recommended Reading

Recommended textbooks for the course are available via the Talis reading
list linked from Blackboard.



What is covered/ overview plan

Part 1 (Week 18) Review of Basic Statistical Concepts, Introduction


to Statistical Computing in R, Sampling
Part 2 (Week 20) Estimation and Confidence Intervals
Part 3 (Week 22) Hypothesis Testing
Part 4 (Week 24) Review of Linear Regression and model fitting
Part 5 (Week 26) Extending Linear Regression (Multiple Regression
and model building)
Part 6 (Week 28) Non-linear Regression



Logistics
There is some uncertainty at the time of writing over whether we will be
allowed full lectures on campus. I hope to move to a more traditional
lecture format if permitted. In the meantime:
The course will have online asynchronous (i.e. recorded) lectures.
There is a slot in your timetable from 10-12 on Fridays, which you
might want to use to review the lecture material, but lectures and
slides will be made available earlier each week so that you can study
when you want.
I may occasionally ask you to do some extra reading or look at some
examples to supplement this lecture material.
On Fridays from 2pm to 4pm we have a lab, where we will be working
through examples and using the R statistical programming language.
I will set some take home exercises each week to be done before the
next session (two weeks later).
It is essential you review the lectures before the labs, and attend the
lab. I will not be recording labs.
It is best you attend the lab in person. If you cannot attend in
How this course will be taught and assessed

Each week I will provide some questions to allow you to practice the
material in the course; solutions will be given in lectures where there is
time and/or on the course blackboard.
It is important you devote some time to work through these examples; the
only way to improve at mathematics is to practice it. However, this work
is not assessed, although I am happy to talk about individual answers in
office hours.



Assessment

The assessment of this course is as follows:


50% of the course will be assessed in a lab based assignment on
statistical analysis.
50% of the course will be assessed in a written exam.



Resources

All documents for the course will be posted on blackboard,


http://blackboard.brunel.ac.uk/
This includes these slides, exercises, solutions, assignments, computer
lab exercises, and everything else that may be relevant.
This course assumes some knowledge of mathematics and statistics
from your previous courses; you may wish to review the notes for
these courses.
I will suggest books appropriate for each chapter as we proceed. A
link to these books can be found on blackboard.
I will have office hours every week (TBD)
You should also e-mail me (Ben.Parker@brunel.ac.uk) if you need
help, or to arrange a meeting outside of an office hour.
Any corrections, questions, or feedback are also very welcome!



Part I

Statistical Concepts, Computing, and


Sampling



Learning Outcomes

By the end of the first week, you will be able to:


interpret a written description of an experiment as appropriate
mathematical notation, and vice versa.
recognise situations where random variables can be represented by
known distributions (discrete and continuous Uniform, Binomial,
Geometric, Poisson, Exponential and Normal) and interpret
parameters of these distributions correctly.
calculate expectations and variance of random variables.
use computational software to find probabilities of events.



Motivation

Let’s suppose we have some hypothesis or belief about a chemical
process. We summarise this in some model.
We gather some data. We can use this data to
test whether the model is true
find out something about the parameters of the model
make some prediction using the model

From Santiago, Celine B., Jing-Yao Guo, and Matthew S. Sigman. "Predictive and mechanistic
multivariate linear regression models for reaction development." Chemical Science 9.9 (2018):
2398-2412.
Further motivation

Our data may come from physical experiments which can often be
expensive, difficult, or dangerous...
... so we will often use numerical experiments or simulation (which Dr
Shaw will talk about next week)
However we generate the data it’s very
unlikely that we can see every possible
chemical reaction that may occur, so we
assume that we see some randomly
chosen sample of all the data we could
have seen. We form our view about the
world based on this data, and in this
section of the course we’ll quantify how
much we can say about the world based
on this data.



Terminology

We call the set of all possible experiment results we could have seen
the Sample Space
We call the set of results we did actually see the data.
Example: An accelerator mass spectrometry detector measures the number
of Carbon-14 ions in five sample ice cores chosen at random locations in
Antarctica.
The sample space is Ω = {0, 1, 2, 3, ...}
Our data might be 1, 2, 12, 0, 5
Based on our data, is there anything we can say about the amount of
Carbon-14 in Antarctica?



Descriptive Statistics
It is of course interesting to provide some descriptive statistics from a
sample. In this course, I’ll demonstrate how to do these things in R which
we’ll meet in the labs.
Let’s look at some real data; the file concrete.dat contains the
compression strength (N mm⁻²) of 180 cubes of concrete made by our
chemical process.
The first thing we might do with any data set is examine it; we can do this
in R as follows:
> concrete <- read.table("concrete.dat")
This gives us an object with two columns, the experiment number and the
strength reading. To put this in a form we can work with, we do
strength<-concrete$V1
to get the data column into the strength object. We can display the data
by typing strength.
To get a preliminary feel about the data, we get a histogram by typing
hist(strength).
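The workflow above can be sketched as follows; since concrete.dat itself lives on Blackboard, the sketch uses simulated readings as a stand-in:

```r
# Simulated stand-in for the strength column of concrete.dat; the real
# file is on Blackboard, so the numbers here are illustrative only.
set.seed(1)
strength <- rnorm(180, mean = 61, sd = 4)

mean(strength)     # sample mean
median(strength)   # sample median
summary(strength)  # minimum, quartiles, median, mean, maximum
hist(strength)     # histogram of the readings
```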
We notice from this histogram that the graph is largely symmetrical,
although there are a few unusual low values which we might question; with
these values, the graph is very slightly skewed to the left.



We may also wish to summarise the data numerically; let’s call our data
y1 , y2 , y3 , . . . , yn . Here we have y1 = 57.4, y2 = 59.6, . . . , y180 = 64.7
We can find the mean as

ȳ = Σ_{i=1}^{n} y_i / n = (57.4 + 59.6 + ... + 64.7)/180 = 61.10.

In R, we can do this by mean(strength).



The median is the middle observation if we arrange the data in ascending
order; let y(1) be the smallest value, y(2) the second smallest, and suppose
we have 2m + 1 values. Then y(m+1) is the median value.
This is sometimes a more typical value, and is used in different contexts
from the mean; one technical note is that if we have two middle values,
i.e. an even number of data, then we take the median as the average of the
middle two values.
For the concrete, we can just use median(strength) to get 61.25; in this
example, the median is very close to the mean as the data are very
symmetrical.
The mode is sometimes used as the most frequently occurring value in
discrete data; for continuous data we might use the modal range; from the
histogram we can see this is 60-62 for the concrete example.



Other useful ways to describe the data are:
Maximum: The highest value (y(n)).
Minimum: The lowest value (y(1)).
Range: The maximum minus the minimum
The Upper Quartile (UQ): The value which 75% of the data lies
below.
The Lower Quartile (LQ): The value which 75% of the data lies
above.
The interquartile range: UQ-LQ.
We can also use summary(strength) to get a selection of summary
statistics for the data.
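These summaries can all be computed directly in R; a small sketch using a made-up sample (the values here are illustrative, not the concrete data):

```r
# A small illustrative sample
y <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)

max(y) - min(y)    # range
quantile(y, 0.75)  # upper quartile (UQ)
quantile(y, 0.25)  # lower quartile (LQ)
IQR(y)             # interquartile range, UQ - LQ
```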



We can see all of these graphically by using a boxplot; in R for the
concrete data we use boxplot(strength).

the top point is the maximum, the bottom the minimum; the upper and
lower quartiles are the edges of the rectangle, and the median the line in
the middle of the rectangle. Note that R decides that the three lower
points are sufficiently different from the rest of the data to be outliers, and
draws these as circles.



Spread of the data

Statistics is all about quantifying uncertainty; we introduced the range and


the interquartile range, but more often we will look at the variance (or
equivalently the standard deviation) of a sample, which is defined as

Definition (Sample variance)


s² = Σ_{i=1}^{n} (y_i − ȳ)² / (n − 1) = (Σ_{i=1}^{n} y_i² − n ȳ²) / (n − 1).

s (the square root of the variance) is then the sample standard deviation.
Note this is defined a little differently from the standard deviation of a
population or distribution that we defined in Chapter 1.
Variance is just a way of quantifying the spread of the data. Smaller
variances mean we know more about where the data lies; higher variances
mean we know less.
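The two forms of the sample variance formula give the same answer, and both match R's built-in var(); a quick check on a few made-up readings:

```r
# Five illustrative strength readings
y <- c(57.4, 59.6, 60.1, 62.3, 64.7)
n <- length(y)

v1 <- sum((y - mean(y))^2) / (n - 1)        # definition form
v2 <- (sum(y^2) - n * mean(y)^2) / (n - 1)  # computational form
var(y)  # agrees with both v1 and v2
sd(y)   # sample standard deviation, the square root of the variance
```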



Contents

Spread of the data

1 Random Variables
Selected Discrete distributions
Expectation and variance



Random Variables

Let’s suppose for each element in our sample space we assign a number;
usually this will be some meaningful statistic based on that outcome.
For example, let us suppose we toss a fair coin twice and record the
output. Our sample space Ω = {HH, HT , TH, TT }. We could invent a
random variable X, which might be the number of heads in two tosses so

X (HH) = 2

X (HT ) = X (TH) = 1
X (TT ) = 0
If our variable takes only discrete values (e.g. 0, 1, 2, ...) then it is a
discrete random variable; otherwise it is a continuous random variable.
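We can simulate this random variable in R to see that the values 0, 1, 2 occur with roughly the frequencies worked out on the next slide:

```r
# Simulate two fair coin tosses many times and count the heads each time
set.seed(2)
tosses <- matrix(sample(c("H", "T"), 2 * 10000, replace = TRUE), ncol = 2)
x <- rowSums(tosses == "H")  # X = number of heads in two tosses

table(x) / 10000  # relative frequencies, close to 1/4, 1/2, 1/4
```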



Definition (Probability distribution function)
Let X be a discrete random variable. We let the probability distribution
function be
f (x) = P(X = x).

Example (Coins again)


We toss 2 coins and record the number of heads as before. We have

f(0) = P(X = 0) = 1/4
f(1) = P(X = 1) = 1/2
f(2) = P(X = 2) = 1/4

We will often write these as a table:

x        | 0    | 1   | 2
P(X = x) | 0.25 | 0.5 | 0.25



Definition (Cumulative distribution function)
Let X be a discrete random variable. We let the cumulative distribution
function be

F(x) = P(X ≤ x) = Σ_{u ≤ x} f(u).

Coins again
We can write out the cumulative distribution for the coin example as
follows
F(1) = P(X ≤ 1) = P(X = 0) + P(X = 1) = 1/4 + 1/2 = 3/4
F(1.5) = P(X ≤ 1.5) = F(1) here
F(−17) = P(X ≤ −17) = 0

Note we use lower case f to refer to the probability distribution function,


and capitalized F to refer to the cumulative distribution function.
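In R, the cumulative distribution function of a discrete random variable is just the cumulative sum of the probability distribution function; for the coin example:

```r
# f(x) for x = 0, 1, 2 heads, as in the table above
f <- c(0.25, 0.5, 0.25)

# F(x) = P(X <= x) is the running total of f
F <- cumsum(f)
F  # 0.25 0.75 1.00, so F(1) = 3/4 as calculated above
```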
Common distributions

We will often make use of standard distributions whose properties we know


to model our processes.
It is often the case that if we approximate our process to a known
distribution, we can make our calculations a lot simpler (at the cost of
some accuracy).
For example, physical laws might tell us that radiation is a random
process, so we might model the number of radioactive particles as a
Poisson distribution which models rare events. Although it is unlikely the
process is exactly Poisson, we gain more than we lose.
For example, we might know the process is Poisson, and be required to
measure the mean radiation rate.
We’ll look at some common distributions in the next few slides.



Definition (The geometric distribution)
A number of trials take place sequentially, and each trial is independently
recorded as a success, with probability p, or a failure.
The geometric distribution specifies the number of failures, X, before the
first success.
It has a parameter p which represents the probability of success for each
trial.
The random variable, X, takes values 0, 1, 2, ... It has a PDF

P(X = x) = f(x) = (1 − p)^x p

We can show the CDF is P(X ≤ x) = F(x) = 1 − (1 − p)^(x+1).
We will sometimes write X ∼ Geo(p) to show that the random variable X
has a geometric distribution with probability of success parameter p.
N.B. Different authors might define the geometric distribution as the number of
trials up to and including the first success. Be careful: this is equivalent, but
slightly different mathematically to what I use here.



Example
An engineer has designed a storm water sewer system so that the yearly
maximum discharge will cause flooding on average once every 10 years.
This means that the probability each year that there will be a discharge
which causes flooding is 0.1. If it can be assumed that the maximum
discharges are independent from year to year, what is the probability that
there will be at least one flood in the next five years?

We can model this as a geometric distribution, with parameter p = 0.1.


We measure the number of years X before we have a "successful" flood.
f(0) = (1 − p)⁰ p = 0.1
f(1) = (1 − p)¹ p = 0.9 × 0.1 = 0.09
f(2) = (1 − p)² p = 0.9 × 0.9 × 0.1 = 0.081, etc.



To answer the question, the probability that there will be at least one
flood in the next five years is equal to the probability that we have at most
4 ”failures” before the first flood, so can be written as

P(X ≤ 4) = F(4) = 1 − (1 − p)⁵ = 1 − 0.9⁵ = 1 − 0.59 = 0.41


This calculation is easy if we realise that our model is a geometric
distribution and use the known properties of that distribution; we do not
have to work the CDF out from scratch every time.
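R's pgeom uses the same failures-before-first-success convention as these slides, so it can check the flood calculation directly:

```r
p <- 0.1        # probability of a flood in any one year

pgeom(4, p)     # P(X <= 4): at most 4 flood-free years before the first flood
1 - (1 - p)^5   # the same value from the CDF formula, 1 - 0.9^5
```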



Example (Poisson distribution)
Poisson distribution: f(x) = λ^x e^(−λ) / x!. Used to model rare events
which occur independently, e.g.:
The number of hurricanes per year in Texas.
The number of ions of a particular type in a small chemical sample.
The number of days per year that the atmosphere in a city is classed
as noxious.



For example, we might know that the number of days X when the water
quality at a beach reaches toxic levels is Poisson distributed with
parameter λ = 5.
We can calculate the probability that there are X days per year with toxic
water quality as follows:

P(X = 0) = λ⁰ e^(−λ) / 0! = (1 × e^(−5)) / 1 = e^(−5) = 0.007

P(X = 1) = λ¹ e^(−λ) / 1! = (5 × e^(−5)) / 1 = 5e^(−5) = 0.033

...

P(X = 4) = λ⁴ e^(−λ) / 4! = (625 × e^(−5)) / 24 = 0.175
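These probabilities come straight from R's dpois:

```r
lambda <- 5  # mean number of toxic-water days per year

dpois(0, lambda)  # e^-5, about 0.007
dpois(1, lambda)  # 5 e^-5, about 0.033
dpois(4, lambda)  # about 0.175
```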



Figure: Probability function f(x) of the Poisson distribution with λ = 5,
for x = 0, 1, ..., 20.


Example (Binomial distribution)
Binomial distribution: f(x) = C(n, x) p^x (1 − p)^(n−x).
(Remember that C(n, x) = n! / (x!(n − x)!) is the number of ways of
choosing x things without replacement from a set of n things.)
Used to model the probability of x successes from a total number of n
trials, each occurring independently with probability p. For example:
The number of times a biased coin with probability of heads p will
come up heads if tossed n times.
The number of times a year it will rain if it rains independently each
day with probability p and n = 365.
The number of ions of samples that contain a Carbon-14 ion if we
have n samples, each containing the ion with probability p
independently.

For example, we have 20 high specification parts for a centrifuge. What is


the distribution of the number of faulty parts if the probability of a faulty
part is 0.3,0.5, or 0.7 respectively?
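The three distributions asked for can be tabulated with R's dbinom, which evaluates the binomial probability function above:

```r
n <- 20  # number of centrifuge parts
for (p in c(0.3, 0.5, 0.7)) {
  # probability of 0, 1, ..., 20 faulty parts when each part is faulty
  # independently with probability p
  cat("p =", p, "\n")
  print(round(dbinom(0:n, n, p), 3))
}
```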
Figure: Probability functions f(x) of the Binomial distribution with n = 20
and p = 0.3 (top left), p = 0.5 (top right), and p = 0.7 (bottom).


Some examples of density functions

Distribution    Defined on       pdf f(x)                   Notes
Uniform(a,b)    a, a+1, ..., b   1/(b − a + 1)              X is equally likely to be a, a+1, ..., b
Binomial(n,p)   0, 1, ..., n     C(n,x) p^x (1 − p)^(n−x)   X is the number of successes from n trials, each with probability p
Geometric(p)    0, 1, ...        (1 − p)^x p                Number of failures before first success
Poisson(λ)      0, 1, ...        λ^x e^(−λ) / x!            “Rare” events


Definition (Cumulative Distribution Function/ Probability Density
Function)
For continuous random variables, we can define the cumulative distribution
function

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u) du.

The probability density function (pdf) can then be found as the derivative
of the cumulative distribution function,

f(x) = (d/dx) F(x).

For the pdf to be valid, we must have that
f(x) ≥ 0 for all x.
∫_{−∞}^{∞} f(u) du = 1.
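Both validity conditions can be checked numerically in R; a sketch for one example pdf, the exponential with rate 1 (introduced properly later in this part):

```r
# Example pdf: f(x) = e^-x on x >= 0 (the Exponential(1) density)
f <- function(x) exp(-x)

integrate(f, 0, Inf)$value         # total area under the pdf; should be 1
all(f(seq(0, 10, by = 0.1)) >= 0)  # f(x) >= 0 on a grid of values
```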



Differences between CDF and PDF

PDF:
Value greater than or equal to
zero.
Must integrate to 1.
To go from PDF→ CDF we
integrate under the curve.
CDF:
Function ranges from zero to 1
(F (−∞) = 0, F (∞) = 1)
Never decreases.
To go from CDF → PDF we
differentiate the function.



Examples of continuous distributions

Example (Example 1: The Normal Distribution)


The Normal distribution (commonly called the Gaussian) with mean
parameter µ and variance parameter σ² has a probability density function
given by

f(x) = (1/√(2πσ²)) e^(−(x−µ)²/(2σ²))
It is used to model many things in the sciences, and we shall see its use
later in the course for statistical inference.
Errors in experiments or measurements
Processes that move at random, such as the (logarithm of) changes
in prices of stocks, or changes in the high tide level from day to day.
“White noise” in electrical engineering.



We can see how the PDF and CDF vary with the parameters for the
normal distribution:

Figure: CDF (left) and PDF (right) for the normal distribution for varying
values of the parameters.
The Standard Normal Distribution

The normal distribution with mean 0 and standard deviation 1, N(0, 1), is
called the standard normal distribution.
For the standard normal distribution, tables are available in all published
books of statistical tables (For example, table 4 of ‘New Cambridge
Statistical Tables’, 2nd Edition, by D. V. Lindley and W. F. Scott.) giving
the probability of the distribution in selected regions.
Most tables give areas under the curve to the left of a specified value,
i.e. the probability of observing a standard normal value less than or equal
to a specified value, P(Z ≤ z).
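In R, pnorm plays the role of these tables, returning P(Z ≤ z) directly:

```r
pnorm(1.74)   # P(Z <= 1.74), approximately 0.9591 as in the tables
pnorm(-1.74)  # P(Z <= -1.74) = 1 - P(Z <= 1.74), approximately 0.0409
```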



Usually, tables only give P(Z ≤ z) for positive values of z. For negative
values, we use the symmetry of the distribution to calculate the required
probability.
Figure: symmetry of the standard normal density about zero.

P(Z ≤ −z) = 1 − P(Z ≤ z)


So therefore the probability of an observation of a standard normal
population being less than −1.74 is 0.0409.



We can now calculate probabilities for any region.
Figure: the probability of a central region as the difference of two
left-tail areas.

P(z1 ≤ Z ≤ z2) = P(Z ≤ z2) − P(Z ≤ z1)


So therefore the probability of an observation of a standard normal
population being between −0.04 and 1.74 is 0.9591-(1-0.5160)=0.4751.



Standardising a Normally Distributed Variable

The normal distribution has a particularly convenient property.


Consider a variable whose probability distribution has mean µ and
standard deviation σ. Suppose that we subtract µ from this variable and
then divide by σ, to obtain a transformed variable.
The transformed variable has mean 0 and standard deviation 1.
Furthermore, if the distribution of the original variable is normal, the
transformed variable has a standard normal distribution.
The operation of subtracting the mean (µ) of the distribution and dividing
by the standard deviation (σ) is called standardising the variable, and we
write
Z = (X − µ)/σ.
By standardising, we can calculate probabilities for any normal distribution
using tables of the standard normal distribution.



Example (SO2 )
Suppose that the atmospheric SO2 (sulphur dioxide) concentration at a
particular location is normally distributed with mean 25.8 µgm−3 and
standard deviation 5.5 µgm−3 . What is the probability of a SO2
concentration between 20 and 30 µgm−3 ?

If we denote the SO2 concentration by X then Z = (X − 25.8)/5.5 is a


variable with a standard normal distribution.
We require P(20 ≤ X ≤ 30).
When x = 20, z = −1.05. When x = 30, z = 0.76
P(20 ≤ X ≤ 30) = P(−1.05 ≤ Z ≤ 0.76) = 0.7764 − (1 − 0.8531) =
0.6295.
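R will do this calculation without standardising by hand, since pnorm accepts mean and sd arguments; the small difference from 0.6295 arises because the tables round z to two decimal places:

```r
# P(20 <= X <= 30) for X ~ Normal(mean 25.8, sd 5.5)
pnorm(30, mean = 25.8, sd = 5.5) - pnorm(20, mean = 25.8, sd = 5.5)
```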



Figure: the required area under the Normal(25.8, 5.5²) density between 20
and 30 equals the corresponding area under the standard normal density.


The Exponential distribution

The exponential distribution has probability density function
f(x) = λe^(−λx), for x ≥ 0. It takes one parameter, λ. It is commonly
used to model the lifetime of components, or times between failures.
Example
A turbine blade has a lifetime exponentially distributed with λ = 0.5.
What is the probability the turbine lasts more than 5 years?

We know the pdf of the distribution is f(x) = λe^(−λx) = 0.5e^(−0.5x).
The probability that the turbine lasts less than 5 years is
F(5) = ∫₀⁵ f(x) dx.
Thus the probability that the turbine lasts more than 5 years is 1 − F(5).
This can be calculated directly (exercise) and found to be 0.082.
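The direct calculation can be checked with R's exponential distribution functions:

```r
lambda <- 0.5

# P(lifetime > 5) = 1 - F(5) = e^(-0.5 * 5)
1 - pexp(5, rate = lambda)  # approximately 0.082
```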



Figure: CDF (left) and PDF (right) for the exponential distribution for
varying values of the parameter λ.



Selected continuous distributions

Distribution     Defined on    pdf f(x)                        Notes
Uniform(a,b)     [a, b]        1/(b − a)
Exponential(λ)   x ≥ 0         λe^(−λx)                        Time between failures
Normal(µ, σ²)    −∞ < x < ∞    (1/√(2πσ²)) e^(−(x−µ)²/(2σ²))   Commonly called the Gaussian in Engineering


Expectation of a random variable

Definition
The expectation of a discrete random variable is

E(X) = Σ_{x∈Ω} x f(x)

and of a continuous random variable is

E(X) = ∫_{−∞}^{∞} x f(x) dx.
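The continuous definition can be checked numerically with R's integrate; a sketch for the Exponential(λ = 2) distribution, whose expectation (from the table at the end of this part) is 1/λ = 0.5:

```r
lambda <- 2
f <- function(x) lambda * exp(-lambda * x)  # Exponential(2) pdf, x >= 0

# E(X) = integral of x f(x) dx over the support
integrate(function(x) x * f(x), 0, Inf)$value  # approximately 0.5 = 1/lambda
```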



Example (Fair Die)
What is the expectation of the number shown, X, when rolling a fair die?

The number displayed on the die, X, has a discrete uniform distribution
between 1 and 6 (i.e. P(X = x) = 1/6 for x = 1, 2, 3, 4, 5, or 6).
Thus the expectation is

E(X) = Σ_{x∈Ω} x f(x) = Σ_{x=1}^{6} x/6
     = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 3.5
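The same answer falls out of a direct sum in R, and simulation agrees:

```r
x <- 1:6
sum(x * (1/6))  # exact expectation, 3.5

# Simulating many rolls gives a sample mean close to 3.5
set.seed(3)
mean(sample(x, 1e5, replace = TRUE))
```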



Example (Lifetime of a turbine blade)
The lifetime of a turbine blade in days is distributed according to the
following PDF:

f(x) = 20000/x³,   x > 100
f(x) = 0,          otherwise

What is its expected lifetime?



Expectations of functions of variables

Note that if X is a random variable, and we take some function g(X),
then Y = g(X) is a random variable also, and

E[g(X)] = Σ_x g(x) f(x), or
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

for discrete and continuous random variables respectively.
It follows that, for random variables X and Y, and for constant c that

E (cX ) = cE (X )

E (X + Y ) = E (X ) + E (Y )
Moreover, if X and Y are independent, then E (XY ) = E (X )E (Y ). Note
that the converse is not true.



Example (Resistors)
A random current I flows through a resistor with R = 50Ω. The
probability density function for the current is given as

f(x) = 2kx,         0 ≤ x < 0.5
f(x) = 2k(1 − x),   0.5 ≤ x ≤ 1
f(x) = 0,           otherwise

What is the expected value of the voltage across the resistor?
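A numerical sketch of the approach in R (assuming, as the normalisation condition forces, that k = 2; Ohm's law V = IR then turns E(I) into E(V)):

```r
k <- 2   # forced by requiring the pdf to integrate to 1
R <- 50  # resistance in ohms

f <- function(x) ifelse(x < 0.5, 2 * k * x, 2 * k * (1 - x))  # pdf on [0, 1]

integrate(f, 0, 1)$value                           # check: integrates to 1
EI <- integrate(function(x) x * f(x), 0, 1)$value  # E(I), expected current
R * EI                                             # E(V) = R E(I) by linearity
```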



Variance
The variance of a random variable is a way of quantifying how much a
distribution varies about its expected value.
Definition (Variance and standard deviation)
The variance of a random variable X is defined as

Var(X ) = E [(X − µ)2 ].

The standard deviation of X is the (positive) square root of the variance.

For a discrete random variable X, we can therefore write the variance of X
as

σ²_X = E[(X − µ)²] = Σ_x (x − µ)² f(x)

and for a continuous random variable as

σ²_X = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² f(x) dx.



As a consequence of our definitions, we can find the variances of
combinations of random variables:

Var(X + Y) = Var(X) + Var(Y)
Var(X − Y) = Var(X) + Var(Y)
Var(cX) = c² Var(X)

Here X and Y are independent random variables, and c is a constant.

Example (Hint)
It's often easier to use the equivalent definition of the variance,

Var(X) = E(X²) − [E(X)]²
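The hint is easy to verify on the two-coin example from earlier (X = number of heads in two tosses):

```r
x <- 0:2
fx <- c(0.25, 0.5, 0.25)  # P(X = x) from the coin-tossing table

EX  <- sum(x * fx)        # E(X) = 1
EX2 <- sum(x^2 * fx)      # E(X^2) = 1.5
EX2 - EX^2                # Var(X) = 0.5
```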



Example (Fair die again)
What is the variance of the number displayed on a fair die?

Example (Lifetime of a turbine blade)


The lifetime of a turbine blade in days is distributed according to the
following PDF:

f(x) = 20000/x³,   x > 100
f(x) = 0,          otherwise
What is the variance of its lifetime?



We can calculate the mean and variance of some selected distributions as
follows:

Table: Selected Discrete distributions


Distribution    Mean         Variance
Uniform(a,b)    (a + b)/2    ((b − a + 1)² − 1)/12
Binomial(n,p)   np           np(1 − p)
Geometric(p)    (1 − p)/p    (1 − p)/p²
Poisson(λ)      λ            λ


Table: Selected continuous distributions
Distribution     Mean        Variance
Uniform(a,b)     (a + b)/2   (b − a)²/12
Exponential(λ)   1/λ         1/λ²
Normal(µ, σ²)    µ           σ²
