Statistics
  Descriptive
  Inferential
Data
  Categorical: Nominal, Ordinal
  Numerical: Discrete, Continuous
Free answer
Combination question
Example: What is your normal mode of transport when coming to Brunel University?
Bus, Train, Car, Walking, Other (please specify)
The information collected in this evaluation will be kept strictly confidential and no information will be passed to any Schools or course leaders.
About you
Name:
Student Number:
Level: Foundation L1 L2 L3 PG
Please state/describe the maths problem you would like help with
Feedback about us
How useful did you find the advice/support given: (please circle one)
The information collected in this evaluation will be kept strictly confidential and no information will be passed to any Schools or course leaders.
About you
Name:
Gender: Male (0)  Female (1)
Student Number:
Level: Foundation (1)  L1 (2)  L2 (3)  L3 (4)  PG (5)
Please state your course (e.g. economics)
Please state/describe the maths problem you would like help with
Feedback about us
How useful did you find the advice/support given: (please circle one)
-2 -1 0 1 2
How could the café be improved?
$\text{Relative frequency} = \dfrac{\text{frequency}}{\text{total number of observations in the data set}}$
Code  Reason for leaving the University  Frequency  Relative freq.
1     Academic problems                  7          0.167
2     Poor advising or teaching          3          0.071
3     Needed a break                     2          0.048
4     Economic reasons                   11         0.262
5     Family responsibilities            4          0.095
6     To attend another school           9          0.214
7     Personal problems                  3          0.071
8     Other                              3          0.071
[Figure: pie chart with one sector per reason code, 1 to 8.]
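The relative-frequency column can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides; the dictionary and function names are illustrative assumptions, and the frequencies are copied from the table.

```python
# Frequencies copied from the "Reason for leaving the University" table.
frequencies = {
    "Academic problems": 7,
    "Poor advising or teaching": 3,
    "Needed a break": 2,
    "Economic reasons": 11,
    "Family responsibilities": 4,
    "To attend another school": 9,
    "Personal problems": 3,
    "Other": 3,
}


def relative_frequencies(freqs):
    """Divide each frequency by the total number of observations."""
    total = sum(freqs.values())
    return {label: count / total for label, count in freqs.items()}


rel = relative_frequencies(frequencies)
for label, value in rel.items():
    # Print each reason with its relative frequency to three decimals.
    print(f"{label:28s} {value:.3f}")
```

The relative frequencies always sum to 1, which is a quick check on a hand-built table.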
2, 3, 5, || 6, 7, 9
The median (middle score) is $\dfrac{5+6}{2} = 5.5$.
Student 1 2 3 4 5 6 7 8 9
Score 94 81 56 90 70 65 90 90 30
Ordered Score 30 56 65 70 81 90 90 90 94
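[Figure: box-and-whisker plot. From top to bottom: upper whisker, upper quartile, median, lower quartile, lower whisker. Each of the four sections holds 25% of the data. The scale is marked 10 at the top and 2 (least value) at the bottom.]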
Data set
{2, 2, 5, 6, 3, 3, 7, 4, 7, 5, 2, 2, 2, 4, 3, 5, 9}
Ordered data set
{2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 7, 7, 9}
Mode is 2
Median is 4
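The mode and median quoted above can be checked with Python's standard `statistics` module. This is a small sketch; the variable names are assumptions introduced for illustration, and the data are the sets listed in the slides.

```python
from statistics import median, mode

# Data set from the slide.
data = [2, 2, 5, 6, 3, 3, 7, 4, 7, 5, 2, 2, 2, 4, 3, 5, 9]

ordered = sorted(data)          # ordered data set
data_mode = mode(data)          # most frequent value
data_median = median(data)      # middle value of the ordered data (n is odd)

# With an even number of values the median is the mean of the two middle ones.
even_median = median([2, 3, 5, 6, 7, 9])

# Student scores from the earlier table (n = 9, so the 5th ordered score).
score_median = median([94, 81, 56, 90, 70, 65, 90, 90, 30])

print(ordered)
print(data_mode, data_median, even_median, score_median)
```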
Statistics revision – p. 25/8
Pie Chart
[Figure: pie chart with sectors for family size = 2, 3, 4, 5, 6, 7 and 9.]
[Figure: chart of the same data, with family sizes 2, 3, 4, 5, 6, 7 and 9 along the horizontal axis.]
3 3 2 5 6 2 2 7 2 1 4 5 5 6 6 1 4 2 4 2
1 3 3 6 4 6 2 0 4 4 6 1 3 4 2 2 4 4 2 1
0 3 3 6 6 1 1 0 2 1 5 9 3 3 6 6 8 5 4 4
2 1 3 3 2 4 5 4 3 3 5 4 2 3 6 4 4 7 7 4
4 1 2 7 2 0 5 2 0 2 8 4 3 4 2 1 3 2 2 3
4 2 2 4 6 2 0 4 3 2 2 3 3 5 2 4 6 1 0 4
3 4 4 2 5 2 3 3 6 1 3 4 2 6 2 2 5 1 7 3
5 0 6 7 2 2 2 4 3 0 4 2 3 6 2 4 2 0 1 2
2 6 1 4 3 6 2 5 1 3 1 0 4 3 2 4 1 4 8 1
7 4 5 4 4 4 4 7 1 5 3 1 0 2 3 1 2 4 1 3
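[Figure: frequency chart of the data above; horizontal axis 0 to 9, frequency axis marked 10, 20, 30.]
[Figure: box-and-whisker plot of the data; least value 0, median = 3.]
[Figure: frequency histogram; horizontal axis 2 to 10, frequency axis marked 10, 20, 30.]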
Intervals: 0 ≤ x < 1, 1 ≤ x < 2, 2 ≤ x < 3, 3 ≤ x < 4, 4 ≤ x < 5, ...
Class mid-points: 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, ...
[Figure: relative-frequency histogram of the same data; horizontal axis 2 to 10, vertical axis marked 0.05, 0.10, 0.15.]
Number of accident claims (x)  Frequency (f)  Relative freq.  fx
0                              12             0.060           0
1                              24             0.120           24
2                              44             0.220           88
3                              33             0.165           99
4                              41             0.205           164
5                              15             0.075           75
6                              19             0.095           114
7                              8              0.040           56
8                              3              0.015           24
9                              1              0.005           9
Total                          200            1               653
$\bar{x} = \dfrac{653}{200} \approx 3.27$
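The mean of a frequency distribution is the sum of the fx column divided by the total frequency. A minimal Python sketch of that calculation follows; the list names are illustrative assumptions, and the numbers are taken from the accident-claims table.

```python
# Number of accident claims (x) and their frequencies (f), from the table.
claims = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
freq = [12, 24, 44, 33, 41, 15, 19, 8, 3, 1]

n = sum(freq)                                       # total number of observations
total_fx = sum(x * f for x, f in zip(claims, freq))  # sum of the fx column
mean = total_fx / n                                 # x-bar = sum(fx) / sum(f)

print(n, total_fx, mean)
```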
Sample Variance
We can measure dispersion relative to the
scatter of the values about their mean.
For data $\{x_1, x_2, x_3, \ldots, x_n\}$
Sample variance, $\sigma^2 = \dfrac{\sum_i x_i^2}{n} - (\bar{x})^2$
For a frequency distribution
x     x1  x2  x3  ...  xi  ...  xn
freq  f1  f2  f3  ...  fi  ...  fn
Sample variance, $\sigma^2 = \dfrac{\sum_i f_i x_i^2}{\sum_i f_i} - (\bar{x})^2$
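Both variance formulas can be written directly in Python. This is a sketch under stated assumptions: the function and variable names are hypothetical, and the example data are the 17-value data set and the accident-claims table from earlier slides. Note that the formulas as given divide by n, so the result matches `statistics.pvariance` rather than `statistics.variance`, which divides by n - 1.

```python
from statistics import pvariance


def variance_raw(values):
    """Mean of the squares minus the square of the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


def variance_from_frequencies(xs, fs):
    """Same formula for a frequency distribution: sum(f*x^2)/sum(f) - mean^2."""
    n = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / n
    return sum(f * x * x for x, f in zip(xs, fs)) / n - mean ** 2


data = [2, 2, 5, 6, 3, 3, 7, 4, 7, 5, 2, 2, 2, 4, 3, 5, 9]
claims = list(range(10))
freq = [12, 24, 44, 33, 41, 15, 19, 8, 3, 1]

var_data = variance_raw(data)
var_claims = variance_from_frequencies(claims, freq)

print(var_data, pvariance(data))  # the two values should agree
print(var_claims)
```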
55.80 60.90 37.00 91.30 65.80 42.30 33.80 60.60 76.00 69.00
45.90 39.10 35.50 56.00 44.60 71.70 61.20 61.50 47.20 74.50
83.20 40.00 31.70 36.70 62.30 47.30 94.60 56.30 30.00 68.20
75.30 71.40 65.20 52.60 58.20 48.00 61.80 78.80 39.80 65.00
60.70 77.10 59.10 49.50 69.30 69.80 64.90 27.10 66.30 87.10
[Figure: two frequency histograms of the data above, drawn with different class widths; horizontal axes 40 to 100 and 30 to 90.]
3.27792 3.37444 4.97057 4.02437 4.40855 4.69663 3.34397 5.22305 3.55060 2.98057
5.81152 4.58240 5.08875 4.04497 3.87288 4.67210 4.90091 4.31757 5.20679 3.25989
3.90416 5.37304 4.64384 4.38037 3.94797 2.76160 6.02717 5.29289 2.84805 4.780400
4.11426 3.73694 5.20243 1.79561 3.71626 3.24735 5.51044 3.26583 4.46252 5.460610
5.48467 3.60436 2.98056 5.53549 3.89788 4.14706 2.96069 5.37283 5.05862 3.67263
3.25160 6.63551 3.18142 5.22402 3.37358 3.15472 3.21479 3.44678 4.93306 4.31728
4.14319 1.77422 4.25183 2.84643 4.89365 3.56778 3.23527 6.17919 4.35063 5.11706
4.85987 4.20730 2.88155 5.59583 3.94908 4.02062 5.03695 4.35373 5.44498 4.20769
3.53962 5.20128 5.23739 4.37652 3.65423 3.42377 4.31031 5.73569 4.61766 3.85986
5.74499 3.64311 2.21657 3.69019 5.70689 4.24800 4.63107 4.74557 3.68453 5.15948
[Figure: frequency histogram of the data above; horizontal axis 2 to 7, frequency axis marked 5, 10, 15, 20.]
[Figure: the same histogram rescaled; horizontal axis 2 to 7, vertical axis marked 0.1, 0.2, 0.3, 0.4.]
$\bar{x} = \dfrac{425.5}{100} = 4.255$
$\sigma = \sqrt{\dfrac{1911.25}{100} - (4.255)^2} = 1.00373$
Histogram and N(4.26, 1.004)
[Figure: histogram of the data with the N(4.26, 1.004) curve; horizontal axis 2 to 7, vertical axis marked 0.1 to 0.4.]
Normal distribution: mean, median and mode are identical in value.
x̄ (sample) ⇒ µ (population)
s (sample) ⇒ σ (population)
Point and interval estimators
Estimator
  Point estimator (one number)
  Interval estimator (two numbers)
A statistic θ̂ is an unbiased estimator of the
parameter θ if its expected value equals the
parameter it is supposed to estimate:
$E[\hat{\theta}] = \theta$
$\bar{x} - t_{\alpha/2,\,n-1}\,\dfrac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\,n-1}\,\dfrac{s}{\sqrt{n}}$
Solution:
Substituting $\bar{x} = 83.2$, $s = 7.3$ and $t_{0.025,19} = 2.093$ (from the table
for the t-distribution), the 95% confidence interval for µ becomes
$83.2 - 2.093\,\dfrac{7.3}{\sqrt{20}} < \mu < 83.2 + 2.093\,\dfrac{7.3}{\sqrt{20}}$
or simply $79.8 < \mu < 86.6$
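The arithmetic in this solution can be checked with a short Python sketch. The function name is a hypothetical helper; the critical value 2.093 is taken from the t-table as in the solution and passed in directly, rather than computed.

```python
from math import sqrt


def t_confidence_interval(mean, s, n, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * s / sqrt(n)."""
    half_width = t_crit * s / sqrt(n)
    return mean - half_width, mean + half_width


# Values from the worked solution: x-bar = 83.2, s = 7.3, n = 20,
# and t(0.025, 19) = 2.093 read from a t-distribution table.
lower, upper = t_confidence_interval(83.2, 7.3, 20, 2.093)
print(round(lower, 1), round(upper, 1))
```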
In (1) we are performing a two-tailed test; in (2) and (3) we are performing a
one-tailed test.
Statistical hypothesis
  Simple hypothesis
    Example (for normal distribution), if σ is known: H0 : x̄ = 3, σ = A
  Composite hypothesis
    Example (for normal distribution), if σ is unknown: H0 : x̄ = 3
$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$,
$S_{xy} = n\sum xy - \sum x \sum y$,
$S_{xx} = n\sum x^2 - \left(\sum x\right)^2$, $S_{yy} = n\sum y^2 - \left(\sum y\right)^2$
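A minimal Python sketch of the correlation coefficient follows, using the standard form with S_xx S_yy under the square root. The function names and the small data sets are illustrative assumptions, not data from the slides.

```python
from math import sqrt


def s_sums(x, y):
    """Return (Sxy, Sxx, Syy) using the n * sum(...) - sum * sum form."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    sxx = n * sum(a * a for a in x) - sx ** 2
    syy = n * sum(b * b for b in y) - sy ** 2
    return sxy, sxx, syy


def correlation(x, y):
    """Correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    sxy, sxx, syy = s_sums(x, y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: a perfectly increasing and a perfectly decreasing line.
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2, 4, 6, 8, 10])
r_down = correlation(x, [10, 8, 6, 4, 2])
print(r_up, r_down)
```

Values of r close to +1 or -1 indicate a strong linear relationship; values near 0 indicate little linear association.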
[Figure: four scatter plots of Y against X.]
$y = a + bx$
$b = \dfrac{S_{xy}}{S_{xx}}$, $a = \bar{y} - b\bar{x} = \dfrac{\sum_i y_i - b\sum_i x_i}{n}$
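The slope and intercept of the line of best fit can be computed as follows. This is a sketch only: the function name and the four data points are hypothetical, chosen so that the points lie exactly on y = 1 + 2x.

```python
def fit_line(x, y):
    """Least-squares line y = a + b*x with b = Sxy / Sxx."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    sxx = n * sum(a * a for a in x) - sx ** 2
    b = sxy / sxx
    a = (sy - b * sx) / n   # same as y-bar - b * x-bar
    return a, b


# Hypothetical data lying exactly on the line y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```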
[Figure: scatter plot of the data against pH; horizontal axis 3 to 7, vertical axis marked 0.1 to 0.4.]
These results suggest a linear relationship.
Solution of example 9
• Decision rule.
If we let α = 0.025, 2α = 0.05, the critical values of t
in the present example are ±2.365
(e.g. see John Murdoch, "Statistical tables for students of science, engineering,
psychology, business, management, finance", 1998, Macmillan, 79 p., Table 7).
Step 6.
Now we use regression analysis to find the line of
best fit to the data.
[Figure: optical density (vertical axis, 0.1 to 0.5) plotted against pH (horizontal axis, 2 to 8).]
r = 0.989
y = 0.096x − 0.184
Chi-Square Goodness-of-Fit Test
Question: Can we assume that the distribution of a
sample is valid for the whole population?
Pearson’s chi-square test (χ² test) is used to
test whether a sample of data came from a population with a
specific distribution.
Advantage: it can be used for discrete distributions,
such as the binomial and the Poisson, and for continuous
distributions, such as the normal distribution.
Disadvantages:
The value of the χ² test statistic depends on
how the data are binned.
The χ² test requires a sufficient sample size for the
χ² approximation to be valid.
Chi-Square Goodness-of-Fit Test
For the χ² goodness-of-fit computation, the data are divided
into k bins and the test statistic is defined as
$\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$
$\chi^2 > \chi^2_{\alpha,\,n-c}$
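The test statistic is a simple sum over the bins, sketched below in Python. The function name and the observed and expected counts are hypothetical. The critical value is not computed: 5.991 is the standard table value for α = 0.05 with 2 degrees of freedom, and treating 2 as the right number of degrees of freedom for this made-up example is an assumption.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts in k = 3 bins.
observed = [18, 22, 20]
expected = [20, 20, 20]

stat = chi_square_statistic(observed, expected)

# Assumed table value: chi-square critical value for alpha = 0.05, 2 d.o.f.
critical = 5.991
exceeds_critical = stat > critical

print(stat, exceeds_critical)
```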
$p_k = \dfrac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$
where
k is the number of occurrences of an event.
λ is a positive real number, equal to the expected
number of occurrences during the given
interval.
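The Poisson probabilities can be evaluated directly from the formula. A minimal Python sketch follows; the function name and the value λ = 2.0 are illustrative assumptions.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k occurrences: lam^k * e^(-lam) / k!."""
    return lam ** k * exp(-lam) / factorial(k)


lam = 2.0  # hypothetical expected number of occurrences per interval
probabilities = [poisson_pmf(k, lam) for k in range(10)]

for k, p in enumerate(probabilities):
    print(k, round(p, 4))
```

Multiplying each probability by the sample size gives the expected frequencies that a goodness-of-fit test compares with the observed ones.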
Example 10