
RELIABILITY ENGINEERING UNIT

ASST4403
Lecture 12: DATA ANALYSIS
Learning outcomes
Present data visually and numerically, e.g. histogram
Identify distributions from data by means of e.g. a histogram
Perform simple linear regression
How confident are we?
How much can we trust the results?
How well have we done?
Identifying candidate distributions
Estimating parameters
Confidence interval and goodness-of-fit
Histogram
Frequency distribution
Frequency distribution: data presented as class intervals and their corresponding frequency
Range: the difference between the largest and the smallest data values
Number of classes (bins): Sturges' rule: select a bin size such that there are about $1 + \log_2 n$ non-empty bins (n is the number of data values)
Class midpoint: average of the class endpoints
Relative frequency: the ratio of the frequency of the class interval to the total frequency
Cumulative frequency: running total of the class frequencies in the frequency distribution
Example: 5 years house loan interest rate
Lower End   Upper End   Frequency
-           6.5         0
6.5         6.6         8
6.6         6.7         0
6.7         6.8         1
6.8         6.9         0
6.9         7.0         0
7.0         7.1         15
7.1         7.2         0
7.2         7.3         14
7.3         7.4         0
7.4         7.5         0
7.5         7.6         3
7.6         7.7         0
7.7         7.8         3
7.8         7.9         0
7.9         8.0         0
8.0         8.1         9
8.1         8.2         0
8.2         8.3         3
8.3         8.4         0
8.4         8.5         0
8.5         8.6         2
8.6         8.7         1
8.7         8.8         0
8.8         -           1
[Figure: histogram, Frequency (0 to 16) vs. Variable (6.5 to 8.8)]
n = 60 data values, range = 2.3, class width = 0.1, number of classes (bins) = 25
(If we use Sturges' rule, number of classes (bins) = $1 + \log_2 n = 1 + \log_2 60 \approx 7$, then class width should be 2.3/7 = 0.33)
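The arithmetic in this example can be checked with a short Python sketch. The frequencies are copied from the table in row order; the names `sturges_bins`, `relative` and `cumulative` are illustrative only, and rounding Sturges' value to the nearest integer is an assumption about how the slide arrived at 7.

```python
import math

def sturges_bins(n):
    """Number of classes suggested by Sturges' rule, 1 + log2(n), rounded."""
    return round(1 + math.log2(n))

# Frequencies from the interest-rate table, in row order (25 classes).
frequencies = [0, 8, 0, 1, 0, 0, 15, 0, 14, 0, 0, 3, 0, 3,
               0, 0, 9, 0, 3, 0, 0, 2, 1, 0, 1]

n = sum(frequencies)          # total number of data values
data_range = 2.3              # largest minus smallest value, as stated on the slide
k = sturges_bins(n)           # suggested number of classes
width = data_range / k        # corresponding class width

# Relative frequency: class frequency divided by the total frequency.
relative = [f / n for f in frequencies]

# Cumulative frequency: running total of the class frequencies.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)

print(n, k, round(width, 2))
```

With 25 classes of width 0.1 most classes are empty, which is why the slide compares the table against the coarser width suggested by Sturges' rule.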
Histogram
Graph of relative frequencies, representing the
underlying distribution (PDF).
Construct a histogram
Sort data in ascending order
Find data range (max-min)
Decide on the number of intervals (bins) of equal size and bin size (trial and error, there is no best number)
Sturges' rule: select a bin size such that there are about $1 + \log_2 n$ non-empty bins (n is the number of samples)
Group data into bins and count frequency
Reproduced with courtesy from Jo Sikorska
Histogram
$f(t) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(t-\mu)^2}{2\sigma^2}}$
(value shown on slide: 100)
Histogram
$f(t) = \lambda e^{-\lambda t}$
(value shown on slide: 0.5)
Example: proper histogram classes
The following 35 failure times (in operating hours) were obtained from field data over a 6-month period. Construct a histogram and discuss the underlying distribution.
20 31 36 47 98 157 182 185 210 210
214 221 246 247 279 284 289 300 400 401
428 438 442 467 499 552 553 597 767 796
1024 1297 1476 1563 2025
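The construction steps (sort, find the range, choose the number of bins, group and count) can be sketched in Python for these 35 failure times. This is a minimal sketch, not a prescribed method: the function name `build_histogram` is hypothetical, bins are taken as equal-width intervals from the minimum to the maximum, and the default bin count uses Sturges' rule rounded to the nearest integer.

```python
import math

failure_times = [
    20, 31, 36, 47, 98, 157, 182, 185, 210, 210,
    214, 221, 246, 247, 279, 284, 289, 300, 400, 401,
    428, 438, 442, 467, 499, 552, 553, 597, 767, 796,
    1024, 1297, 1476, 1563, 2025,
]

def build_histogram(data, n_bins=None):
    """Return (bin edges, counts) for equal-width bins spanning the data."""
    values = sorted(data)                    # step 1: sort in ascending order
    lo, hi = values[0], values[-1]
    data_range = hi - lo                     # step 2: range = max - min
    if data_range == 0:
        raise ValueError("all values are equal; no histogram to build")
    if n_bins is None:                       # step 3: number of bins (Sturges)
        n_bins = round(1 + math.log2(len(values)))
    width = data_range / n_bins
    counts = [0] * n_bins
    for v in values:                         # step 4: group and count
        # The maximum value would fall just past the last bin, so clamp it.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

edges, counts = build_histogram(failure_times)
print(counts)
```

Passing a different `n_bins` (for example 3 or 17) shows how the shape of the histogram changes with the number of classes, which is the trial-and-error step in the construction procedure.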
3 classes (too few)
Lower End   Upper End   Frequency
-           999         30
1000        1999        4
2000        -           1
[Figure: histogram, Frequency vs. Variable]
17 classes (too many)
Lower End   Upper End   Frequency
-           20          1
20          138         4
138         256         9
256         374         4
374         492         6
492         610         4
610         728         0
728         846         2
846         964         0
964         1082        1
1082        1200        0
1200        1318        1
1318        1436        0
1436        1554        1
1554        1672        1
1672        1790        0
1790        -           1
[Figure: histogram, Frequency vs. Variable]
6 classes (proper)
Lower End   Upper End   Frequency
-           399         18
400         799         12
800         1199        1
1200        1599        3
1600        1999        0
2000        -           1
[Figure: histogram, Frequency vs. Variable]
An exponential distribution?
n = 35 data values, range = 2005, class width = 400, number of classes (bins) = 6
(Using Sturges' rule, number of classes (bins) = $1 + \log_2 n = 1 + \log_2 35 \approx 6$, then class width should be 2005/6 = 334)
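The 3-class and 6-class tables can be reproduced from the raw failure times with a short Python sketch, which also gives a rough way to look at the exponential question. The helper `class_counts` is hypothetical. The exponential comparison uses rate = 1/mean, a common estimate but an assumption here, since the slides do not say how a parameter would be fitted.

```python
import math
from collections import Counter

failure_times = [
    20, 31, 36, 47, 98, 157, 182, 185, 210, 210,
    214, 221, 246, 247, 279, 284, 289, 300, 400, 401,
    428, 438, 442, 467, 499, 552, 553, 597, 767, 796,
    1024, 1297, 1476, 1563, 2025,
]

def class_counts(data, width, n_classes):
    """Count integer data in classes [0, width-1], [width, 2*width-1], ...

    The last class is open-ended, as in the slide tables.
    """
    tally = Counter(min(t // width, n_classes - 1) for t in data)
    return [tally.get(i, 0) for i in range(n_classes)]

three = class_counts(failure_times, 1000, 3)   # "too few" table
six = class_counts(failure_times, 400, 6)      # "proper" table

# Assumed estimate: exponential rate = 1 / sample mean.
n = len(failure_times)
rate = n / sum(failure_times)

def exp_cdf(t):
    """CDF of the exponential distribution with the estimated rate."""
    return 1.0 - math.exp(-rate * t)

# Expected counts per class if the data were exponential with that rate.
bounds = [0, 400, 800, 1200, 1600, 2000, math.inf]
expected = [n * (exp_cdf(hi) - exp_cdf(lo))
            for lo, hi in zip(bounds[:-1], bounds[1:])]

for observed, exp_count in zip(six, expected):
    print(observed, round(exp_count, 1))
```

Comparing the observed and expected columns by eye is only an informal check; a goodness-of-fit test would be needed to support or reject the exponential assumption.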
Example: original data for a histogram
Example: class interval and frequency for a histogram
Example: class interval and relative frequency for a histogram
Example: histogram
A normal distribution?
Simple regression
Process of constructing a mathematical model or function to predict/determine one variable by another
Simple regression = linear regression, two variables
Dependent variable = the variable to be predicted, y
Independent variable (explanatory variable) = predictor, x
How well does it fit? Find the coefficient of correlation r (|r| as close to 1 as possible)
Determining the equation of the regression line
m = slope of the line (the tangent of the angle the line makes with the x-axis)
b = y intercept of the line
We are trying to determine these two to form the model
$y = mx + b$
Example
http://phoenix.phys.clemson.edu/tutorials/excel/regression.html
How to calculate/find m and b
n = number of data points
r is the correlation coefficient
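For reference, m, b and r are usually computed with the least-squares expressions below. This is the standard textbook form, given here as an assumption about the intended formulas; all sums run over the n data points $(x_i, y_i)$.

```latex
m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}
\qquad
b = \frac{\sum y_i - m \sum x_i}{n}

r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\left[n \sum x_i^2 - \left(\sum x_i\right)^2\right]
                \left[n \sum y_i^2 - \left(\sum y_i\right)^2\right]}}
```

The numerators of m and r are identical, so the slope and the correlation coefficient always have the same sign.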
Doing linear regression using EXCEL
http://phoenix.phys.clemson.edu/tutorials/excel/regression.html
Example
Individual   Annual income ($000)   Weekly time on National Direct Calls (minutes)
1            23                     69
2            29                     95
3            29                     102
4            35                     118
5            42                     126
6            46                     125
7            50                     138
8            54                     178
9            64                     156
10           66                     184
11           76                     176
12           78                     225
Slope       2.231503994    SLOPE(C2:C13,B2:B13)
Intercept   30.91246961    INTERCEPT(C2:C13,B2:B13)
r           0.941506251    CORREL(C2:C13,B2:B13)
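The Excel results can be cross-checked in plain Python. This is a sketch under the assumption that the usual least-squares sums are what SLOPE, INTERCEPT and CORREL compute; the function name `linear_fit` is illustrative.

```python
import math

# Data from the table: annual income ($000) and weekly call time (minutes).
income = [23, 29, 29, 35, 42, 46, 50, 54, 64, 66, 76, 78]
minutes = [69, 95, 102, 118, 126, 125, 138, 178, 156, 184, 176, 225]

def linear_fit(x, y):
    """Least-squares slope m, intercept b and correlation coefficient r."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy          # shared numerator of m and r
    m = num / (n * sxx - sx ** 2)
    b = (sy - m * sx) / n
    r = num / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return m, b, r

m, b, r = linear_fit(income, minutes)
print(f"slope={m:.6f} intercept={b:.6f} r={r:.6f}")

# Predicted weekly call time for an income of $60,000 (x = 60).
predicted = m * 60 + b
```

Here x is income and y is call time, which matches the argument order in the Excel formulas (known y values first, known x values second).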
[Figure: Weekly time on national direct calls (min) vs. Annual income ($000)]
