Course On
STRUCTURAL
RELIABILITY
Module # 02
Lecture 1
Instructor:
Dr. Arunasis Chakraborty
Department of Civil Engineering
Indian Institute of Technology Guwahati
1. Lecture 01: Basic Statistics
Observed data samples are presented as scattered points, which may be independent of or
dependent on another random variable. How the sample data are presented is important because it
reveals statistical properties of the data such as correlation and range.
Generally, a statistical sample is represented by scatter diagrams, histograms and
frequency polygons. The random variables associated with these observations are discrete.
A histogram represents the grouped frequency distribution of the observed data. It is a bar
chart in which the width of each bar is the class interval and the height of each bar is the
frequency density of the data falling in the associated class (see Figure 2.1.2). The area of
each bar therefore represents its class frequency, as expressed in Eq. 2.1.1.
Figure 2.1.1 Scatter diagrams ($y$ against $x$) showing different types and degrees of
correlation: (a) positive, (b) negative, (c) zero, (d) zero, (e) +1 and (f) −1
[Figure 2.1.2: histogram and frequency polygon, with Frequency plotted against Values]
An alternative to the histogram is the frequency polygon, which is formed by joining the mid
values of each class as shown in Figure 2.1.2. If all classes have the same width, then the area
under the histogram is the same as the area under the frequency polygon. The curve formed by the
frequency polygon gives an idea of the frequency distribution of the data.
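The relation between bar height, bar area and the frequency polygon can be checked numerically. The sketch below is only an illustration: the class boundaries and frequencies are made-up values, and the polygon is assumed to be closed on the axis at the midpoints of an empty class on either side, which is the usual convention under which the two areas agree for equal class widths.

```python
# Hypothetical grouped data (not from the lecture): equal-width classes.
boundaries = [0, 10, 20, 30, 40]   # class boundaries
frequencies = [4, 10, 7, 3]        # class frequencies


def frequency_densities(boundaries, frequencies):
    """Bar heights of the histogram: class frequency divided by class width."""
    widths = [hi - lo for lo, hi in zip(boundaries[:-1], boundaries[1:])]
    return [f / w for f, w in zip(frequencies, widths)]


def class_midpoints(boundaries):
    """Mid value of each class, used as the x-coordinates of the polygon."""
    return [(lo + hi) / 2 for lo, hi in zip(boundaries[:-1], boundaries[1:])]


def histogram_area(boundaries, frequencies):
    """Sum of bar areas (height x width); equals the total frequency."""
    heights = frequency_densities(boundaries, frequencies)
    widths = [hi - lo for lo, hi in zip(boundaries[:-1], boundaries[1:])]
    return sum(h * w for h, w in zip(heights, widths))


def polygon_area(boundaries, frequencies):
    """Area under the frequency polygon by the trapezoidal rule.

    Assumption: equal class widths, and the polygon is closed on the axis
    at the midpoints of one empty class before and after the data.
    """
    width = boundaries[1] - boundaries[0]
    mids = class_midpoints(boundaries)
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0.0] + frequency_densities(boundaries, frequencies) + [0.0]
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs[:-1], xs[1:], ys[:-1], ys[1:]))


print(histogram_area(boundaries, frequencies))
print(polygon_area(boundaries, frequencies))
```

Both areas come out equal to the total frequency for this data set; with unequal class widths the agreement is not expected.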
A whole set of observations can be described by a single value that occupies a central
position, such that some observations are larger and others are smaller than it. Such values
are known as measures of central tendency. There are three measures of central tendency: mean,
median and mode.
Mean – There are three types: arithmetic mean, geometric mean and harmonic mean. The words
'mean' and 'average' on their own refer to the arithmetic mean. Only the arithmetic mean is
discussed in this course.
Arithmetic Mean (AM) – It is defined as the sum of a set of observations divided by the size of
the set. For observations $x_1, x_2, \ldots, x_n$, where $n$ is the number of observations, the
AM ($\mu_x$) is
\[ \mu_x = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum x \tag{2.1.3} \]
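If the values $x_1, x_2, \ldots, x_n$ occur with frequencies $f_1, f_2, \ldots, f_n$
respectively, the sum of all observations is
\[ \underbrace{x_1 + x_1 + \cdots + x_1}_{f_1\ \text{terms}} + \underbrace{x_2 + x_2 + \cdots + x_2}_{f_2\ \text{terms}} + \cdots + \underbrace{x_n + x_n + \cdots + x_n}_{f_n\ \text{terms}} = f_1 x_1 + f_2 x_2 + \cdots + f_n x_n \tag{2.1.4} \]
and the AM becomes
\[ \mu_x = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum f x}{\sum f} \tag{2.1.5} \]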
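As a quick check of Eqs. 2.1.3 and 2.1.5, the sketch below computes the mean of a small made-up data set twice: once from the frequency table and once from the fully expanded list of observations. The function names and data are illustrative only.

```python
def arithmetic_mean(observations):
    """Simple AM, Eq. 2.1.3: sum of observations divided by their number."""
    return sum(observations) / len(observations)


def frequency_mean(values, freqs):
    """AM of a frequency distribution, Eq. 2.1.5: sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


# Hypothetical data: the value 2 occurs 3 times, 4 once and 6 twice.
values = [2, 4, 6]
freqs = [3, 1, 2]
expanded = [2, 2, 2, 4, 6, 6]   # the same data written out term by term

print(frequency_mean(values, freqs))
print(arithmetic_mean(expanded))
```

The two results coincide because the weighted sum in Eq. 2.1.4 is just the expanded sum with repeated terms collected.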
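Important properties of AM
1. The sum of a set of observations equals the number of observations multiplied by their AM.
\[ \sum x_i = n\mu_x, \qquad \sum f_i x_i = N\mu_x \tag{2.1.6} \]
where $N = \sum f$ is the total frequency. The first relation in Eq. 2.1.6 applies to the
simple sum, whereas the second relation applies to the weighted sum.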
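2. For given observations, the sum of deviations from their mean is always 0.
\[ \sum (x_i - \mu_x) = 0, \quad \text{where } \mu_x = \frac{\sum x_i}{n}, \text{ and} \]
\[ \sum f_i (x_i - \mu_x) = 0, \quad \text{where } \mu_x = \frac{\sum f_i x_i}{N} \tag{2.1.7} \]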
3. If two variables $x$ and $y$ are related such that $y = ax + b$, where $a$ and $b$ are
constants, then
\[ \mu_y = a\mu_x + b \tag{2.1.8} \]
and conversely $\mu_x = (\mu_y - b)/a$ for $a \neq 0$. Eq. 2.1.8 shows that if a constant is
added to, subtracted from, multiplied with or divided into each observation $x_i$, then the
mean $\mu_x$ undergoes the same operation with the same constant.
4. If two groups of observations of sizes $n_1$ and $n_2$ have means $\mu_{x_1}$ and
$\mu_{x_2}$, then the combined mean ($\mu_x$) of the composite group of $n_1 + n_2$ ($= N$)
observations is given by
\[ N\mu_x = n_1\mu_{x_1} + n_2\mu_{x_2} \tag{2.1.9} \]
5. The sum of squares of deviations has its smallest value when the deviations are taken from
the AM.
\[ \sum (x_i - A)^2 \ \text{is minimum when } A = \mu_x \tag{2.1.11} \]
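The properties listed above can be verified on a small data set. The following sketch uses made-up numbers and hypothetical helper names; it checks the zero sum of deviations, the linear transformation rule of Eq. 2.1.8, the combined mean of Eq. 2.1.9 and the minimum property of Eq. 2.1.11.

```python
def mean(xs):
    return sum(xs) / len(xs)


def sum_sq_dev(xs, A):
    """Sum of squared deviations of the observations from an arbitrary value A."""
    return sum((x - A) ** 2 for x in xs)


xs = [3.0, 7.0, 8.0, 12.0]      # hypothetical observations
mu = mean(xs)

# Property 2: deviations from the mean sum to zero.
dev_sum = sum(x - mu for x in xs)

# Property 3: y = a*x + b implies mu_y = a*mu_x + b.
a, b = 2.0, 5.0
ys = [a * x + b for x in xs]
mu_y = mean(ys)

# Property 4: combined mean of two groups, N*mu = n1*mu1 + n2*mu2.
group1 = [2.0, 4.0, 6.0]
group2 = [10.0, 20.0]
combined = (len(group1) * mean(group1) + len(group2) * mean(group2)) / (len(group1) + len(group2))

# Property 5: the sum of squared deviations is smallest about the mean.
at_mean = sum_sq_dev(xs, mu)
nearby = [sum_sq_dev(xs, mu + shift) for shift in (-1.0, -0.5, 0.5, 1.0)]

print(dev_sum, mu_y, combined, at_mean, nearby)
```

Trying only a few shifted values of `A` illustrates property 5; it is not a proof.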
Median – The middle value when a set of observations is sorted in order of magnitude is
called the median. For a grouped frequency distribution it can be calculated using the formula
\[ \text{Median} = l_1 + \frac{N/2 - F}{f_m} \times c \tag{2.1.12} \]
where $l_1$ is the lower bound of the median class, $N$ is the total frequency, $F$ is the
cumulative frequency below $l_1$, $f_m$ is the frequency of the median class and $c$ is the
width of the median class.
The median is, in a certain sense, the real measure of central tendency because it gives the
value of the most central observation. Moreover, it is unaffected by extreme values at either
end of the data, and it can be calculated easily from frequency distributions with open-end
classes.
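Eq. 2.1.12 can be sketched in code as follows. The function name, the equal-width assumption (a single `width` argument) and the example frequency table are illustrative choices, not part of the lecture.

```python
def grouped_median(lower_bounds, width, freqs):
    """Median of a grouped frequency distribution, Eq. 2.1.12.

    lower_bounds: lower bound of each class, in ascending order
    width:        common class width c (equal widths assumed here)
    freqs:        class frequencies
    """
    N = sum(freqs)
    if N == 0:
        raise ValueError("total frequency must be positive")
    F = 0  # cumulative frequency below the current class
    for l1, f_m in zip(lower_bounds, freqs):
        if F + f_m >= N / 2:          # first class reaching N/2 is the median class
            return l1 + (N / 2 - F) / f_m * width
        F += f_m


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40.
lower_bounds = [0, 10, 20, 30]
freqs = [4, 10, 7, 3]
print(grouped_median(lower_bounds, 10, freqs))
```

For this table N/2 = 12 falls in the class 10-20 (cumulative frequency 4 below it), so the formula interpolates 8/10 of the way into that class.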
Mode – The value in a set of observations that occurs with the highest frequency is known as
the mode. It is the most frequently occurring value. For a grouped frequency distribution it
is generally calculated as
\[ \text{Mode} = l_1 + \frac{d_1}{d_1 + d_2} \times c \tag{2.1.13} \]
where $l_1$ is the lower bound of the highest frequency class, $d_1$ is the difference between
the frequencies of the highest frequency class and the preceding class, $d_2$ is the difference
between the frequencies of the highest frequency class and the following class, and $c$ is the
common width of the classes. Eq. 2.1.13 is applicable only when all classes have the same
width. Note that the mode has a peculiarity: if all observations occur with equal frequency,
the mode does not exist.
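A minimal sketch of Eq. 2.1.13 is given below. Two choices in it are assumptions made for the example rather than statements from the lecture: a missing neighbouring class at either end of the table is treated as having zero frequency, and when several classes tie for the highest frequency the first one is used.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Mode of a grouped frequency distribution with equal class widths, Eq. 2.1.13."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])   # index of the modal class
    f_prev = freqs[m - 1] if m > 0 else 0                # assumption: 0 outside the table
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - f_prev
    d2 = freqs[m] - f_next
    if d1 + d2 == 0:
        raise ValueError("mode is undefined: no class stands out from its neighbours")
    return lower_bounds[m] + d1 / (d1 + d2) * width


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40.
print(grouped_mode([0, 10, 20, 30], 10, [4, 10, 7, 3]))
```

Here the modal class is 10-20 with d1 = 6 and d2 = 3, so the mode lies two thirds of the way into the class.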
An approximate empirical relationship exists between the mean, median and mode. It can be
expressed as
\[ \text{Mean} - \text{Mode} \approx 3\,(\text{Mean} - \text{Median}) \]
Note: this expression holds reasonably well only for unimodal distributions with moderate
asymmetry.
Variance is defined as the arithmetic mean of the squared deviations from the mean. For an
observation $x_i$, the deviation from the mean is $x_i - \mu_x$ and the squared deviation from
the mean is $(x_i - \mu_x)^2$.
Variance is generally denoted by $\sigma^2$. Its expressions for a simple series and for a
frequency distribution are given below.
For simple series,
\[ \sigma^2 = \frac{1}{n}\sum (x_i - \mu_x)^2 \tag{2.1.16} \]
For frequency distribution,
\[ \sigma^2 = \frac{1}{N}\sum f_i (x_i - \mu_x)^2 \tag{2.1.17} \]
Standard Deviation
\[ \sigma = \sqrt{\frac{1}{n}\sum (x_i - \mu_x)^2} \tag{2.1.18} \]
Both variance and standard deviation are vital tools for describing statistical data, as they
show the dispersion of the data about the mean.
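The variance and standard deviation formulas can be illustrated as follows. The data are made up, and the helper names are hypothetical. Note that these formulas divide by n (or N), which corresponds to `statistics.pvariance` in the Python standard library rather than to `statistics.variance`, which divides by n − 1.

```python
import math
import statistics


def variance(xs):
    """Variance of a simple series, Eq. 2.1.16 (divides by n)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def grouped_variance(values, freqs):
    """Variance of a frequency distribution, Eq. 2.1.17 (divides by N)."""
    N = sum(freqs)
    mu = sum(f * x for x, f in zip(values, freqs)) / N
    return sum(f * (x - mu) ** 2 for x, f in zip(values, freqs)) / N


def std_dev(xs):
    """Standard deviation, Eq. 2.1.18: positive square root of the variance."""
    return math.sqrt(variance(xs))


# Hypothetical data, given both as a simple series and as a frequency table.
xs = [2, 4, 4, 4, 5, 5, 7, 9]
values, freqs = [2, 4, 5, 7, 9], [1, 3, 2, 1, 1]

print(variance(xs), grouped_variance(values, freqs), std_dev(xs))
print(statistics.pvariance(xs))   # stdlib cross-check
```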
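Covariance is defined for a pair of random variables that are associated with each other. It
is the average of the products of the individual deviations from the corresponding means. Eq.
2.1.19 gives the covariance $\mathrm{Cov}(x, y)$ between two correlated random variables $x$
and $y$.
\[ \mathrm{Cov}(x, y) = \frac{1}{n}\sum (x_i - \mu_x)(y_i - \mu_y) \tag{2.1.19} \]
Expanding the product and using $\sum x = n\mu_x$ and $\sum y = n\mu_y$ gives the equivalent
form
\[ \mathrm{Cov}(x, y) = \frac{\sum xy}{n} - \frac{\sum x}{n}\,\frac{\sum y}{n} \tag{2.1.20} \]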
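The correlation coefficient $\rho$ is the covariance normalised by the standard deviations of
the two variables:
\[ \rho = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} \tag{2.1.21} \]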
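Substituting the values of $\mathrm{Cov}(x, y)$, $\sigma_x$ and $\sigma_y$ from Eqs. 2.1.19
and 2.1.18 in Eq. 2.1.21, one gets
\[ \rho = \frac{\sum (x - \mu_x)(y - \mu_y)}{\sqrt{\sum (x - \mu_x)^2 \cdot \sum (y - \mu_y)^2}} \tag{2.1.22} \]
which, on expanding the sums, becomes
\[ \rho = \frac{\sum xy - n\mu_x\mu_y}{\sqrt{\left(\sum x^2 - n\mu_x^2\right)\left(\sum y^2 - n\mu_y^2\right)}} \tag{2.1.23} \]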
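Since $n\mu_x = \sum x$ and $n\mu_y = \sum y$, substituting these into the above equation and
simplifying gives
\[ \rho = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}} \tag{2.1.24} \]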
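The different forms of the covariance and the correlation coefficient should give the same numbers. The sketch below compares the deviation form (Eq. 2.1.19) with the raw-sum form (Eq. 2.1.20), and the definition of the correlation coefficient (Eq. 2.1.21) with the raw-sum form (Eq. 2.1.24). The paired data and function names are made up for illustration.

```python
import math


def covariance(xs, ys):
    """Eq. 2.1.19: mean product of deviations from the respective means."""
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n


def covariance_raw(xs, ys):
    """Eq. 2.1.20: sum(xy)/n - (sum(x)/n) * (sum(y)/n)."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - (sum(xs) / n) * (sum(ys) / n)


def correlation(xs, ys):
    """Eq. 2.1.21: covariance divided by the product of standard deviations."""
    sigma_x = math.sqrt(covariance(xs, xs))   # Cov(x, x) is the variance of x
    sigma_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sigma_x * sigma_y)


def correlation_raw(xs, ys):
    """Eq. 2.1.24: correlation coefficient written with raw sums only."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(covariance(xs, ys), covariance_raw(xs, ys))
print(correlation(xs, ys), correlation_raw(xs, ys))
```

Neither correlation function guards against a constant series, for which the standard deviation is zero and the coefficient is undefined.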
Percentile
A percentile is a value below which a given percentage of the observations fall. For example,
99% of the observations fall below the 99th percentile ($P_{99}$). Ranked by value, the
percentiles satisfy $P_1 \le P_2 \le \cdots \le P_{99}$.
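Several conventions exist for computing percentiles from a finite sample, and the lecture does not specify one. The sketch below uses the nearest-rank convention purely as an assumption for illustration; other conventions interpolate between neighbouring observations and can give slightly different values.

```python
import math


def percentile_nearest_rank(observations, p):
    """p-th percentile by the nearest-rank convention (an assumed convention).

    Returns the smallest observation such that at least p percent of the
    sample is less than or equal to it.
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in the interval (0, 100]")
    ordered = sorted(observations)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


# Hypothetical sample.
sample = [15, 20, 35, 40, 50]
p30 = percentile_nearest_rank(sample, 30)
p40 = percentile_nearest_rank(sample, 40)
p99 = percentile_nearest_rank(sample, 99)
print(p30, p40, p99)
```

In this small sample the 30th and 40th percentiles coincide, which is why the ordering of percentiles is not strict in general.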
Regression
Regression is the estimation of the average value of one variable for a specified value of
another variable. It is carried out using suitable equations (i.e., regression equations)
based on the statistical data (joint as well as individual) of the random variables. For simple
regression, a linear relationship between the variables is assumed. The estimate of $y$
(denoted by $y'$) is then given by the regression equation of $y$ on $x$ as
\[ y' - \mu_y = b_{yx}(x - \mu_x) \tag{2.1.25} \]
where the regression coefficient $b_{yx} = \mathrm{Cov}(x, y)/\sigma_x^2$. Similarly, the
regression equation of $x$ on $y$ for the estimate of $x$ (denoted by $x'$) is
\[ x' - \mu_x = b_{xy}(y - \mu_y) \tag{2.1.26} \]
where the regression coefficient $b_{xy} = \mathrm{Cov}(x, y)/\sigma_y^2$. Now consider the
straight-line fit shown below for a better understanding of the formulation and calculations
related to regression.
\[ y = a + bx \tag{2.1.27} \]
where the random variable $x$ is independent and $y$ depends on $x$. The coefficients $a$ and
$b$ in Eq. 2.1.27 are the unknowns to be evaluated by regression. Multiplying Eq. 2.1.27 by 1
and by $x$ in turn, and summing over the $n$ observations of the random variables, one gets
\[ \sum y = an + b\sum x \tag{2.1.28} \]
\[ \sum xy = a\sum x + b\sum x^2 \tag{2.1.29} \]
Dividing Eq. 2.1.28 by $n$ gives
\[ \mu_y = a + b\mu_x \tag{2.1.30} \]
\[ a = \mu_y - b\mu_x \tag{2.1.31} \]
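Thus, the unknown coefficient $a$ is evaluated in terms of the individual means of the two
random variables. Now, multiply Eq. 2.1.28 by $\sum x$ and Eq. 2.1.29 by $n$:
\[ \sum x \sum y = na\sum x + b\left(\sum x\right)^2 \tag{2.1.32} \]
\[ n\sum xy = na\sum x + nb\sum x^2 \tag{2.1.33} \]
Subtracting Eq. 2.1.32 from Eq. 2.1.33 eliminates $a$:
\[ n\sum xy - \sum x \sum y = b\left(n\sum x^2 - \left(\sum x\right)^2\right) \tag{2.1.34} \]
\[ \therefore\ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \tag{2.1.35} \]
Dividing the numerator and the denominator by $n^2$,
\[ b = \frac{\dfrac{\sum xy}{n} - \dfrac{\sum x}{n}\,\dfrac{\sum y}{n}}{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = \rho\,\frac{\sigma_y}{\sigma_x} \tag{2.1.36} \]
so that $b = b_{yx}$. Substituting $a$ from Eq. 2.1.31 into Eq. 2.1.27 then gives the
regression equation of $y$ on $x$:
\[ y - \mu_y = b_{yx}(x - \mu_x) \tag{2.1.37} \]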
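The straight-line fit can be sketched numerically. The function below solves the two summed equations for $a$ and $b$ using raw sums, with the slope computed as $b = (n\sum xy - \sum x\sum y)/(n\sum x^2 - (\sum x)^2)$ and the intercept as $a = \mu_y - b\mu_x$. The function name and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a + b*x from raw sums.

    Returns the pair (a, b): intercept and slope.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx ** 2          # n * sum(x^2) - (sum x)^2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sxy - sx * sy) / denom    # slope
    a = sy / n - b * sx / n            # intercept: mu_y - b * mu_x
    return a, b


# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(a, b)

# The fitted line passes through the point of means (mu_x, mu_y).
mu_x, mu_y = sum(xs) / len(xs), sum(ys) / len(ys)
print(a + b * mu_x, mu_y)
```

For this data the covariance is 1.2 and the variance of $x$ is 2, so the slope agrees with $\mathrm{Cov}(x, y)/\sigma_x^2$.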