Introduction

Week – 01
MKT3802 Statistical and
Experimental Methods for Engineers
Text Books
Probability and Statistics for Engineers and Scientists, Ronald E. Walpole, Raymond H.
Myers, Sharon L. Myers, Keying Ye, Pearson Prentice Hall, 8th Edition, 2007 or 9th
Edition, 2012

Probability and Statistics in Engineering, William W. Hines, Douglas C. Montgomery,
David M. Goldsman, Connie M. Borror, 4th Edition, Wiley

Outline
• Introduction
• Role of probability
• Measures of location (central tendency)
• Measures of variability (dispersion)
• Discrete and continuous data

Definitions
• Statistics is
– a science which helps us to collect, analyze and
present data systematically.
– the art of learning from data.

Collecting data → Organization of data → Presentation of data → Analyzing data → Interpretation of data

Importance
• Simplifies a mass of data
• Helps extract concrete information from data
• Helps decision making
• Presents facts in a precise and definite form
• Facilitates comparison
• Facilitates predictions

Limitations
• Does not deal with individual items
• Deals only with quantitatively expressed items
• Results from interpretation are not universally
true

Application areas
• Fields
– Economics
– Business
– Biology
– Engineering
– Health and medicine
• Engineering uses
– Improving product design
– Testing product performance
– Quality control
– Determining reliability
– Making a system safer

Branches of Statistics
• Descriptive Statistics
– 1st phase
– methods for collection, organization, presentation and
analysis of the data
– without any attempt to infer anything from the known
data
• Inferential (Inductive) Statistics
– 2nd phase
– Process of
• drawing conclusions (inferences)
• performing hypothesis testing
• determining relationships between variables
• making predictions

Examples
• Descriptive Statistics
– The daily average temperature of Istanbul was 8°C last
week
– The scores of 50 students in a Process Control exam are
found to range from 30 to 80.

• Inferential (Inductive) Statistics
– Analysis of the incomes of 1000 randomly selected citizens
living in Istanbul suggests that the average monthly income
is about 5000 TL
– The Turkish Statistical Institute reports the outcome of its
survey as “The population of Turkey in the year 2025 will
likely be 88,844,934”

Probability
• a measure of the likelihood that a particular event
will occur.
– If we are certain that an event will occur, its probability is
1 or 100%.
– If it certainly will not occur, its probability is zero.

• Example: 260 bolts are examined as they are produced.
Five of them are found to be defective.
– Answer: estimated probability of a defective bolt = 5 / 260 ≈ 0.019.
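
The bolt calculation is a relative-frequency estimate: the number of times the event was observed divided by the number of trials. A minimal Python sketch of that arithmetic follows; the function name `relative_frequency` is illustrative and not part of the course material.

```python
def relative_frequency(event_count, total_count):
    """Estimate a probability as the fraction of trials in which the event occurred."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return event_count / total_count


# Values from the bolt example: 5 defective bolts out of 260 examined.
p_defective = relative_frequency(5, 260)
print(round(p_defective, 3))  # 0.019
```

The estimate always lies between 0 (the event was never observed) and 1 (it was observed in every trial), matching the two limiting cases listed above.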

Statistics vs. Probability
• There is a clear distinction between probability and
inferential statistics
– Probability reasons from a known population to a sample
– Inferential statistics reasons from a sample to the population

• Statistical inference makes use of the concepts of
probability.

• With the aid of statistical methods and elements of
probability, conclusions are drawn about some feature of
the population

Main Terms in Statistics
• Data: The measurements obtained in a research study
are called the data
– Quantitative: data expressed numerically, e.g.
heights, ages, weights
– Qualitative: data NOT expressed numerically, e.g.
colors, health, languages
– Primary (direct observation) or secondary (external data
sources)

• Variable: a characteristic or condition that can change
or take on different values.
– Quantitative or qualitative, e.g. height, age, gender

Main Terms in Statistics
• Population: the complete set of data for the entire group of
individuals. Can be finite or infinite, e.g. the students in
this class

• Sample: a subset of the population that serves as a basis
for valid generalization about the population.

• Sample size: the number of items in the sample.

• Experiment: the process of obtaining the desired data.
– The goal of an experiment is to demonstrate a cause-and-effect
relationship between two variables
– The manipulated variable is called the independent variable and
the observed variable is the dependent variable

Steps in Statistical
Investigation
Step 1: Sampling Procedures; Collection of Data
• Simple random sampling
a) any particular sample of a specified sample size
has the same chance of being selected as any
other sample of the same size
b) simple random sampling is not always
appropriate
• Design of experiment
a) Systematic selection
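
Simple random sampling, as defined in item a) above, can be sketched in Python with the standard library. This is a minimal illustration under stated assumptions: the population of item IDs 1 to 260 and the helper name `simple_random_sample` are hypothetical.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct items; every subset of that size is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, sample_size)


# Hypothetical population: item IDs 1..260.
population = list(range(1, 261))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

`random.Random.sample` draws without replacement, so no item can appear twice in the sample.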
DoE Example
• A corrosion study was made to determine whether coating
aluminum with a corrosion retardation substance reduces
the amount of corrosion at different humidity levels.
– The corrosion measurement is expressed in thousands of
cycles to failure.
– Two levels of coating: no coating and chemical corrosion
coating
– Two relative humidity levels: 20% and 80%
DoE Data

DoE Example

While we might draw conclusions about the role of humidity and the impact of
coating the specimens from the figure, we cannot truly evaluate the results from an
analytical point of view without taking into account the variability around the average.

Measures of Location
(Central Tendency)
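
The two measures of location used in this section are the sample mean and the sample median. For reference, these are the usual textbook definitions for observations x_1, ..., x_n, written with the symbols the slides use; in the median, the observations are taken in increasing order.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\tilde{x} =
\begin{cases}
x_{(n+1)/2}, & n \text{ odd} \\[4pt]
\tfrac{1}{2}\left(x_{n/2} + x_{n/2+1}\right), & n \text{ even}
\end{cases}
```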

REMARK: Depending on the data, the median and mean can be quite different from
each other.

Measures of Location
Example: Suppose the data set is the following:
1.7, 2.2, 3.9, 3.11, and 14.7.

The sample mean and median are, respectively,

$$\bar{x} = \frac{1.7 + 2.2 + 3.9 + 3.11 + 14.7}{5} = 5.122$$

Sorted: 1.7 < 2.2 < 3.11 < 3.9 < 14.7

$$\tilde{x} = x_{(n+1)/2} = 3.11$$

In MATLAB, mean(x) and median(x) compute these two values.
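
The calculation can also be checked with a short script. The sketch below uses Python's standard `statistics` module with the five values from the worked sum (1.7, 2.2, 3.9, 3.11, 14.7).

```python
import statistics

# Values taken from the worked sum in the example.
data = [1.7, 2.2, 3.9, 3.11, 14.7]

sample_mean = statistics.mean(data)      # arithmetic average of all observations
sample_median = statistics.median(data)  # middle value after sorting (n is odd here)

print(f"mean   = {sample_mean:.3f}")    # mean   = 5.122
print(f"median = {sample_median:.2f}")  # median = 3.11
```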

The mean
• Pros
– Most commonly used measure of location
– Uses all the observations in the data set
– All observations have equal weight

• Cons
– Affected by extreme values that may not be
representative of the sample

The median
• Pros
– Always exists and is unique
– Not affected by extreme values

• Cons
– Sorting is required
– Uses only one or two observations
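
The difference in sensitivity to extreme values can be seen numerically. In this minimal Python sketch, the second data set is a hypothetical variant of the example data in which the largest observation is made ten times larger.

```python
import statistics

data = [1.7, 2.2, 3.9, 3.11, 14.7]
# Hypothetical variant: the largest observation becomes 147.0 instead of 14.7.
with_outlier = [1.7, 2.2, 3.9, 3.11, 147.0]

mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:.3f} -> {mean_after:.3f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
```

The mean moves from about 5.12 to about 31.58, while the median stays at 3.11 because only the middle of the sorted data determines it.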

Other measures of location
• Other measures of location:
– The mode
– Percentiles/Quantiles
– Midrange
• Other types of mean:
– Geometric
– Harmonic
– Quadratic
– Trimmed
– Weighted
– Combination
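
Several of the measures listed above can be sketched in a few lines of Python (3.8 or later for `geometric_mean` and `quantiles`). The helper functions `midrange`, `quadratic_mean`, `trimmed_mean`, and `weighted_mean` are illustrative names written for this sketch, and the trimmed mean here simply drops a fixed number of observations from each end, which is one common convention among several.

```python
import math
import statistics


def midrange(values):
    """Average of the smallest and largest observations."""
    return (min(values) + max(values)) / 2


def quadratic_mean(values):
    """Root mean square: square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def trimmed_mean(values, trim_each_side=1):
    """Mean after dropping trim_each_side observations from each end."""
    ordered = sorted(values)
    if 2 * trim_each_side >= len(ordered):
        raise ValueError("trimming would remove every observation")
    kept = ordered[trim_each_side:len(ordered) - trim_each_side]
    return sum(kept) / len(kept)


def weighted_mean(values, weights):
    """Mean in which each observation counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


data = [1.7, 2.2, 3.9, 3.11, 14.7]

results = {
    "mode of [2, 3, 3, 5]": statistics.mode([2, 3, 3, 5]),
    "quartiles": statistics.quantiles(data, n=4),
    "midrange": midrange(data),
    "geometric mean": statistics.geometric_mean(data),
    "harmonic mean": statistics.harmonic_mean(data),
    "quadratic mean": quadratic_mean(data),
    "trimmed mean": trimmed_mean(data),
    "weighted mean (equal weights)": weighted_mean(data, [1] * len(data)),
}
for name, value in results.items():
    print(f"{name}: {value}")
```

For positive data the harmonic, geometric, arithmetic, and quadratic means always appear in that increasing order, which is a quick sanity check on the output.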

Measures of Variability
(Dispersion)
• A measure of variability indicates how
observations are spread about the mean value
– Range
– Variance
– Standard deviation
– Coefficient of variation

Range
• The simplest one: $R = x_{\max} - x_{\min}$
• Pros
– Quick estimate of variance
– Easy calculation
• Cons
– Only uses extreme values
– The larger the sample, the less efficient the
range becomes

Variance and Std. Deviation
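
For reference, the sample variance and sample standard deviation of observations x_1, ..., x_n are usually defined as follows. The n − 1 divisor is the sample form used in Walpole; it is stated here as an assumption about the convention this course follows.

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
\qquad
s = \sqrt{s^2}
```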

Variance, s² or σ²
• Pros
– An efficient estimator
– Can be added and averaged
• Cons
– Calculation can be tedious without the aid of a
calculator or computer.

Standard Deviation, s
• Pros
– It has the same units as the observed
values
– An efficient estimator
• Cons
– Calculation can be tedious without the aid of a
calculator or computer.

Coefficient of Variation, CV
• Measure of relative dispersion
• Magnitude of variation relative to the size of the
quantity
• $CV = \frac{s}{\bar{x}} \times 100$
• Pros
– Can be used to compare variation between two
data sets with different engineering units
• Cons
– Fails if the mean is close to zero
– Often misunderstood and misused

MATLAB example for
variability
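
The four measures of variability can be computed in a few lines. The sketch below is in Python rather than MATLAB and reuses the data from the measures-of-location example; `statistics.variance` and `statistics.stdev` use the sample (n − 1) divisor, as do MATLAB's `var` and `std` by default.

```python
import statistics

data = [1.7, 2.2, 3.9, 3.11, 14.7]

data_range = max(data) - min(data)           # R = x_max - x_min
sample_variance = statistics.variance(data)  # s^2, divides by n - 1
sample_std = statistics.stdev(data)          # s = sqrt(s^2)
cv_percent = sample_std / statistics.mean(data) * 100  # CV in percent

print(f"range    = {data_range:.2f}")
print(f"variance = {sample_variance:.4f}")
print(f"std dev  = {sample_std:.4f}")
print(f"CV       = {cv_percent:.1f} %")
```

A coefficient of variation above 100% here reflects how strongly the single large observation inflates the standard deviation relative to the mean.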

Discrete and Continuous
variables
Data
• Qualitative (Categorical): non-numeric or numeric
• Quantitative: numeric
– Discrete: if measuring how many
– Continuous: if measuring how much

Graphical Diagnostics
• Scatter Plot
• Stem-and-Leaf Plot
• Histogram Plot
• Box-and-Whisker Plot or Box Plot
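
Of the plots listed above, the stem-and-leaf plot is simple enough to build as text. The sketch below is a minimal Python illustration for two-digit integers, with the tens digit as the stem and the units digit as the leaf; the exam scores are hypothetical values chosen only to lie in the 30 to 80 range mentioned earlier.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group integers by tens digit (stem) and collect the units digits (leaves)."""
    groups = defaultdict(list)
    for value in sorted(values):
        groups[value // 10].append(value % 10)
    return dict(sorted(groups.items()))


# Hypothetical exam scores, for illustration only.
scores = [30, 42, 45, 51, 58, 58, 63, 67, 74, 80]

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row acts like one histogram bin of width 10, while still showing the individual observations.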

Scatter Plot

Histogram Plot

Panels: Negative skewness, Positive skewness

Self Study
• Read and try to understand
– Example 1.1
– Example 1.2
– Example 1.3
– Example 1.4
in Walpole.
