
DATA EXPLORATION

STEM AND LEAF VIA MINITAB


Stem-and-Leaf Display: C1
Stem-and-leaf of C1 N = 80
Leaf Unit = 1.0

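Cumulative
Frequency   Stem   Leaf
     1        7    6
     2        8    7
     3        9    7
     5       10    15
     8       11    058
    11       12    013
    17       13    133455
    25       14    12356899
    37       15    001344678888
   (10)      16    0003357789
    33       17    0112445668
    23       18    0011346
    16       19    034699
    10       20    0178
     6       21    8
     5       22    189
     2       23    7
     1       24    5

Counts in the first column accumulate from each end toward the middle. The value
in parentheses is the number of observations in the row that contains the median.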
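As a rough illustration of how such a display is built, the sketch below splits each value into a stem (all digits but the last) and a leaf (the last digit). The function name `stem_and_leaf` and the small sample are assumptions made for this example; the sample only reuses the first few values readable from the display, and this is not Minitab's own procedure.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Return a list of (stem, leaves) rows for non-negative values.

    Each value is scaled by leaf_unit, then split so that the last
    digit is the leaf and the remaining digits form the stem.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        scaled = int(v // leaf_unit)      # drop anything below the leaf unit
        stem, leaf = divmod(scaled, 10)   # last digit becomes the leaf
        rows[stem].append(leaf)
    lowest, highest = min(rows), max(rows)
    # Keep empty stems so gaps in the data stay visible.
    return [(s, "".join(str(d) for d in rows.get(s, [])))
            for s in range(lowest, highest + 1)]


# Illustrative sample: the eight smallest values read from the display.
sample = [76, 87, 97, 101, 105, 110, 115, 118]
for stem, leaves in stem_and_leaf(sample):
    print(f"{stem:>3} | {leaves}")
```

Sorting before splitting keeps the leaves in increasing order within each stem, which is what makes the display usable for reading off ordered observations.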
Quartiles

When an ordered set of data is divided into four equal parts, the division
points are called quartiles. The first or lower quartile, q1, is a value that
has approximately 25% of the observations below it and approximately 75% of
the observations above. The second quartile, q2, has approximately 50% of the
observations below its value. The second quartile is exactly equal to the
median. The third or upper quartile, q3, has approximately 75% of the
observations below its value. As in the case of the median, the quartiles may
not be unique. The compressive strength data in Fig. 6-6 contains n = 80
observations.
The first quartile is the (n + 1)/4 ordered observation. When n = 80, this is
the 20.25th ordered observation, so q1 lies between the 20th and 21st
observations.
The third quartile is the 3(n + 1)/4 ordered observation. When n = 80, this is
the 60.75th ordered observation, so q3 lies between the 60th and 61st
observations.

Interquartile Range (IQR) = Third Quartile - First Quartile
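The position rule can be worked through on the 80 values encoded in the stem-and-leaf display (leaf unit 1.0). The sketch below rebuilds the ordered data from the display rows and applies the (n + 1)/4 and 3(n + 1)/4 positions. Linear interpolation between the two neighbouring ordered observations is an assumption of this example, since the text only says the quartile lies between them; the names `DISPLAY` and `ordered_value` are likewise illustrative.

```python
# Rows copied from the stem-and-leaf display: stem -> leaves (leaf unit 1.0).
DISPLAY = {
    7: "6", 8: "7", 9: "7", 10: "15", 11: "058", 12: "013",
    13: "133455", 14: "12356899", 15: "001344678888",
    16: "0003357789", 17: "0112445668", 18: "0011346",
    19: "034699", 20: "0178", 21: "8", 22: "189", 23: "7", 24: "5",
}

# Rebuild the ordered observations: value = 10 * stem + leaf.
data = sorted(10 * stem + int(leaf)
              for stem, leaves in DISPLAY.items()
              for leaf in leaves)


def ordered_value(sorted_data, position):
    """Value at a 1-based, possibly fractional, ordered position.

    Assumption: fractional positions are resolved by linear
    interpolation between the two neighbouring observations.
    """
    k = int(position)                 # whole part of the position
    frac = position - k               # fractional part
    lower = sorted_data[k - 1]
    if frac == 0:
        return lower
    upper = sorted_data[k]
    return lower + frac * (upper - lower)


n = len(data)                                  # 80
q1 = ordered_value(data, (n + 1) / 4)          # position 20.25
q3 = ordered_value(data, 3 * (n + 1) / 4)      # position 60.75
iqr = q3 - q1

print(n, q1, q3, iqr)  # 80 143.5 181.0 37.5
```

Here the 20th and 21st ordered observations are 143 and 145, and the 60th and 61st are both 181, which is where the values 143.5 and 181.0 come from.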

MINITAB OUTPUT
The box plot is a graphical display that simultaneously describes several important
features of a data set, such as center, spread, departure from symmetry, and
identification of unusual observations or outliers.
FREQUENCY DISTRIBUTION AND HISTOGRAM
TIME SEQUENCE PLOTS

A time series or time sequence is a data set in which the observations are
recorded in the order in which they occur. A time series plot is a graph in
which the vertical axis denotes the observed value of the variable (say x) and
the horizontal axis denotes the time (which could be minutes, days, years,
etc.). When measurements are plotted as a time series, we often see trends,
cycles, or other broad features of the data that could not be seen otherwise.
Scatter Diagrams
Multivariate Data
In many problems, engineers and scientists work with data that is multivariate
in nature; that is, each observation consists of measurements of several
variables.

Quality Data of a typical beverage
A scatter diagram is a useful way to graphically display the potential
relationship between two variables. A scatter diagram is constructed by
plotting each pair of observations with one measurement in the pair on the
vertical axis of the graph and the other measurement in the pair on the
horizontal axis.
Beverages of more intense color generally have a higher quality rating.
A matrix of scatter diagrams is useful when two or more variables exist. It is
helpful in looking at all of the pairwise relationships between the variables
in the sample.
CORRELATION COEFFICIENT

The correlation coefficient rxy is a quantitative measure of the strength of
the linear relationship between two random variables x and y.

If the two variables are perfectly linearly related with a positive slope, then rxy = 1, and if
they are perfectly linearly related with a negative slope, then rxy = −1. If no linear relationship
between the two variables exists, then rxy = 0. The simple correlation coefficient is also
sometimes called the Pearson correlation coefficient, after Karl Pearson.

Correlations with |rxy| below 0.5 are generally considered weak, and correlations with |rxy|
above 0.8 are generally considered strong.
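A minimal sketch of the computation, using the usual sums-of-squares form of the Pearson coefficient. The function names `pearson_r` and `strength` and the small data sets are assumptions made for illustration; they are not the beverage data. The label "in between" for values from 0.5 to 0.8 is also a placeholder of this example, since the rule of thumb above names only the weak and strong ranges.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Apply the 0.5 / 0.8 rule of thumb to |r|."""
    size = abs(r)
    if size < 0.5:
        return "weak"
    if size > 0.8:
        return "strong"
    return "in between"   # placeholder label, not from the text


# Hypothetical data chosen to hit the three reference cases.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # perfect positive line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # perfect negative line
print(pearson_r([1, 2, 3, 4, 5], [4, 1, 0, 1, 4]))  # symmetric curve, no linear trend
```

The last case is worth noting: y depends exactly on x, yet rxy is 0, because the coefficient measures only linear association.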
CORRELATION COEFFICIENT BETWEEN QUALITY AND pH Values
Correlation Coefficients
PROBABILITY PLOTS
A probability plot is used to check whether data follow a particular
probability distribution.
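One common way to build such a plot for the normal distribution can be sketched as follows. The choices here are assumptions of this example rather than something stated above: the plotting position (j - 0.5)/n for the j-th ordered observation, the normal distribution as the candidate, and the function name `normal_plot_points`. If the points fall close to a straight line when plotted, the normal model is plausible for the data.

```python
from statistics import NormalDist


def normal_plot_points(values):
    """Return (ordered observation, standard normal score) pairs.

    Assumption: the j-th ordered observation is paired with the
    standard normal quantile at cumulative probability (j - 0.5) / n.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()   # mean 0, standard deviation 1
    return [(x, std_normal.inv_cdf((j - 0.5) / n))
            for j, x in enumerate(ordered, start=1)]


# Hypothetical measurements, for illustration only.
for x, z in normal_plot_points([176, 183, 185, 190, 191, 192, 201, 205, 214, 220]):
    print(f"{x:>5}  {z:+.3f}")
```

The pairs can be passed to any plotting tool with the observations on one axis and the normal scores on the other.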