Before starting this module, the student must have first completed
Module 8, Introduction to Process Integration. This module
includes basic concepts not repeated here, notably those related to
data quality.
[Figure: from raw numbers (data) to understanding, by way of MVA.]
NAMP Module 17: Introduction to Multivariate Analysis Tier 1, Part 1, Rev.: 0
Multivariate Analysis is Based on Ockham's Razor
However, there will never be more than one difference: is it an
apple or an orange? In MVA parlance, we would say that there is
only one latent attribute.
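A minimal sketch of what "only one latent attribute" means numerically. The data are hypothetical: five measured variables are all driven by a single hidden attribute coded +1 or -1 (the coding, weights and noise level are assumptions made for illustration), and a singular value decomposition shows that one direction accounts for nearly all of the variation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coding: +1 = apple, -1 = orange (the single latent attribute).
latent = rng.choice([-1.0, 1.0], size=40)

# Five measured variables, each responding to the latent attribute with its
# own (assumed) weight, plus a little measurement noise.
weights = np.array([2.0, -1.5, 1.0, 0.8, -0.5])
X = np.outer(latent, weights) + rng.normal(scale=0.1, size=(40, 5))

# Centre the columns and look at how the variance splits across directions.
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
explained = s**2 / np.sum(s**2)
print(np.round(explained, 3))  # the first entry dominates
```

Although five columns were measured, the first direction carries almost all of the information, which is the sense in which the data set has one latent attribute.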
Using these graphs, which our eyes and brains can easily handle,
we are able to peer into the database and identify trends and
correlations.
This is illustrated on the next page.
[Figure: raw data, with hundreds of columns and thousands of rows, is impossible to interpret as numbers. MVA converts it into 2-D visual outputs (plots of Y against X) in which trends can be seen.]
Illustrative Data Set: Food
Consumption in European Countries
To illustrate these concepts, we take an easy-to-understand
example involving food.
Look at the table on the following page. Can you tell anything
from the raw numbers? Of course not. No one could.
Note that MVA can handle up to 10-20% missing data.
Courtesy of Umetrics corp.
The first of these, the Score plot, shows all the original data points
(observations) in a new set of coordinates or components. Each
score is the value of that data point on one of the new component
dimensions:
The Score Plot is the projection of the original data points onto a plane defined by two new components.
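The projection described above can be sketched in a few lines, assuming the new components are principal components obtained by PCA and that the variables are centred and scaled to unit variance first (both are assumptions, as is the random stand-in data). The function name `pca_scores` is hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the observations (rows of X) onto the first principal components."""
    X = np.asarray(X, dtype=float)
    # Centre and scale each variable (an assumed, though common, pre-treatment).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The rows of Vt are the directions of the new components.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T   # variables x components
    scores = Z @ loadings            # observations x components
    return scores, loadings

# Hypothetical data: 6 observations (rows) of 4 variables (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
scores, loadings = pca_scores(X)
print(scores.shape)  # (6, 2)
```

Plotting the first column of `scores` against the second gives a score plot: one point per observation, in the plane of the two new components.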
A score plot shows how the observations are arranged in the new
component space. The score plot for the food data is shown on
the next page. Note how similar countries cluster together.
Score Plot = observations
Note that the quadrants are the same on each type of plot. Sweden
and Denmark are in the top-right corner; so are frozen fish and
vegetables. Using both plots, variables and observations can be
correlated with one another.
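A small sketch of why the two plots can be read together, again assuming PCA computed by SVD. The table is invented for illustration (it is not the food data set): countries A and B consume mostly foods 1 and 2, while C and D consume mostly food 3.

```python
import numpy as np

# Hypothetical consumption table: rows are observations, columns are variables.
countries = ["A", "B", "C", "D"]
foods = ["food 1", "food 2", "food 3"]
X = np.array([[9.0, 8.0, 1.0],
              [8.0, 9.0, 2.0],
              [1.0, 2.0, 9.0],
              [2.0, 1.0, 8.0]])

# Centre and scale, then take the SVD (one standard way to compute PCA).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U[:, :2] * s[:2]   # one row per country (score plot coordinates)
loadings = Vt[:2].T         # one row per food (loadings plot coordinates)

for name, t in zip(countries, scores):
    print(f"country {name}: score on component 1 = {t[0]:+.2f}")
for name, p in zip(foods, loadings):
    print(f"{name}: loading on component 1 = {p[0]:+.2f}")
```

The sign of a component is arbitrary, but scores and loadings flip together, so a country always falls on the same side of the axis as the foods it consumes most. That shared orientation is what lets observations and variables be matched by quadrant.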
Loadings Plot = variables (projection of the old variables onto the new components)
[Figure: a sequence of three panels. At first the data looks random; after 1500 trials it is not random at all, because the +ve and -ve noise cancels out.]
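The idea that positive and negative noise cancels out over many trials can be illustrated with a short simulation. This is only one reading of the figure, assuming that the trials are averaged. The underlying trend, the noise level and the number of points per trial are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials, n_points = 1500, 50

# Hypothetical underlying trend, buried in noise much larger than itself.
signal = np.linspace(0.0, 1.0, n_points)
noise = rng.normal(scale=2.0, size=(n_trials, n_points))
trials = signal + noise   # each row is one noisy trial

# Mean absolute deviation from the true trend: one trial vs. the average.
single_error = np.abs(trials[0] - signal).mean()
averaged_error = np.abs(trials.mean(axis=0) - signal).mean()
print(f"one trial: {single_error:.2f}, mean of {n_trials} trials: {averaged_error:.2f}")
```

A single trial looks random because the noise swamps the trend. Averaging shrinks the noise by roughly the square root of the number of trials, so the trend becomes visible.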
After doing this on-line course, reading the references and playing
around with real data, the student should at some point experience a
Eureka! moment when suddenly MVA makes sense. Unfortunately,
there is no shortcut to achieving this insight:
Broderick, G., J. Paris, J.L. Valade and J. Wood. Applying Latent Vector
Analysis to Pulp Characterization, Paperi ja Puu, 77 (6-7): 410-419.
For real process data, such assumptions are totally unrealistic.
Statistical tests help characterise an
existing dataset. They do NOT enable
you to make predictions about future
data. For this we must turn to
regression techniques.
This equation can be used to predict a new y-value given new x-values (the x_i).
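As a minimal sketch of this prediction step, assume the equation is a linear model y = b0 + b1*x1 + b2*x2 fitted by ordinary least squares. The training data and the helper `predict` are hypothetical.

```python
import numpy as np

# Hypothetical training data generated from y = 1 + 2*x1 - 3*x2 (no noise).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Add a column of ones so the fit includes an intercept b0.
A = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(x_new):
    """Apply the fitted equation y = b0 + b1*x1 + b2*x2 to new x-values."""
    x_new = np.atleast_2d(x_new)
    return np.column_stack([np.ones(len(x_new)), x_new]) @ coeffs

print(coeffs)               # fitted b0, b1, b2
print(predict([3.0, 2.0]))  # prediction for a new observation
```

Once the coefficients are known, predicting a new y-value is simply a matter of substituting the new x-values into the fitted equation.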
Linear vs. Nonlinear Regression
Nonlinear terms: x_i x_j, x^2, x^3
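One common way to handle such terms, sketched here under assumed data, is to add them as extra columns and keep fitting by ordinary least squares: the model is nonlinear in x but still linear in its coefficients. The response function and the helper `expand` are hypothetical; cubic columns could be added in the same way.

```python
import numpy as np

def expand(X):
    """Design matrix with columns 1, x1, x2, x1*x2, x1^2, x2^2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 * x2, x1**2, x2**2])

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(30, 2))
# Hypothetical response with an interaction term and a squared term.
y = 0.5 + X[:, 0] * X[:, 1] + 2.0 * X[:, 0] ** 2

designs = {
    "linear": np.column_stack([np.ones(len(X)), X]),
    "expanded": expand(X),
}
rmse = {}
for name, A in designs.items():
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    rmse[name] = float(np.sqrt(np.mean((A @ coeffs - y) ** 2)))
    print(f"{name}: RMSE = {rmse[name]:.4f}")
```

The purely linear model leaves a large residual, while the expanded model reproduces this response essentially exactly because it contains the terms that generated it.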