(MULTIVARIATE ANALYSIS)
Budi Yuniarto
If You Don't Know Someone, You Can't Care for Them …
Tell us about yourself
• Name …?
• Where you are from … ?
• Anything you would like to share …?
My story
• My Name
• My Experience
• Learning Strategy
Supplementary reading:
• Brian Everitt, Torsten Hothorn. An Introduction to Applied
Multivariate Analysis with R. Springer-Verlag New York (2011)
Google Classroom
Prediction
Sample Variance
Sample Covariance
Sample Covariance Matrix
Sample Correlation
Sample Correlation Matrix
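The five sample statistics named on these slides can be checked numerically in R. The sketch below is an illustration, not part of the original slides: it assumes the built-in mtcars data and the usual n - 1 divisor, and the names x, y, s_xx, s_xy, r_xy, S and R are chosen here for the example.

```r
# Sample statistics on columns of the built-in mtcars data
# (the data set and all variable names are assumptions for illustration)
x <- mtcars$wt
y <- mtcars$mpg
n <- length(x)

# Sample variance: s_xx = sum((x - mean(x))^2) / (n - 1)
s_xx <- sum((x - mean(x))^2) / (n - 1)

# Sample covariance: s_xy = sum((x - mean(x)) * (y - mean(y))) / (n - 1)
s_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Sample correlation: r_xy = s_xy / (sqrt(s_xx) * sqrt(s_yy))
r_xy <- s_xy / (sd(x) * sd(y))

# Sample covariance and correlation matrices for several variables
vars <- mtcars[, c("mpg", "disp", "wt")]
S <- cov(vars)   # 3 x 3 sample covariance matrix
R <- cor(vars)   # 3 x 3 sample correlation matrix

# The hand-computed values agree with the built-in functions
all.equal(s_xx, var(x))
all.equal(s_xy, cov(x, y))
all.equal(r_xy, cor(x, y))
```

The diagonal of S holds the sample variances and the diagonal of R is all ones, so the matrices collect the pairwise quantities computed above into one object.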
Graphical Technique
• Displaying multivariate data is difficult because we can
visualize at most three dimensions directly.
• Several simple ways of displaying data include:
◦ Bivariate scatterplots.
◦ Three-dimensional scatterplots.
• Some plots that can be achieved by multivariate methods
include:
◦ “Stars.”
◦ Chernoff faces.
Scatterplot
Trivariate Scatterplot
Using R
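> # bivariate and trivariate scatterplots of the mtcars data
> # install.packages("scatterplot3d")  # run once if the package is missing
> library(scatterplot3d)
> # refer to columns explicitly instead of attach(mtcars)
> plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")
> scatterplot3d(mtcars$wt, mtcars$disp, mtcars$mpg, main = "3D Scatterplot")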
Stars
Using R
> stars(mtcars[, 1:7], locations = c(0, 0), radius = FALSE,
+ key.loc = c(0, 0), main = "Motor Trend Cars", lty = 2)
Chernoff faces
> # faces() with the face.type argument comes from the aplpack package
> library(aplpack)
> faces(rbind(1:3,5:3,3:5,5:7))
> data(longley)
> faces(longley[1:9,],face.type=0)
> faces(longley[1:9,],face.type=1)
Dendrograms
Distance Measure
• A great number of multivariate techniques revolve around
the computation of distances:
◦ Distances between variables.
◦ Distances between entities.
• The Euclidean distance between the point P = (x1, x2) and
the origin O = (0, 0) is d(O, P) = √(x1² + x2²).
• Keep in mind that distance measures have statistical
analogs that take the variability of the variables
into account.
• Also be aware that infinitely many distance
measures exist!
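As a small illustration of the Euclidean distance to the origin (not taken from the original slides), the sketch below computes it by hand and with base R's dist(). The point P = (3, 4) is a made-up value, and the variable names are chosen for this example only.

```r
# Euclidean distance from the origin O = (0, 0) to a point P = (x1, x2)
# (the coordinates of P are made-up values for illustration)
P <- c(3, 4)
O <- c(0, 0)

# By the formula: d(O, P) = sqrt(x1^2 + x2^2)
d_formula <- sqrt(sum((P - O)^2))

# With base R: dist() returns pairwise distances between the rows of a matrix
d_dist <- as.numeric(dist(rbind(O, P), method = "euclidean"))

d_formula   # 5
d_dist      # 5

# Other values of 'method' give other distance measures
d_manhattan <- as.numeric(dist(rbind(O, P), method = "manhattan"))  # 7
d_maximum   <- as.numeric(dist(rbind(O, P), method = "maximum"))    # 4
```

The same pair of points is 5, 7 or 4 units apart depending on the measure chosen, which is one way to see that "distance" is not a single fixed notion.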
• A distance measure d must satisfy the
following for all points P, Q and R:
◦ d(P,Q) = d(Q, P)
◦ d(P,Q) > 0 if P ≠ Q
◦ d(P,Q) = 0 if P = Q
◦ d(P,Q) ≤ d(P,R) + d(R,Q) (known as the
triangle inequality)
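The four properties can be spot-checked numerically for the Euclidean distance. This is only a sketch: the helper d() and the points P, Q and R are hypothetical, and a check on three points illustrates the properties rather than proving them.

```r
# Numerical spot check of the four distance properties for Euclidean distance
# (the helper d() and the points P, Q, R are hypothetical, for illustration)
d <- function(a, b) sqrt(sum((a - b)^2))

P <- c(1, 2)
Q <- c(4, 6)
R <- c(0, -1)

symmetric <- isTRUE(all.equal(d(P, Q), d(Q, P)))  # d(P,Q) = d(Q,P)
positive  <- d(P, Q) > 0                          # d(P,Q) > 0 if P != Q
zero_self <- d(P, P) == 0                         # d(P,Q) = 0 if P = Q
triangle  <- d(P, Q) <= d(P, R) + d(R, Q)         # triangle inequality

c(symmetric, positive, zero_self, triangle)
```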
Statistical distance
• Straight-line, or Euclidean, distance is unsatisfactory for
most statistical purposes.
• This is because each coordinate contributes equally to the
calculation of Euclidean distance.
• When the coordinates represent measurements that are
subject to random fluctuations of differing magnitudes, it is
often desirable to weight coordinates subject to a great
deal of variability less heavily than those that are not
highly variable.
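One common way to apply this weighting is to divide each squared deviation by the sample variance of that coordinate. The slides do not state the formula at this point, so the sketch below is an assumption about the intended weighting; the data are simulated and the names X, A, B, euclid() and stat() are invented for the example.

```r
# Statistical (variance-weighted) distance versus Euclidean distance
# Assumption: each squared deviation is divided by that coordinate's sample
# variance; the data below are simulated purely for illustration.
set.seed(1)
X <- cbind(x1 = rnorm(200, sd = 5),   # highly variable coordinate
           x2 = rnorm(200, sd = 1))   # less variable coordinate

center <- colMeans(X)
s      <- apply(X, 2, var)            # sample variances s11, s22

# Two points at the same Euclidean distance (3 units) from the center
A <- center + c(3, 0)   # displaced along the high-variance axis
B <- center + c(0, 3)   # displaced along the low-variance axis

euclid <- function(p) sqrt(sum((p - center)^2))
stat   <- function(p) sqrt(sum((p - center)^2 / s))

c(euclid(A), euclid(B))   # equal Euclidean distances
c(stat(A), stat(B))       # A is statistically closer to the center than B

# mahalanobis() returns a squared distance; with a diagonal covariance
# matrix it reproduces the weighted distance above
m_A <- sqrt(mahalanobis(A, center, diag(s)))
```

A displacement of 3 units is unremarkable along the axis that varies a lot but unusual along the axis that varies little, so the weighted distance separates two points that Euclidean distance treats as equally far away.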
Are these points at the
same distance from the
center?