
MULTIVARIATE ANALYSIS

(ANALISIS PEUBAH GANDA)

Budi Yuniarto
To Know You Is to Love You …
Tell us about yourself

• Your name …?

• Where you are from …?

• Anything else you would like to share …?
My story
• My Name

• My Experience

• Learning Strategy

NB : Remind me if I talk too much


Our Topics
• Aspects of Multivariate Analysis
• Sample Geometry and Random Vectors
• The Multivariate Normal Distribution
• Inference about Mean Vectors
• Principal Components and Factor Models
• Cluster and Discriminant Analysis
• Canonical Correlation
General Competencies
• Understand methods for analyzing data with many variables simultaneously.
• Be able to apply them in their various uses.
References
• Johnson, Richard A. and Dean W. Wichern. Applied Multivariate Statistical Analysis, 5th ed. Prentice-Hall, Inc., New Jersey, 2002.
• Rencher, A. C. Methods of Multivariate Analysis, 2nd ed. Wiley, 2002.
• Härdle, W. and Simar, L. Applied Multivariate Statistical Analysis, 2nd ed. Springer-Verlag, 2007.

Supplementary:
• Everitt, Brian and Torsten Hothorn. An Introduction to Applied Multivariate Analysis with R. Springer-Verlag, New York, 2011.
Google Classroom

Join the class. Class codes: vpdovu, nkkbu
How to join Classroom

• Sign in to your e-mail (@stis.ac.id), open the Google apps menu, and click the Classroom icon to open the Classroom app.
• Click the (+) button and select "Join class". Input your class code.
Software
MEETING 1 - INTRODUCTION

Let’s begin the journey …


“We are drowning in
information and starved for
knowledge”

(Tom Peters, Thriving on Chaos)


MULTIVARIATE ANALYSIS IN
STATISTICAL TERMS
• Generally, multivariate analysis refers to all statistical techniques that simultaneously analyze multiple measurements on individuals or objects under investigation.

• Thus, any simultaneous analysis of more than two variables can be loosely considered multivariate analysis.
But …
Confusion sometimes arises about what multivariate analysis is because the term is not used consistently in the literature.
• Some use it for any analysis examining relationships between or among more than two variables.
• Others use it only for problems in which all the multiple variables are assumed to have a multivariate normal distribution.
• Strictly, however, the multivariate character lies in the multiple variates (multiple combinations of variables), and not only in the number of variables or observations.
Why multivariate?
The objectives of scientific investigation to which multivariate methods lend themselves include the following:

• Data reduction or structural simplification
• Sorting and grouping
• Investigation of the dependence among variables
• Prediction
• Hypothesis construction and testing

A CLASSIFICATION OF MULTIVARIATE
TECHNIQUES

Three questions:
1. Can the variables be divided into independent and dependent classifications based on some theory?
2. If they can, how many variables are treated as dependent in a single analysis?
3. How are the variables, both dependent and independent, measured?
Dependence versus Interdependence

• A dependence technique may be defined as one in which a variable or set of variables is identified as the dependent variable to be predicted or explained by other variables.

• An interdependence technique is one in which no single variable or group of variables is defined as being independent or dependent.
Variate versus Variable

• A variate is a linear combination of variables formed in the multivariate technique by deriving empirical weights applied to a set of variables specified by the researcher.
Data Organization
• Multivariate data are a collection of observations (or measurements) of:
◦ p variables (k = 1, . . . , p).
◦ n “items” (j = 1, . . . , n).
• “Items” can also be thought of as subjects/examinees/individuals or entities (when people are not under study).
• In some disciplines (such as educational measurement), “items” are considered the variables collected per individual.
Descriptive Statistics of Multivariate Data
• When we have a large amount of data, it is often hard to get a manageable description of the nature of the variables under study.
• For this reason, descriptive statistics are used.
• Such descriptive statistics include:
◦ Means.
◦ Variances.
◦ Covariances.
◦ Correlations.
Sample Mean

Sample Variance

Sample Covariance

Sample Covariance Matrix

Sample Correlation

Sample Correlation Matrix
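The formulas on these slides did not survive extraction. Following the notation above (x_jk is the j-th observation of variable k), the standard definitions, using divisor n as in Johnson and Wichern, Ch. 1, are:

```latex
\bar{x}_k = \frac{1}{n}\sum_{j=1}^{n} x_{jk}
\qquad
s_{kk} = \frac{1}{n}\sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2
\qquad
s_{ik} = \frac{1}{n}\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)
\qquad
r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}}\,\sqrt{s_{kk}}}
```

The sample covariance matrix S collects the s_ik as its (i, k) entries, with the variances s_kk on the diagonal, and the sample correlation matrix R collects the r_ik, with ones on the diagonal. Note that many texts and software packages use divisor n - 1 rather than n.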
Graphical Technique
• Displaying multivariate data can be difficult because of our natural limitation to three dimensions.
• Several simple ways of displaying data include:
◦ Bivariate scatterplots.
◦ Three-dimensional scatterplots.
• Some plots designed specifically for multivariate data include:
◦ “Stars.”
◦ Chernoff faces.
Scatterplot
Trivariate Scatterplot
Using R
> #trivariate scatterplot
> install.packages("scatterplot3d")
> library(scatterplot3d)
> attach(mtcars)
> plot(wt,mpg)
> scatterplot3d(wt,disp,mpg, main="3D Scatterplot")
Stars
Using R
> stars(mtcars[, 1:7], locations = c(0, 0), radius = FALSE,
+ key.loc = c(0, 0), main = "Motor Trend Cars", lty = 2)
Chernoff faces

Chart showing Chernoff faces for data selected from the "USJudgeRatings" dataset in R, which contains ratings of state judges in the US Superior Court by lawyers who have had contact with them.
Using R
> #chernoff faces
> install.packages("aplpack")
> library(aplpack)
> faces()
> faces(face.type=1)

> faces(rbind(1:3,5:3,3:5,5:7))

> data(longley)
> faces(longley[1:9,],face.type=0)
> faces(longley[1:9,],face.type=1)
Dendrograms
Distance Measure
• A great number of multivariate techniques revolve around the computation of distances:
◦ Distances between variables.
◦ Distances between entities.
• The formula for the Euclidean distance between the coordinate pair P = (x1, x2) and the origin O = (0, 0):
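The formula itself was lost in extraction; the standard Euclidean distance is:

```latex
d(O, P) = \sqrt{x_1^2 + x_2^2}
\qquad\text{and, for two points } P = (x_1, x_2),\; Q = (y_1, y_2):\qquad
d(P, Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}
```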
• Just keep in mind that there are statistical analogs to distance measures, which take the variability of the variables into account.
• Also be aware that there are literally an infinite number of distance measures!
• A distance measure d must satisfy the following for all points P, Q, and R:
◦ d(P, Q) = d(Q, P)
◦ d(P, Q) > 0 if P ≠ Q
◦ d(P, Q) = 0 if P = Q
◦ d(P, Q) ≤ d(P, R) + d(R, Q) (known as the triangle inequality)
Statistical distance
• Straight-line, or Euclidean, distance is unsatisfactory for
most statistical purposes.
• This is because each coordinate contributes equally to the
calculation of Euclidean distance.
• When the coordinates represent measurements that are
subject to random fluctuations of differing magnitudes, it is
often desirable to weight coordinates subject to a great
deal of variability less heavily than those that are not
highly variable.
Do these points have the same distance from the center?

• One way to proceed is to divide each coordinate by the corresponding sample standard deviation. Upon division by the standard deviations, we obtain the "standardized" coordinates x1* = x1/√s11 and x2* = x2/√s22.
• Thus, a statistical distance of the point P = (x1, x2) from the origin O = (0, 0) can be computed from its standardized coordinates x1* = x1/√s11 and x2* = x2/√s22.
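Concretely, the statistical distance from the origin is then d(O, P) = √(x1²/s11 + x2²/s22), which weights a high-variability coordinate less heavily. A minimal sketch in Python (the course software is R; the function names and sample variances here are illustrative assumptions, not part of the slides):

```python
import math

def euclidean(p, q=(0.0, 0.0)):
    """Straight-line distance: every coordinate contributes equally."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def statistical(p, s, q=(0.0, 0.0)):
    """Statistical distance: coordinate k is down-weighted by its sample variance s[k]."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(p, q, s)))

P = (4.0, 1.0)   # a point in the plane
s = (4.0, 1.0)   # hypothetical sample variances s11, s22

print(euclidean(P))        # sqrt(16 + 1) = sqrt(17), about 4.12
print(statistical(P, s))   # sqrt(16/4 + 1/1) = sqrt(5), about 2.24
```

With equal unit variances s = (1, 1), the statistical distance reduces to the ordinary Euclidean distance.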
Next Session
• Matrix algebra and random vectors:
◦ Basics of vectors and matrices
◦ Orthogonal and orthonormal vectors
◦ Orthogonal matrices, positive definite matrices, spectral decomposition, and square root matrices
◦ Random vectors
◦ Mean vectors, variance-covariance matrices, and correlation matrices

• Johnson et al., Chapter 2

You might also like