
MULTIVARIATE ANALYSIS

(ANALISIS PEUBAH GANDA)

Budi Yuniarto
To Know You Is to Love You …
Tell us about yourself

• Your name …?

• Where you are from …?

• Anything else you would like to share …?
My story
• My Name

• My Experience

• Learning Strategy

NB : Remind me if I talk too much


Our Topics
• Aspects of Multivariate Analysis
• Sample Geometry and Random Vectors
• The Multivariate Normal Distribution
• Inference about Mean Vectors
• Principal Components and Factor Models
• Cluster and Discriminant Analysis
• Canonical Correlation
General Competencies
• Understand methods for analyzing data with many variables simultaneously.
• Be able to apply them in their various uses.
References
• Johnson, Richard A. and Dean W. Wichern. Applied Multivariate Statistical Analysis, 5th ed. Prentice-Hall, Inc., New Jersey, 2002.
• Rencher, A. C. Methods of Multivariate Analysis, 2nd ed. Wiley, 2002.
• Härdle, W. and Simar, L. Applied Multivariate Statistical Analysis, 2nd ed. Springer-Verlag, 2007.

Supplementary:
• Everitt, Brian and Torsten Hothorn. An Introduction to Applied Multivariate Analysis with R. Springer-Verlag, New York, 2011.
Google Classroom

Join the class. Class codes: vpdovu, nkkbu
How to join Classroom

• Sign in to your e-mail (@stis.ac.id), open the Google apps menu, and click the Classroom icon to open the Classroom app.
• Click the (+) button and select "Join class". Input your class code.
Software
MEETING 1 - INTRODUCTION

Let’s begin the journey …


“We are drowning in
information and starved for
knowledge”

(Tom Peters, Thriving on Chaos)


MULTIVARIATE ANALYSIS IN
STATISTICAL TERMS
• Generally, multivariate analysis refers to all statistical techniques that simultaneously analyze multiple measurements on individuals or objects under investigation.

• Thus, any simultaneous analysis of more than two variables can be loosely considered multivariate analysis.
But …
Confusion sometimes arises about what multivariate analysis is because the term is not used consistently in the literature.
• Some use it for any analysis examining relationships between or among more than two variables.
• Others use it only for problems in which all the multiple variables are assumed to have a multivariate normal distribution.
• Strictly, however, the multivariate character lies in the multiple variates (multiple combinations of variables), and not only in the number of variables or observations.
Why multivariate?
The objectives of scientific investigation to which multivariate methods lend themselves include the following:

• Data reduction or structural simplification
• Sorting and grouping
• Investigation of the dependence among variables
• Prediction
• Hypothesis construction and testing

A CLASSIFICATION OF MULTIVARIATE
TECHNIQUES

Three questions:
1. Can the variables be divided into independent and dependent classifications based on some theory?
2. If they can, how many variables are treated as dependent in a single analysis?
3. How are the variables, both dependent and independent, measured?
Dependence versus Interdependence

• A dependence technique may be defined as one in which a variable or set of variables is identified as the dependent variable to be predicted or explained by other variables.

• An interdependence technique is one in which no single variable or group of variables is defined as being independent or dependent.
Variate versus Variable

• A variate is a linear combination of variables formed in the multivariate technique by deriving empirical weights applied to a set of variables specified by the researcher.
Data Organization
• Multivariate data are a collection of observations (or measurements) of:
◦ p variables (k = 1, . . . , p).
◦ n “items” (j = 1, . . . , n).
• “Items” can also be thought of as subjects/examinees/individuals or entities (when people are not under study).
• In some disciplines (such as educational measurement), “items” are considered the variables collected per individual.
Descriptive Statistics of Multivariate Data
• When we have a large amount of data, it is often hard to get a manageable description of the nature of the variables under study.
• For this reason, descriptive statistics are used.
• Such descriptive statistics include:
◦ Means.
◦ Variances.
◦ Covariances.
◦ Correlations.
Sample Mean

Sample Variance

Sample Covariance

Sample Covariance Matrix

Sample Correlation

Sample Correlation Matrix
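The formulas on these slides did not survive extraction. Following the notation above (x_jk is the j-th observation of variable k), the standard definitions, using divisor n as in Johnson and Wichern, Ch. 1, are:

```latex
\bar{x}_k = \frac{1}{n}\sum_{j=1}^{n} x_{jk}
\qquad
s_{kk} = \frac{1}{n}\sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2
\qquad
s_{ik} = \frac{1}{n}\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)
\qquad
r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}}\,\sqrt{s_{kk}}}
```

The sample covariance matrix S collects the s_ik as its (i, k) entries, with the variances s_kk on the diagonal, and the sample correlation matrix R collects the r_ik, with ones on the diagonal. Note that many texts and software packages use divisor n - 1 rather than n.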
Graphical Technique
• Displaying multivariate data can be difficult because of our natural limitation to three dimensions.
• Several simple ways of displaying data include:
◦ Bivariate scatterplots.
◦ Three-dimensional scatterplots.
• Some plots designed specifically for multivariate data include:
◦ “Stars.”
◦ Chernoff faces.
Scatterplot
Trivariate Scatterplot
Using R
> #trivariate scatterplot
> install.packages("scatterplot3d")
> library(scatterplot3d)
> attach(mtcars)
> plot(wt,mpg)
> scatterplot3d(wt,disp,mpg, main="3D Scatterplot")
Stars
Using R
> stars(mtcars[, 1:7], locations = c(0, 0), radius = FALSE,
+ key.loc = c(0, 0), main = "Motor Trend Cars", lty = 2)
Chernoff faces

Chart showing Chernoff faces for data selected from the "USJudgeRatings" dataset in R, which contains ratings of state judges in the US Superior Court by lawyers who have had contact with them.
Using R
> #chernoff faces
> install.packages("aplpack")
> library(aplpack)
> faces()
> faces(face.type=1)

> faces(rbind(1:3,5:3,3:5,5:7))

> data(longley)
> faces(longley[1:9,],face.type=0)
> faces(longley[1:9,],face.type=1)
Dendrograms
Distance Measure
• A great number of multivariate techniques revolve around the computation of distances:
◦ Distances between variables.
◦ Distances between entities.
• The formula for the Euclidean distance between the coordinate pair P = (x1, x2) and the origin O = (0, 0):
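The formula itself was lost in extraction; the standard Euclidean distance is:

```latex
d(O, P) = \sqrt{x_1^2 + x_2^2}
\qquad\text{and, for two points } P = (x_1, x_2),\; Q = (y_1, y_2):\qquad
d(P, Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}
```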
• Just keep in mind that there are statistical analogs to distance measures, which take the variability of the variables into account.
• Also be aware that there are literally an infinite number of distance measures!
• A distance measure d must satisfy the following for all points P, Q, and R:
◦ d(P, Q) = d(Q, P)
◦ d(P, Q) > 0 if P ≠ Q
◦ d(P, Q) = 0 if P = Q
◦ d(P, Q) ≤ d(P, R) + d(R, Q) (known as the triangle inequality)
Statistical distance
• Straight-line, or Euclidean, distance is unsatisfactory for
most statistical purposes.
• This is because each coordinate contributes equally to the
calculation of Euclidean distance.
• When the coordinates represent measurements that are
subject to random fluctuations of differing magnitudes, it is
often desirable to weight coordinates subject to a great
deal of variability less heavily than those that are not
highly variable.
Do these points have the same distance from the center?

• One way to proceed is to divide each coordinate by the corresponding sample standard deviation. Upon division by the standard deviations, we obtain the "standardized" coordinates x1* = x1/√s11 and x2* = x2/√s22.
• Thus, a statistical distance of the point P = (x1, x2) from the origin O = (0, 0) can be computed from its standardized coordinates x1* = x1/√s11 and x2* = x2/√s22.
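Concretely, the statistical distance from the origin is then d(O, P) = √(x1²/s11 + x2²/s22), which weights a high-variability coordinate less heavily. A minimal sketch in Python (the course software is R; the function names and sample variances here are illustrative assumptions, not part of the slides):

```python
import math

def euclidean(p, q=(0.0, 0.0)):
    """Straight-line distance: every coordinate contributes equally."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def statistical(p, s, q=(0.0, 0.0)):
    """Statistical distance: coordinate k is down-weighted by its sample variance s[k]."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(p, q, s)))

P = (4.0, 1.0)   # a point in the plane
s = (4.0, 1.0)   # hypothetical sample variances s11, s22

print(euclidean(P))        # sqrt(16 + 1) = sqrt(17), about 4.12
print(statistical(P, s))   # sqrt(16/4 + 1/1) = sqrt(5), about 2.24
```

With equal unit variances s = (1, 1), the statistical distance reduces to the ordinary Euclidean distance.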
Next Session
• Matrix algebra and random vectors:
◦ Basics of vectors and matrices
◦ Orthogonal and orthonormal vectors
◦ Orthogonal matrices, positive definite matrices, spectral decomposition, and square root matrices
◦ Random vectors
◦ Mean vectors, variance-covariance matrices, and correlation matrices

• Johnson et al., Chapter 2

You might also like