Multidimensional Scaling
Multidimensional scaling is a multivariate analysis technique
that, starting from a matrix of distances (or similarities)
between individuals, produces a representation of those
individuals in an ordinary Euclidean space such that the
distances in that space approximate the original distances as
closely as possible.
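As a rough illustration of the definition above, the classical (Torgerson) variant can be sketched by hand in base R: square the distances, double-center them, and scale the leading eigenvectors by the square roots of their eigenvalues. The variable names (D2, J, B, coords, ref) are illustrative choices, and this is only a sketch of one MDS variant, checked against the built-in cmdscale().

```r
# Classical MDS by hand, compared with stats::cmdscale().
d  <- dist(USArrests)                  # Euclidean distances between states
D2 <- as.matrix(d)^2                   # squared distance matrix
n  <- nrow(D2)
J  <- diag(n) - matrix(1 / n, n, n)    # centering matrix
B  <- -0.5 * J %*% D2 %*% J            # double-centered inner products
e  <- eigen(B, symmetric = TRUE)

k <- 2                                 # number of dimensions to keep
coords <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))
rownames(coords) <- rownames(USArrests)

ref <- cmdscale(d, k = k)
# Axes are only determined up to sign, so compare inter-point
# distances rather than the raw coordinates.
same_config <- all.equal(as.vector(dist(coords)), as.vector(dist(ref)))
print(same_config)
```

Comparing distances instead of coordinates avoids false mismatches when an eigenvector comes out with the opposite sign.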
Example
> spot = dist(USArrests, method = "euclidean")
> ?dist()
This function computes and returns the distance matrix computed
by using the specified distance measure to compute the distances
between the rows of a data matrix.
dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
x       a numeric matrix, data frame or "dist" object.
method  the distance measure to be used. This must be one of
        "euclidean", "maximum", "manhattan", "canberra", "binary" or
        "minkowski". Any unambiguous substring can be given.
diag    logical value indicating whether the diagonal of the
        distance matrix should be printed by print.dist.
upper   logical value indicating whether the upper triangle of the
        distance matrix should be printed by print.dist.
p       the power of the Minkowski distance.
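To make the arguments above concrete, here is a small sketch on two hand-picked points, (0, 0) and (3, 4), so that each distance can be checked by hand. The object names are illustrative only.

```r
# Two points whose distances are easy to verify manually.
x <- rbind(a = c(0, 0), b = c(3, 4))

d_euc <- dist(x)                               # sqrt(3^2 + 4^2)
d_man <- dist(x, method = "man")               # substring of "manhattan": 3 + 4
d_max <- dist(x, method = "maximum")           # max(3, 4)
d_mink <- dist(x, method = "minkowski", p = 3) # (3^3 + 4^3)^(1/3)

# diag and upper only affect how the object is printed.
print(dist(x, diag = TRUE, upper = TRUE))
```

The substring "man" works because it matches only "manhattan"; "ma" would be ambiguous between "maximum" and "manhattan".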
> what = cmdscale(spot)
> x11()
> plot(what[,1], what[,2], xlab = "Axis 1", ylab = "Axis 2",
+ main = "US Arrests")
> head(what)
                 [,1]       [,2]
Alabama     -64.80216  11.448007
Alaska      -92.82745  17.982943
Arizona    -124.06822  -8.830403
Arkansas    -18.34004  16.703911
California -107.42295 -22.520070
Colorado    -34.97599 -13.719584
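One way to check how closely the two-dimensional configuration reproduces the original distances is to ask cmdscale() for its eigenvalues and goodness-of-fit values (eig = TRUE) and to compare the fitted distances with the input directly. This is a sketch; the names fit and fitted_d are illustrative.

```r
d   <- dist(USArrests)
fit <- cmdscale(d, k = 2, eig = TRUE)   # returns a list instead of a matrix

# Goodness of fit: share of the eigenvalue total captured by 2 dimensions.
print(fit$GOF)

# Distances in the 2-D configuration versus the original distances.
fitted_d <- dist(fit$points)
dist_cor <- cor(as.vector(d), as.vector(fitted_d))
print(dist_cor)

# Largest absolute discrepancy between original and fitted distances.
print(max(abs(as.vector(d) - as.vector(fitted_d))))
```

A GOF value near 1 suggests that little is lost by plotting only two axes.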
Classical MDS
> library(magrittr)
> library(dplyr)
> library(ggpubr)
# Compute MDS
> mds <- swiss %>%
+ dist() %>%
+ cmdscale() %>%
+ as_tibble()
> colnames(mds) <- c("Dim.1", "Dim.2")
# Plot MDS
> x11()
> ggscatter(mds, x = "Dim.1", y = "Dim.2",
+ label = rownames(swiss),
+ size = 1,
+ repel = TRUE)
Create 3 groups using k-means clustering and color the points by group.
# K-means clustering
> clust <- kmeans(mds, 3)$cluster %>%
+ as.factor()
> mds <- mds %>%
+ mutate(groups = clust)
# Plot and color by groups
> x11()
> ggscatter(mds, x = "Dim.1", y = "Dim.2",
+ label = rownames(swiss),
+ color = "groups",
+ palette = "jco",
+ size = 1,
+ ellipse = TRUE,
+ ellipse.type = "convex",
+ repel = TRUE)
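Because kmeans() starts from randomly chosen centers, the group labels (and occasionally the groups themselves) can differ between runs. A minimal base-R sketch that fixes the seed and tries several random starts is shown below; the seed value and nstart = 25 are arbitrary choices, not settings from the example above.

```r
set.seed(123)                          # arbitrary seed, for reproducibility

coords <- cmdscale(dist(swiss))        # same 2-D configuration as above
colnames(coords) <- c("Dim.1", "Dim.2")

# nstart = 25 runs k-means from 25 random starts and keeps the best one.
km     <- kmeans(coords, centers = 3, nstart = 25)
groups <- factor(km$cluster)

print(table(groups))                   # group sizes
```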
Example 2 (my way)
> data("swiss")
> spot = dist(swiss, method = "euclidean")
> what = cmdscale(spot)
> x11()
> plot(what[,1], what[,2], xlab = "Dim.1", ylab = "Dim.2")
To label the points, text() can be called after plot(). Its col
and font arguments: the color and (if vfont = NULL) font to be
used, possibly vectors. These default to the values of the global
graphical parameters in par().
Example 3
> dist.au <-
+ read.csv("http://rosetta.reltech.org/TC/v15/Mapping/data/dist-Aus.csv")
> dist.au
   X    A   AS    B    D    H    M    P    S
1  A    0 1328 1600 2616 1161  653 2130 1161
2 AS 1328    0 1962 1289 2463 1889 1991 2026
3  B 1600 1962    0 2846 1788 1374 3604  732
4  D 2616 1289 2846    0 3734 3146 2652 3146
5  H 1161 2463 1788 3734    0  598 3008 1057
6  M  653 1889 1374 3146  598    0 2720  713
7  P 2130 1991 3604 2652 3008 2720    0 3288
8  S 1161 2026  732 3146 1057  713 3288    0
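A table like this is already a distance matrix, so it can be passed to cmdscale() after conversion with as.dist() instead of dist(). The sketch below retypes the values printed above so that it runs without the download; the names labs, m and fit are illustrative. With the data frame itself, something like as.dist(as.matrix(dist.au[, -1])) should give the same input, assuming the first column only holds the labels.

```r
labs <- c("A", "AS", "B", "D", "H", "M", "P", "S")
m <- matrix(c(
     0, 1328, 1600, 2616, 1161,  653, 2130, 1161,
  1328,    0, 1962, 1289, 2463, 1889, 1991, 2026,
  1600, 1962,    0, 2846, 1788, 1374, 3604,  732,
  2616, 1289, 2846,    0, 3734, 3146, 2652, 3146,
  1161, 2463, 1788, 3734,    0,  598, 3008, 1057,
   653, 1889, 1374, 3146,  598,    0, 2720,  713,
  2130, 1991, 3604, 2652, 3008, 2720,    0, 3288,
  1161, 2026,  732, 3146, 1057,  713, 3288,    0),
  nrow = 8, byrow = TRUE, dimnames = list(labs, labs))

# as.dist() treats the matrix as distances; dist() would recompute them.
fit <- cmdscale(as.dist(m), k = 2)

plot(fit[, 1], fit[, 2], type = "n", xlab = "Dim.1", ylab = "Dim.2")
text(fit[, 1], fit[, 2], labels = rownames(fit))
```

The orientation of the resulting map is arbitrary: MDS recovers relative positions only, so the picture may appear rotated or mirrored.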
> library(anacor)
> ca = anacor(selected.cars)
> x11()
> plot(ca, conf = NULL)
This ought to look familiar to anyone working in the automotive
industry. Let's work our way around the four quadrants: Quadrant
I Sporty, Quadrant II Economical, Quadrant III Family, and
Quadrant IV Luxury. Another perspective is to see an economy-
luxury dimension running from the upper left to the lower right
and a family-sporty dimension moving from the lower left to the
upper right (i.e., drawing a large X through the graph).