
Assignment 2 for STAT242

Sam van Betuw


29 July 2016

1. (a) The correlation matrix is used (instead of the covariance matrix), so that each
original variable is given equal influence on the results, regardless of units.

The high KMO value of 0.912 tells us that the data are suitable for principal
component analysis based on correlation (if the variables were not correlated,
the analysis would not be valid).
The diagonals of the Anti-Image Correlation matrix agree with this reading,
as these measures of sampling adequacy are quite high (5 of them > 0.9, 1 of
them > 0.8).
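
The point about units can be illustrated with a small sketch. The data below are made up (they are not the dolphin measurements), and the helper functions are hypothetical names written for this example. Re-expressing one variable in different units changes its covariance with another variable but leaves the correlation untouched, which is why an analysis based on the correlation matrix does not depend on the units chosen.

```python
import random
from statistics import mean

def covariance(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation built from the sample covariance."""
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5

# Made-up data: two measurements driven by a shared "size" factor.
random.seed(0)
size = [random.gauss(0, 1) for _ in range(50)]
length_cm = [100 + 10 * s + random.gauss(0, 3) for s in size]
width_cm = [30 + 3 * s + random.gauss(0, 1) for s in size]

# The same lengths expressed in millimetres instead of centimetres.
length_mm = [10 * v for v in length_cm]

cov_cm = covariance(length_cm, width_cm)
cov_mm = covariance(length_mm, width_cm)
corr_cm = correlation(length_cm, width_cm)
corr_mm = correlation(length_mm, width_cm)

print(f"covariance:  cm {cov_cm:.3f}  vs  mm {cov_mm:.3f}")    # differs by a factor of 10
print(f"correlation: cm {corr_cm:.3f}  vs  mm {corr_mm:.3f}")  # identical
```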

(b) Z1 = 0.958X1 + 0.924X2 + 0.942X3 + 0.952X4 + 0.965X5 + 0.902X6


Z2 = −0.207X1 + 0.226X2 − 0.265X3 + 0.134X4 − 0.172X5 + 0.307X6

We can see from the "Total Variance Explained" table that 93.6% of the
variance is explained by the first two principal components.
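
As a rough check of the 93.6% figure, the sketch below assumes that the coefficients in the Z1 and Z2 equations are component loadings (correlations between each variable and the component). Under that assumption, the sum of squared loadings for a component is its eigenvalue, and dividing by the number of variables gives its share of the total variance. The function name is hypothetical.

```python
# Coefficients copied from the Z1 and Z2 equations in part (b).
# Assumption: these are component loadings, not unit-length eigenvectors.
z1_loadings = [0.958, 0.924, 0.942, 0.952, 0.965, 0.902]
z2_loadings = [-0.207, 0.226, -0.265, 0.134, -0.172, 0.307]

def share_of_variance(loadings):
    """Sum of squared loadings divided by the number of variables."""
    return sum(value * value for value in loadings) / len(loadings)

share_z1 = share_of_variance(z1_loadings)
share_z2 = share_of_variance(z2_loadings)
total_share = share_z1 + share_z2

print(f"Z1: {share_z1:.1%}, Z2: {share_z2:.1%}, total: {total_share:.1%}")
# prints: Z1: 88.5%, Z2: 5.1%, total: 93.6%
```

The total agrees with the table, and it also shows that almost all of the explained variance comes from Z1.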

(c) Z1 is associated with the overall size of the dolphin.


Z2 describes the shape of the dolphin. It contrasts X1, X3, and X5 (negative
coefficients) with X2, X4, and X6 (positive coefficients): as one group
increases together, the other decreases together.

The graph makes it clear that dolphins from the North Island have higher Z1
values overall than the South Island dolphins - that is, North Island dolphins
are larger than South Island dolphins.

South Island dolphins have a much wider range of values for Z2 than North
Island dolphins - they have a more varied shape, with Z2 values both higher
and lower than Z2 values for North Island dolphins.

[Figure: scatterplot of REGR factor score 2 for analysis 1 (vertical axis)
against REGR factor score 1 for analysis 1 (horizontal axis), with each dolphin
labelled N (North Island) or S (South Island).]

(d) It does not appear that the North Island and South Island dolphins belong
to the same species: North Island dolphins are larger than South Island dolphins.
Also, while the two groups are similar in shape, the South Island data show
variability in shape that does not appear in the North Island data.

2. (a) We have 3 principal components, and 85.8% of the variation is explained by
these components.

(b) Our principal component Z1 is:


Z1 = −0.296M − 0.923P + 0.575T + 0.553CP − 0.516CA − 0.341SW +
0.803MO + 0.940L

This component contrasts the percentages of people employed in the different
types of work that are prevalent in an area.

In one group, we have M, P, CA, and SW. These are more office-based jobs.
In the other group, we have T, CP, MO, and L. These are more labour-intensive,
or "trades"-based, jobs.

So a higher Z1 value corresponds to a higher percentage of the electorate being
employed in trades-based jobs rather than office-based jobs.
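
The two groups can be read straight off the signs of the Z1 coefficients. The sketch below copies the coefficients from the equation in part (b); the dictionary and variable names are hypothetical. It splits the occupation codes by sign and ranks them by absolute size to show which ones drive the contrast most.

```python
# Z1 coefficients copied from part (b), keyed by occupation code.
z1_coefficients = {
    "M": -0.296, "P": -0.923, "T": 0.575, "CP": 0.553,
    "CA": -0.516, "SW": -0.341, "MO": 0.803, "L": 0.940,
}

# Split the occupation codes by the sign of their coefficient.
negative_group = [code for code, value in z1_coefficients.items() if value < 0]
positive_group = [code for code, value in z1_coefficients.items() if value > 0]

# Rank codes by how strongly they contribute to Z1, largest first.
by_strength = sorted(z1_coefficients, key=lambda code: abs(z1_coefficients[code]),
                     reverse=True)

print("negative:", negative_group)     # ['M', 'P', 'CA', 'SW']
print("positive:", positive_group)     # ['T', 'CP', 'MO', 'L']
print("strongest:", by_strength[:3])   # ['L', 'P', 'MO']
```

The ranking suggests the contrast is driven mainly by L and MO on one side and P on the other, while M and SW contribute only weakly.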

(c) There appear to be two small groupings. At one extreme of both Z1 and Z2,
we have Manurewa, Manukau East, and Mangere: these areas have a high
percentage of trades work and, within this group, a lower percentage are in
managerial positions, with more in sales work, machinery operation and
administrative work.

At the other extreme, we have Tamaki and Epsom: these areas have a higher
percentage in office work, a lower percentage in trades, and, within the office
work group, a higher percentage in managerial positions and the 'professionals'
category.

[Figure: scatterplot of REGR factor score 2 for analysis 2 (vertical axis)
against REGR factor score 1 for analysis 2 (horizontal axis), with each point
labelled by electorate name.]

(d) Top 7 Values:


1 - Ikaroa-Rawhiti (1.873)
2 - Te Tai Hauauru (1.690)
3 - Waiariki (1.542)
4 - Mangere (1.481)
5 - Manukau East (1.280)
6 - Hauraki-Waikato (1.143)
7 - Te Tai Tonga (1.109)

Of these 7 values, 5 are Maori electorates, suggesting that Maori electorates
are more likely to have higher percentages of people employed in trades-based
jobs.

Bottom 7 Values:
1 - Epsom (-2.298)
2 - Tamaki (-2.120)
3 - Wellington Central (-2.016)
4 - Ohariu (-1.985)
5 - Mt Albert (-1.686)
6 - North Shore (-1.583)
7 - Rongotai (-1.460)

These 7 electorates cover richer suburbs of Auckland and Wellington (with no
Maori electorates among them), suggesting that these areas have higher
percentages of people employed in office jobs.
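
The counts of Maori electorates in the two lists can be checked with a short sketch. The set of seven Maori electorates below is supplied from general knowledge rather than from the assignment data, and the names are spelled without macrons to match this document (with "Te Tai Hauauru" written as it appears on the score plot). The helper name is hypothetical.

```python
# Assumption: the seven Maori electorates, listed from general knowledge.
MAORI_ELECTORATES = {
    "Te Tai Tokerau", "Tamaki Makaurau", "Hauraki-Waikato", "Waiariki",
    "Ikaroa-Rawhiti", "Te Tai Hauauru", "Te Tai Tonga",
}

# Electorate names copied from the top 7 and bottom 7 lists in part (d).
top7 = ["Ikaroa-Rawhiti", "Te Tai Hauauru", "Waiariki", "Mangere",
        "Manukau East", "Hauraki-Waikato", "Te Tai Tonga"]
bottom7 = ["Epsom", "Tamaki", "Wellington Central", "Ohariu",
           "Mt Albert", "North Shore", "Rongotai"]

def count_maori(names):
    """Number of names that are Maori electorates (exact name match)."""
    return sum(name in MAORI_ELECTORATES for name in names)

top_count = count_maori(top7)
bottom_count = count_maori(bottom7)
print(top_count, bottom_count)  # prints: 5 0
```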

3. (a) When we examine one principal component, we get:


Z1 = 0.909X1 + 0.904X2 + 0.872X3 + 0.925X4 + 0.925X5 + 0.915X6 +
0.827X7
This principal component explains 80.5% of the variation in record times.
Overall, this component measures time: a higher Z1 value means a higher time
for each race, so the athlete is slower.

By ordering the data by Z1 values, we can see that in general, as Z1
increases, times for each race increase (worse times). We can also see that
the lower Z1 values are mostly European countries, and the higher Z1 values
are mostly Asian and South American countries.

(b) With two principal components, we have:


Z1 = 0.909X1 + 0.904X2 + 0.872X3 + 0.925X4 + 0.925X5 + 0.915X6 + 0.827X7
Z2 = 0.340X1 + 0.384X2 + 0.151X3 − 0.181X4 − 0.310X5 − 0.327X6 − 0.042X7
These two principal components explain 88.0% of the variation in the record
times.
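
The 80.5% and 88.0% figures can be roughly reproduced from the coefficients, assuming they are component loadings (so that the sum of squared loadings for a component equals its eigenvalue). This is a sketch under that assumption, with hypothetical names, and the Z1 list uses the coefficients as given in part (a).

```python
# Coefficients for X1..X7; assumed to be component loadings.
z1_loadings = [0.909, 0.904, 0.872, 0.925, 0.925, 0.915, 0.827]
z2_loadings = [0.340, 0.384, 0.151, -0.181, -0.310, -0.327, -0.042]

def share_of_variance(loadings):
    """Sum of squared loadings divided by the number of variables."""
    return sum(value * value for value in loadings) / len(loadings)

first_share = share_of_variance(z1_loadings)
cumulative_share = first_share + share_of_variance(z2_loadings)

print(f"Z1 alone: {first_share:.1%}, Z1 and Z2: {cumulative_share:.1%}")
# prints: Z1 alone: 80.5%, Z1 and Z2: 88.0%
```

Adding the second component raises the explained variation by about 7.5 percentage points.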

The second component distinguishes between ability to run a short distance
and a long distance. It contrasts X1, X2, X3 (positive coefficients) with
X4, X5, X6, X7 (negative coefficients). Because the variables are record times,
a high Z2 value means slow short-distance times (100m, 200m, 400m) relative to
long-distance times (800m, 1500m, 3000m, and the marathon). So an athlete with
a high Z2 value is relatively better at long distance races, and an athlete with
a low Z2 value is relatively better at short distance races.
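
The direction of the Z2 contrast can be checked with two hypothetical profiles of standardised record times. The sketch assumes X1 to X7 are the 100m, 200m, 400m, 800m, 1500m, 3000m and marathon times in that order, and it computes a plain weighted sum of the Z2 coefficients, which has the same sign as the regression factor score but not the same scale. All names and the two profiles are made up for illustration.

```python
# Z2 coefficients from part (b), keyed by the assumed event order X1..X7.
z2_coefficients = {
    "100m": 0.340, "200m": 0.384, "400m": 0.151, "800m": -0.181,
    "1500m": -0.310, "3000m": -0.327, "marathon": -0.042,
}
SHORT_EVENTS = ("100m", "200m", "400m")

def z2_contrast(standardised_times):
    """Unscaled weighted sum of standardised times using the Z2 coefficients."""
    return sum(z2_coefficients[event] * standardised_times[event]
               for event in z2_coefficients)

# Hypothetical profile: sprint times one SD above average (slow),
# distance times one SD below average (fast).
slow_sprints = {event: (1.0 if event in SHORT_EVENTS else -1.0)
                for event in z2_coefficients}
# The mirror image: fast sprint times, slow distance times.
fast_sprints = {event: -value for event, value in slow_sprints.items()}

contrast_slow_sprints = z2_contrast(slow_sprints)
contrast_fast_sprints = z2_contrast(fast_sprints)
print(f"{contrast_slow_sprints:.3f} {contrast_fast_sprints:.3f}")
# prints: 1.735 -1.735
```

With times as the variables, the positive value belongs to the profile with slow sprints and fast distance races, and the negative value to the profile with fast sprints.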

[Figure: scatterplot of REGR factor score 2 for analysis 1 (vertical axis)
against REGR factor score 1 for analysis 1 (horizontal axis), with each point
labelled by country.]
