
MULTIVARIATE METHODS I

PRINCIPAL COMPONENT ANALYSIS

Introduction
DATA DESCRIPTION
Data set : Microsoft Office Excel 97-2003 Worksheet

Datafile Name : Companies


Datafile Subjects : Economics
Story Names : Forbes 500 Companies Sales
Reference : Forbes, 1986
Authorization : free use

Description: Facts about companies selected from the
Forbes 500 list for 1986. This is a 1/10 systematic sample
from the alphabetical list of companies. The Forbes 500
includes all companies in the top 500 on any of the
criteria, and thus has almost 800 companies in the list.

Number of cases: 77
Variable Names:
Assets : Amount of assets (in millions)
Sales : Amount of sales (in millions)
Market_Value : Market Value of the company (in millions)
Profits : Profits (in millions)
Cash_Flow : Cash Flow (in millions)
Employees : Number of employees (in thousands)
Sector : Type of market the company is associated with.
Use the six numeric variables only; Sector is excluded because it is a
character variable.
DATA ANALYSIS USING R
Correlation matrix

                Assets     Sales Market_Value   Profits Cash_Flow Employees
Assets       1.0000000 0.7464649    0.6822122 0.6016986 0.6409018 0.5943581
Sales        0.7464649 1.0000000    0.8788920 0.8137758 0.8549172 0.9240429
Market_Value 0.6822122 0.8788920    1.0000000 0.9681987 0.9702851 0.8182161
Profits      0.6016986 0.8137758    0.9681987 1.0000000 0.9887795 0.7621057
Cash_Flow    0.6409018 0.8549172    0.9702851 0.9887795 1.0000000 0.7866148
Employees    0.5943581 0.9240429    0.8182161 0.7621057 0.7866148 1.0000000

Give interpretations
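The correlation matrix shows different levels of correlation between
the variables.
A correlation is a number between -1 and +1 that measures the strength
and direction of the linear relationship between two variables.
A correlation of +1 means the variables are perfectly positively
correlated (they go up and down in perfect synchronization);
-1 means perfect negative correlation (one goes up as the other goes
down);
values close to 0 mean either no relation or a relation that is not linear.
Here all correlations are positive. The strongest is between Profits and
Cash_Flow (0.989) and the weakest is between Assets and Employees (0.594);
Assets is the variable least correlated with the others.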
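The extremes of the matrix can also be located programmatically. The following is a minimal R sketch that re-enters the correlations printed above (the Companies data file itself is not reproduced here) and reports the most and least correlated pairs. The object names vars, R, and off are illustrative and do not come from the original R code file.

```r
# Variable names, in the order used by the correlation matrix above
vars <- c("Assets", "Sales", "Market_Value", "Profits", "Cash_Flow", "Employees")

# Re-enter the lower triangle column by column, then mirror it
R <- diag(6)
R[lower.tri(R)] <- c(0.7464649, 0.6822122, 0.6016986, 0.6409018, 0.5943581,
                     0.8788920, 0.8137758, 0.8549172, 0.9240429,
                     0.9681987, 0.9702851, 0.8182161,
                     0.9887795, 0.7621057,
                     0.7866148)
R <- R + t(R) - diag(6)
dimnames(R) <- list(vars, vars)

# Keep only the strictly lower triangle so each pair is counted once
off <- R
off[upper.tri(off, diag = TRUE)] <- NA

# which.max() and which.min() skip NA and return a linear index;
# arrayInd() converts it back to (row, column)
i_max <- arrayInd(which.max(off), dim(off))
i_min <- arrayInd(which.min(off), dim(off))

strongest <- vars[c(i_max[1, 1], i_max[1, 2])]
weakest   <- vars[c(i_min[1, 1], i_min[1, 2])]

cat("Strongest pair:", strongest, round(max(off, na.rm = TRUE), 4), "\n")
cat("Weakest pair:  ", weakest, round(min(off, na.rm = TRUE), 4), "\n")
```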
EIGENVALUES AND EIGENVECTORS
Introduce them
Values are in the R code file
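Since the R code file is not included here, the following is a hedged sketch of how the eigenvalues and eigenvectors could be obtained with base R's eigen(), assuming the analysis was run on standardized variables (that is, on the correlation matrix). The correlations are re-entered from the matrix above; the object names are illustrative. Under that assumption, the square roots of the eigenvalues should be close to the component standard deviations reported in the importance table.

```r
# Variable names, in the order used by the correlation matrix
vars <- c("Assets", "Sales", "Market_Value", "Profits", "Cash_Flow", "Employees")

# Rebuild the symmetric correlation matrix from its lower triangle
R <- diag(6)
R[lower.tri(R)] <- c(0.7464649, 0.6822122, 0.6016986, 0.6409018, 0.5943581,
                     0.8788920, 0.8137758, 0.8549172, 0.9240429,
                     0.9681987, 0.9702851, 0.8182161,
                     0.9887795, 0.7621057,
                     0.7866148)
R <- R + t(R) - diag(6)
dimnames(R) <- list(vars, vars)

# Eigendecomposition; eigenvalues are returned in decreasing order
e <- eigen(R, symmetric = TRUE)
eigenvalues  <- e$values
eigenvectors <- e$vectors
dimnames(eigenvectors) <- list(vars, paste0("PC", 1:6))

# Eigenvalues are the variances of the components; their sum equals
# the number of standardized variables (6)
print(round(eigenvalues, 4))
print(round(sqrt(eigenvalues), 4))   # component standard deviations

# Each column holds the coefficients of one component
# (the sign of each column is arbitrary)
print(round(eigenvectors, 3))
```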
IMPORTANCE OF PRINCIPAL COMPONENTS
                          PC1     PC2     PC3     PC4     PC5     PC6
Standard deviation     2.2447 0.71895 0.59915 0.22250 0.17062 0.08289
Proportion of Variance 0.8398 0.08615 0.05983 0.00825 0.00485 0.00115
Cumulative Proportion  0.8398 0.92592 0.98575 0.99400 0.99885 1.00000
The first principal component alone accounts for 83.98% of the total
variance, and the first two together account for 92.592%. These two
components therefore provide a useful subset for explaining the data,
suggesting that two components are all that are needed.
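The proportions follow directly from the standard deviations: each component's variance is its squared standard deviation, and the proportion is that variance divided by the total. A minimal R sketch, using the standard deviations from the table above (the object names are illustrative):

```r
# Standard deviations of the six components, copied from the table
sdev <- c(PC1 = 2.2447, PC2 = 0.71895, PC3 = 0.59915,
          PC4 = 0.22250, PC5 = 0.17062, PC6 = 0.08289)

variance   <- sdev^2                    # variance of each component
proportion <- variance / sum(variance)  # share of the total variance
cumulative <- cumsum(proportion)        # running total of the shares

print(round(rbind("Standard deviation"     = sdev,
                  "Proportion of Variance" = proportion,
                  "Cumulative Proportion"  = cumulative), 5))
```

Because the table values are rounded, the recomputed proportions agree with the table only to about four decimal places.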
Give more interpretations


SCREE PLOT
Give an introduction and interpret the plot
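A scree plot shows the variance (eigenvalue) of each component against its component number. The sketch below draws one with base R graphics from the standard deviations in the importance table. It is an illustration under that assumption, not the plot from the original R code file, and the object names are illustrative.

```r
# Standard deviations of the six components, copied from the importance table
sdev <- c(PC1 = 2.2447, PC2 = 0.71895, PC3 = 0.59915,
          PC4 = 0.22250, PC5 = 0.17062, PC6 = 0.08289)
variance <- sdev^2

# When run as a script (e.g. via Rscript), draw to a null device
# instead of creating Rplots.pdf
if (!interactive()) pdf(NULL)

# Variance against component number, with points joined by lines
plot(seq_along(variance), variance, type = "b",
     xaxt = "n",
     xlab = "Principal component",
     ylab = "Variance (eigenvalue)",
     main = "Scree plot")
axis(1, at = seq_along(variance), labels = names(variance))

if (!interactive()) invisible(dev.off())
```

With these values the curve drops sharply after PC1 and is nearly flat from PC4 onward.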
SCORE PLOT
Identify the scores and the score plot, and interpret
LOADING PLOT
Identify the loadings and the loading plot, and interpret
Loading plots for 1st two PCs
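The sketch below shows one way such a plot could be produced from the correlation matrix alone. It assumes the loadings are the unit-length eigenvectors of the correlation matrix (the convention of the rotation element returned by prcomp()); some texts instead scale them by the square root of the eigenvalue. The object names are illustrative, and the sign of each eigenvector is arbitrary, so PC1 is flipped to be positive for readability.

```r
# Variable names, in the order used by the correlation matrix
vars <- c("Assets", "Sales", "Market_Value", "Profits", "Cash_Flow", "Employees")

# Rebuild the symmetric correlation matrix from its lower triangle
R <- diag(6)
R[lower.tri(R)] <- c(0.7464649, 0.6822122, 0.6016986, 0.6409018, 0.5943581,
                     0.8788920, 0.8137758, 0.8549172, 0.9240429,
                     0.9681987, 0.9702851, 0.8182161,
                     0.9887795, 0.7621057,
                     0.7866148)
R <- R + t(R) - diag(6)
dimnames(R) <- list(vars, vars)

# Loadings of the first two components (assumed: unit-length eigenvectors)
loadings <- eigen(R, symmetric = TRUE)$vectors[, 1:2]
dimnames(loadings) <- list(vars, c("PC1", "PC2"))

# Eigenvector signs are arbitrary; make the PC1 loadings positive
if (sum(loadings[, "PC1"]) < 0) loadings[, "PC1"] <- -loadings[, "PC1"]

# When run as a script, draw to a null device instead of creating Rplots.pdf
if (!interactive()) pdf(NULL)

# One arrow per variable, from the origin to its (PC1, PC2) loading
plot(loadings, type = "n", xlim = c(0, 0.7), ylim = c(-1, 1),
     xlab = "PC1 loading", ylab = "PC2 loading",
     main = "Loading plot for the first two PCs")
abline(h = 0, lty = 2)
arrows(0, 0, loadings[, "PC1"], loadings[, "PC2"], length = 0.08)
text(loadings[, "PC1"], loadings[, "PC2"], labels = vars, pos = 4)

if (!interactive()) invisible(dev.off())
```

Because every correlation in the matrix is positive, all six variables load on PC1 with the same sign.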
BIPLOT

Identify the biplot and interpret
