---

title: "Analyzing the World's Top Company Market Values"
author: "Reagan Brell"
date: "December 4, 2016"
output: html_document
---

```{r setup, include=FALSE}


knitr::opts_chunk$set(echo = TRUE)
```

Don't change anything on the setup before this.

## Pledge

On my honor, I have neither given nor received any unauthorized assistance on
this work.

## Section I: Introduction

In this project, I will analyze the relationship between market value and the
other variables used to rank the top companies in 2004, according to forbes.com.
I am most interested in the market value of companies within the dataset. I
believe market value can be a clear reflection of what customers want. I assume
many factors affect company market value, but what variables correlate most
strongly with company market values?

I expect a direct relationship between market value and profits because
customers will pay more for products that are high in demand, increasing the
value of the company. For this project, my response variable will be market
value and my main explanatory variable will be profits.

## Section II: Exploratory Data Analysis

I am using the dataset "The Forbes 2000 Ranking of the World's Biggest
Companies" from 2004. The dataset includes a list of 2000 observations on 8
variables, which were used to measure the success of companies around the world.
The variables that I will focus on (in addition to market value) are profits,
sales, assets, and company category.

To use the data in RStudio, I need to follow a series of steps. I will first
install the "HSAUR" package. Then I will use the library function to load the
package. Finally, I will attach the dataset called "Forbes2000". The package
installation steps are listed below.

```{r}
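# Install HSAUR only if it is missing, so the document can be knit repeatedly
if (!requireNamespace("HSAUR", quietly = TRUE)) install.packages("HSAUR")
library(HSAUR)
attach(Forbes2000)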
```
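
As a quick check after loading, the structure of the dataset and the number of missing values in each column can be inspected. This is a minimal sketch; the name `forbes` is my own placeholder for a copy of the data and is not part of the package.

```{r}
# Work on a copy so the attached Forbes2000 object is left untouched
# (`forbes` is a hypothetical name chosen for this sketch)
forbes <- HSAUR::Forbes2000

# Dimensions: the dataset is described as 2000 observations on 8 variables
dim(forbes)

# Variable names and types
str(forbes)

# Number of missing values in each column
colSums(is.na(forbes))
```

Any column with a nonzero count here needs `na.rm = TRUE` (or a similar option) in later calculations.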

## Variable Descriptions
In the dataset, each variable is important because it has a certain relationship
with market value. I will define the variables (that I want to compare) to clear
up any potential confusion about what the variable really describes.

The response variable in this project is market value, which can be defined as
the value or price that the market assigns to the company. Profits are the net
gains that companies keep after subtracting their costs from their revenue.
Sales are the revenue companies receive in exchange for their products. Assets
are the monetary values given to company property. These four variables (market
value, profits, sales, and assets) are continuous numerical variables and are
displayed in billions of dollars.

If the explanatory variables (profits, sales and assets) have strong, positive
correlations with market value, I assume greater profits, sales and assets
remain important for a high company market value.

The only categorical variable that I will focus on is company category, defined
by any factor describing the products that a company makes. I want to
specifically focus on this categorical variable because I believe certain
categories will have higher market values.

## Study Design

I am specifically focused on market value for this project. I will look at the
relationship between market value and the following three variables: profits,
sales, and assets. Because these variables are all numeric, I will be able to
look for the correlation between market value and the explanatory variables.
For the company category, I want to determine which field contains the highest
market values.

In this paragraph, I will explain how the explanatory variables relate to market
value. The first variable, profits, is related to market value because companies
making large profits are very desirable in the market and investment world. The
second variable, sales, is related to market value because companies with more
sales generate more revenue. Good sales encourage people to invest in the
company, thereby increasing company value. The third variable, assets, relates
to market value because companies putting more money into their business can
expect a larger return in profits and sales. If I am correct in my prediction,
market value is high when company profits or sales are high.

These may just be theories, but if there is a direct relationship between the
variables, the theories could help explain why the variable relationships exist.

Here are some important plots for my data:


```{r}
#Here are some plots for my Y, market value
hist(marketvalue)
boxplot(marketvalue)

#Here are some plots for my x1, profits
hist(profits)
boxplot(profits)

#Here are some plots for my x2, sales
hist(sales)
boxplot(sales)

#Here are some plots for my x3, assets
hist(assets)
boxplot(assets)

#Here are some plots for my x4, categories
table1 <- table(category)
barplot(table1)
```

Here are some descriptive statistics to complement the plots above:

```{r}
#Here are some descriptive statistics for my Y, market value
summary(marketvalue)
mean(marketvalue)
sd(marketvalue)
diff(range(marketvalue))

#Here are some descriptive statistics for my X1, profits
#(profits has missing values, so they are removed with na.rm = TRUE)
summary(profits)
sd(profits, na.rm = TRUE)
diff(range(profits, na.rm = TRUE))

#Here are some descriptive statistics for my X2, sales
summary(sales)
sd(sales)
diff(range(sales))

#Here are some descriptive statistics for my X3, assets
summary(assets)
sd(assets)
diff(range(assets))
```
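
The same statistics can also be collected into one table, which makes the four numeric variables easier to compare side by side. This is a sketch only; `forbes`, `describe`, and `stats_table` are names I made up for the illustration.

```{r}
# Copy of the data under a hypothetical name
forbes <- HSAUR::Forbes2000
num_vars <- c("marketvalue", "profits", "sales", "assets")

# Hypothetical helper: summary statistics for one numeric vector,
# ignoring missing values
describe <- function(x) {
  c(mean      = mean(x, na.rm = TRUE),
    sd        = sd(x, na.rm = TRUE),
    median    = median(x, na.rm = TRUE),
    iqr       = IQR(x, na.rm = TRUE),
    n_missing = sum(is.na(x)))
}

# One row per variable, one column per statistic
stats_table <- t(sapply(forbes[num_vars], describe))
round(stats_table, 3)
```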

## Interpretation and Discussion

The mean company market value was 11.8 billion dollars. The company market
values ranged from .02 billion dollars to 328.5 billion dollars. The middle half
of the values lie between 2.72 billion dollars and 10.6 billion dollars. The
median market value was 5.15 billion dollars. The distribution is strongly
skewed to the right and does not look symmetric.

The mean profit for companies was .3811 billion dollars. The profit values
ranged from -25.83 billion dollars to 20.96 billion dollars. The middle half of
the values lie between .08 billion dollars and .44 billion dollars. The median
profit was .2 billion dollars.

The mean sales for companies were 9.697 billion dollars. The sales values ranged
from .010 billion dollars to 256.3 billion dollars. The middle half of the
values lie between 2.018 billion dollars and 9.548 billion dollars. The median
sales were 4.365 billion dollars. The distribution is skewed to the right and
does not look symmetric.

The mean assets for companies were 34.04 billion dollars. The asset values
ranged from 0.27 billion dollars to 1264.03 billion dollars. The middle half of
the values lie between 4.025 billion dollars and 22.79 billion dollars. The
median assets were 9.345 billion dollars. The distribution is skewed to the
right and does not look symmetric.
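
Because the distributions are so skewed, a histogram on a log scale can make the bulk of the data easier to see. The sketch below does this for market value; it is an optional extra view, and `forbes` and `log_mv` are placeholder names of my own.

```{r}
# Copy of the data under a hypothetical name
forbes <- HSAUR::Forbes2000

# Market values are all positive (the smallest is .02 billion dollars),
# so the logarithm is defined for every company
log_mv <- log10(forbes$marketvalue)
hist(log_mv,
     main = "Market value on a log10 scale",
     xlab = "log10(market value in billions of dollars)")

# A mean well above the median is another sign of right skew
c(mean = mean(forbes$marketvalue), median = median(forbes$marketvalue))
```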

Many of the companies in the dataset are from the same category. There are 27
different categories. The category with the most counts in the dataset is
banking. The proportion of companies that are in the banking field is .1565. The
categories with other high counts are insurance and diversified financials.
A table for the counts is below.

```{r}
table1 <- table(category)
prop.table(table1)
```
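
With 27 categories, the largest ones are easier to spot when the table is sorted. This is a small sketch of that step, using my own placeholder names `forbes` and `category_counts`.

```{r}
# Copy of the data under a hypothetical name
forbes <- HSAUR::Forbes2000

# Counts per category, largest first
category_counts <- sort(table(forbes$category), decreasing = TRUE)

# The three largest categories, as counts and as proportions
head(category_counts, 3)
round(head(prop.table(category_counts), 3), 4)
```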

Each explanatory variable is related to market value in a broadly similar way,
but the strength of the relationship differs from variable to variable.

Here are some plots and statistics that show the bivariate relationships between
my variables:

```{r}
#Here are some plots for the continuous variables.
plot(marketvalue~profits)
#Some values in the dataset are missing, so rows with missing values
#are omitted before the correlation is computed.
Forbes2000 <- na.omit(Forbes2000)
cor(Forbes2000$profits, Forbes2000$marketvalue)

plot(marketvalue~sales)
cor(marketvalue, sales)

plot(marketvalue~assets)
cor(marketvalue, assets)

#Here is a plot for the categorical variable.
boxplot(marketvalue~category, las=2)
```
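
The pairwise correlations can also be computed in one call as a correlation matrix. The sketch below assumes that dropping incomplete rows is acceptable, which is what `use = "complete.obs"` does; `forbes` and `cor_mat` are names I chose for the example.

```{r}
# Copy of the data under a hypothetical name
forbes <- HSAUR::Forbes2000
num_vars <- c("marketvalue", "profits", "sales", "assets")

# Correlation matrix for the numeric variables, using only rows
# where all four values are present
cor_mat <- cor(forbes[num_vars], use = "complete.obs")
round(cor_mat, 3)
```

The first row (or column) of the matrix holds the correlations with market value.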

There is a moderate and positive correlation between market value and profits.
The correlation is 0.5472202. This correlation reaffirms my assumptions about a
direct relationship between market value and profits. The plot values of market
value and profits are more scattered than the other bivariate relationship
plots. This is interesting because I thought a moderate correlation between
market value and profits would provide more uniform data.

There is a moderate and positive correlation between market value and sales. The
correlation between these two variables is 0.642053. This correlation reaffirms
my assumptions about a direct relationship between market value and sales. If
company sales increase, market value of the company will likely increase.
Surprisingly, the correlation between market value and sales is stronger than
the correlation between market value and profits.

There is a low to moderate correlation between market value and assets. The
correlation between these two variables is 0.4539991. This correlation also
reaffirms my assumptions about a direct relationship between market value and
assets. If company assets increase, market value of the company will likely
increase. However, market value of the company may not increase to the same
degree as it does when sales or profits increase. Perhaps higher sales or
profits, in comparison to increasing assets, more heavily influence higher
market values.

The companies with the highest market values, on average, were oil and gas
operations. Food drink and tobacco companies also had a relatively high mean.
Though business services and supplies companies had a low mean, there are
outliers in this category with very high market values. Many of the other
categories were somewhat similar in terms of market value. The oil and gas
operations and food drink and tobacco companies may have high market value means
because the products these companies produce are in high demand.
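
The category means discussed above can be computed directly rather than read off the boxplots. This is a sketch using `tapply`; `forbes`, `mean_by_category`, and `median_by_category` are placeholder names of mine.

```{r}
# Copy of the data under a hypothetical name
forbes <- HSAUR::Forbes2000

# Mean market value within each category, largest first
mean_by_category <- sort(tapply(forbes$marketvalue, forbes$category, mean),
                         decreasing = TRUE)
round(head(mean_by_category, 5), 2)

# Medians are less affected by a few very large companies
median_by_category <- sort(tapply(forbes$marketvalue, forbes$category, median),
                           decreasing = TRUE)
round(head(median_by_category, 5), 2)
```

Comparing the two rankings shows how much the outliers in a category pull its mean upward.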

## Section III: Simple Linear Regression

Again, my response variable is market value (represented by my Y variable) and
my main explanatory variable is profits (represented by x1). I expect that
profits, my predictor variable, has an influence on company market value. Of all
my predictor variables, which also include sales, assets, and categories, I
originally expected profits to have the strongest correlation with market values.

The Linear Regression code is below:


```{r}
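#for my Y and x1 variables
Fit1 <- lm(marketvalue~profits, data = Forbes2000)
summary(Fit1)
plot(marketvalue~profits, data = Forbes2000) #scatterplot with the fitted line
abline(Fit1)
plot(Fit1) #diagnostic plots

#for my Y and x2 variables
Fit2 <- lm(marketvalue~sales, data = Forbes2000)
summary(Fit2)
plot(marketvalue~sales, data = Forbes2000) #scatterplot with the fitted line
abline(Fit2)
plot(Fit2) #diagnostic plots

#for my Y and x3 variables
Fit3 <- lm(marketvalue~assets, data = Forbes2000)
summary(Fit3)
plot(marketvalue~assets, data = Forbes2000) #scatterplot with the fitted line
abline(Fit3)
plot(Fit3) #diagnostic plots

#for my Y and x4 variables
#(no abline here: a categorical predictor does not give a single fitted line)
Fit4 <- lm(marketvalue~category, data = Forbes2000)
summary(Fit4)
plot(Fit4) #diagnostic plots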
```

The simple regression line for market values and profits has a y intercept of
9.004 and a slope of 7.5899 (market value = 9.004 + 7.5899 × profits). The y
intercept is the expected market value when profits are zero. The positive slope
tells us that the relationship between market values and profits is direct and
positive. In other words, the higher the profits, the higher the market value
was for companies, on average.

The simple regression line for market values and sales has a y intercept of
3.4184 and a slope of 0.8724 (market value = 3.4184 + 0.8724 × sales). The
positive slope tells us that the relationship between market values and sales is
also direct and positive. Again, the higher the sales, the higher the market
value was for companies, on average.

The simple regression line for market values and assets has a y intercept of
8.085157 and a slope of 0.111407 (market value = 8.085157 + 0.111407 × assets).
The positive slope tells us that the relationship between market values and
assets is also direct and positive. Companies with the highest market values
tend to have the most assets.

The simple regression model for market values and category has an intercept of
12.0595, which is the mean market value of the baseline category (the first
level of the category factor). Because there are many company categories, there
are many coefficients rather than a single slope. Each coefficient is the
difference between that category's mean market value and the baseline mean
(market value = 12.0595 + the coefficient of the company's category). Some
coefficients are negative and some are positive.
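
To make the profits equation concrete, it can be evaluated for a few profit levels. The sketch below only plugs the coefficients reported above into the equation; `predict_market_value` is a hypothetical helper written for this illustration, not a function from any package.

```{r}
# Hypothetical helper: fitted market value (billions of dollars) from the
# reported profits model, market value = 9.004 + 7.5899 * profits
predict_market_value <- function(profits, intercept = 9.004, slope = 7.5899) {
  intercept + slope * profits
}

# Fitted values for profits of 0, 1, and 2 billion dollars
fitted_values <- predict_market_value(c(0, 1, 2))
fitted_values

# Each additional billion dollars of profit adds the slope to the fitted value
diff(fitted_values)
```

With the fitted model object available, `predict(Fit1, newdata = data.frame(profits = c(0, 1, 2)))` should give the same kind of result without retyping the coefficients.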

Here is a hypothesis test for each of my variables:


```{r}
#for my Y and x1 variables
t.test(marketvalue, profits, conf.level=.95)

#for my Y and x2 variables
t.test(marketvalue, sales, conf.level=.95)

#for my Y and x3 variables
t.test(marketvalue, assets, conf.level=.95)

```

My null hypothesis will be that there is no relationship between market values
and company profits. My alternative hypothesis will be that there is a
relationship between market values and company profits. I will repeat the same
hypothesis test for each of my explanatory variables. My next null hypothesis
will be that there is no relationship between market values and sales. My
alternative hypothesis will be that there is a relationship between market
values and sales. For the third hypothesis test, my null hypothesis will be that
there is no relationship between market values and company assets. My
alternative hypothesis will be that there is a relationship between market
values and company assets.

By using the p value for each test, we can determine whether or not the results
are significant. In this test we used a 95 percent confidence level (or a .05
significance level). A 95 percent confidence level means that intervals built
this way would contain the true parameter in about 95 percent of repeated
samples. The t.test function, as used above, compares the mean of market value
with the mean of the other variable, so each confidence interval is an interval
for a difference in means. The confidence interval for market value and profits
is 10.42109 to 12.57196. The confidence interval for market value and sales is
0.8491652 to 3.5121248. The confidence interval for market value and assets is
-26.66477 to -17.66362. Because each p value is less than .05 and zero is not in
any of the confidence intervals, we reject the hypothesis of equal means in each
test.

The hypotheses about relationships are addressed by the t tests on the slopes in
the regression summaries above. With correlations of this size and nearly 2000
observations, each slope is significant at the .05 level, so we reject the null
hypothesis for each explanatory variable. Therefore, market value has a direct
relationship with profits, sales, and assets.
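
The null hypotheses in this section are about a relationship between two variables. One direct way to test such a hypothesis is a correlation test with `cor.test`, sketched below for market value and profits. This is an illustration under my own naming (`forbes`, `rel_test`), and the same call could be repeated for sales and assets.

```{r}
# Copy of the data under a hypothetical name
forbes <- HSAUR::Forbes2000

# Test H0: the correlation between market value and profits is zero.
# cor.test() drops incomplete pairs on its own.
rel_test <- cor.test(forbes$marketvalue, forbes$profits, conf.level = 0.95)

rel_test$estimate   # sample correlation
rel_test$conf.int   # 95 percent confidence interval for the correlation
rel_test$p.value    # p value for H0: correlation = 0
```

Here a confidence interval that excludes zero means the null hypothesis of no linear relationship is rejected at the .05 level.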

The goodness of fit of each model can be judged by r squared, which is the
percentage of the variation in the response variable that is explained by the
linear model. A larger multiple r squared value means the residuals are smaller
relative to the total variation in market value. The model for market values
and profits has a multiple r squared value of 0.2994. The model for market
values and sales has a multiple r squared value of 0.4122. The model for market
values and assets has a multiple r squared value of 0.2061. Of these three
models, the sales model fits the data best because its r squared value is
closest to one.
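
In a simple linear regression, r squared equals the square of the correlation between the two variables, so the values above can be checked against the correlations reported earlier. The sketch below uses only numbers already stated in this report; the vector names are my own.

```{r}
# Correlations with market value, as reported in Section II
r_reported <- c(profits = 0.5472202, sales = 0.642053, assets = 0.4539991)

# Multiple r squared values, as reported for the three simple models
r2_reported <- c(profits = 0.2994, sales = 0.4122, assets = 0.2061)

# Squared correlations
r2_from_r <- r_reported^2
r2_from_r

# Largest gap between the two sets of values (rounding error only)
max(abs(r2_from_r - r2_reported))
```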

The plots for each model help us determine the validity of our regression
assumptions. The residuals vs fitted plot for profits does not look linear. In
the second plot (normal Q-Q) the majority of the values lie on the straight
diagonal line, but many points deviate from the line on the right side of the
plot. In the third plot (scale-location) there is a v-shaped pattern.

The residuals vs fitted plot for sales is linear. The second plot (normal Q-Q)
shows some signs of normality. However, the points fall along a line in the
middle of the graph, but curve off in the extremities. The third plot
(scale-location) looks almost linear, unlike the scale-location plot for
profits.

The residuals vs fitted plot for assets appears linear. The second plot (normal
Q-Q) shows some signs of normality. Again, the points fall along a line in the
middle of the graph, but curve off in the extremities. The third plot
(scale-location) looks almost linear, similar to the scale-location plot for
sales.

The residuals vs fitted plot for company category appears to be linear. The
normal Q-Q plot does not look normal. The third plot (scale-location) looks
linear as well.
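
The diagnostic plots are easier to compare when all of them appear on one page. This is a sketch for the profits model, with `forbes` and `fit_profits` as placeholder names; it refits the model so that the example runs on its own.

```{r}
# Copy of the data under a hypothetical name
forbes <- HSAUR::Forbes2000
fit_profits <- lm(marketvalue ~ profits, data = forbes)

# Arrange the diagnostic plots in a 2 x 2 grid, then restore the old layout
old_par <- par(mfrow = c(2, 2))
plot(fit_profits)
par(old_par)
```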

## Section IV: Multiple Linear Regression

In this section, I will create a multiple linear regression model with market
value as my response variable and profits, sales, assets, and company category
as my explanatory variables.

Here is a multiple linear regression for all of my variables:


```{r}
#Multiple linear regression model for Y, x1, x2, x3, and x4
Fit5 <- lm(marketvalue~profits + sales + assets + category, data = Forbes2000)
summary(Fit5)
#diagnostic plots (abline is omitted: it only draws a one-predictor line)
plot(Fit5)
```

The estimated regression equation for my response variable and explanatory
variables is market value = 1.079418 + 4.392926(profits) + 0.6072609(sales) +
0.051997(assets) + a category estimate. The 1.079418 term represents the y
intercept, which is the expected market value for a company in the baseline
category when profits, sales, and assets are all zero. The next term,
4.392926(profits), represents the slope of profits. The 0.6072609(sales) term
represents the slope of sales. The 0.051997(assets) term represents the slope of
assets. These slopes all show positive and direct relationships with market
values. The category estimate is the coefficient for the company's category,
which shifts the intercept relative to the baseline category. Because there are
many categories, the model gives many parallel fits, one per category, for this
one explanatory variable.

The multiple r squared value for this model (Fit5) is 0.6097. The multiple
linear regression model (for my Y, x1, x2, x3, and x4 variables) has a greater
multiple r squared value than each of my simple linear regression models. This
means that the multiple linear regression model accounts for more of the
variation in market value.
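
Multiple r squared can never decrease when predictors are added to a model fitted on the same rows, so adjusted r squared is often used alongside it when comparing models of different sizes. The sketch below shows how both could be pulled from the model summaries; it is an optional check under my own naming (`forbes`, `fit_sales`, `fit_full`), not a result reported in this project.

```{r}
# Use complete rows only, so both models are fitted to the same companies
forbes <- na.omit(HSAUR::Forbes2000)

fit_sales <- lm(marketvalue ~ sales, data = forbes)
fit_full  <- lm(marketvalue ~ profits + sales + assets + category,
                data = forbes)

# Multiple and adjusted r squared for each model
r2_table <- rbind(
  sales_only = c(r2 = summary(fit_sales)$r.squared,
                 adj_r2 = summary(fit_sales)$adj.r.squared),
  full_model = c(r2 = summary(fit_full)$r.squared,
                 adj_r2 = summary(fit_full)$adj.r.squared)
)
round(r2_table, 4)
```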

The residuals vs fitted plot of the variables does not appear to be linear and
not all of the data follows the line. The second plot (normal Q-Q) shows some
signs of normality, but the points eventually curve off the line. The third plot
(scale location) shows a v shaped pattern. The first plot is similar to the
simple regression line for profits and company category. The second plot is
similar to all of the simple regression plots above. The third plot's v shape
pattern is similar to the v shape of profits and company category in the third
plots.

Profits, sales, and assets are all significant with regard to their relationship
with market values. However, not every category is significant for the category
variable. Some of the significant category types are trading companies, software
and service companies, and drugs and biotechnology companies. Because not every
company category type is significant at the .05 level, I will exclude this
variable (also known as x4) from my next regression model. In the model below, I
will only include predictors that are significant at the .05 level.
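
One way to judge the category variable as a whole, rather than level by level, is a partial F-test that compares the models with and without it. The sketch below shows how such a comparison could be run with `anova`; it is an alternative check and not the procedure used in this report, and `forbes`, `fit_without`, `fit_with`, and `category_test` are names I chose.

```{r}
# Use complete rows only, so both models are fitted to the same companies
forbes <- na.omit(HSAUR::Forbes2000)

fit_without <- lm(marketvalue ~ profits + sales + assets, data = forbes)
fit_with    <- lm(marketvalue ~ profits + sales + assets + category,
                  data = forbes)

# Partial F-test: do the category coefficients jointly improve the fit?
category_test <- anova(fit_without, fit_with)
category_test
```

A small p value in the second row would suggest that category, taken as a whole, still explains variation in market value beyond the three numeric predictors.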

Here is a model for the predictors that are significant at the .05 level:
```{r}
#Multiple Linear Regression
Fit6 <- lm(marketvalue~profits + sales + assets, data=Forbes2000)
summary(Fit6)
#diagnostic plots (abline is omitted: it only draws a one-predictor line)
plot(Fit6)
```

In the model Fit6, the intercept is 2.900394, the slope of profits is 4.597533,
the slope of sales is 0.574732, and the slope of assets is 0.048915
(market value = 2.900394 + 4.597533(profits) + 0.574732(sales) +
0.048915(assets)). The slope of profits is steeper than that of sales. Profits,
sales, and assets all have a positive and direct relationship with market value,
though.

## Section V: Conclusions

In the end, sales seem to have the strongest correlation with company market
values. This is not what I thought would be the case. Instead, I assumed profits
would correlate best with market values. I thought investors would value profits
the most, but this may not be the case. Overall, profits, sales, and assets each
had a moderate, positive correlation with market value. These correlations
support the idea that market value has a direct relationship with each variable.

Looking at my hypothesis tests, we can further see that all the variables are
significant in relation to market values. With simple linear regression, we
measured the relationship between two variables. R squared, which measures how
close the data are to the fitted line, was the highest for sales. By using the
multiple regression model, we see the r squared value was even higher, which
means even more of the variability was accounted for. In other words, r squared
may have been higher because all of the variables help explain market values.

Overall, I was able to estimate the correlation between variables as well as
their linear relationship. Both of these things help determine which variable
has the strongest association with market values. In this case, the answer is
sales.

There may be some limitations to this study. I only chose to look at a few
explanatory variables, which could have correlated with high market values,
instead of looking at all the variables in the dataset.

```{r}

```
