
Using Statistics to Understand Stock Markets

Summary: This term paper outlines and demonstrates the use of statistics in understanding ten years of stock market prices and returns from fifty countries, using basic statistics, linear regression and time series forecasting models.

Keywords: Term paper, Statistics, Linear Regression, Average, Logarithmic Transformation, Kurtosis, F-Test for Linear Regression, T-Test for Linear Regression, P-value, Partial Linear Regression, Exponential Time Series Forecasting, Moving Averages Forecasting


Contents:

1. Abstract
2. Introduction
3. Basic Statistics
4. Correlation
5. Linear Regression Model
   5.1 Introduction
   5.2 Predicting the Dependent Variable Y
   5.3 Coefficient of Multiple Determination
   5.4 Test for the Significance of the Overall Multiple Regression Model
   5.6 Residual Analysis
   5.7 Inferences Concerning the Population Regression Coefficients
   5.8 Line Plots
   5.9 Further Analysis
   5.10 Multiple Linear Regression Conclusions
6. Forecasting
   6.1 Models
   6.2 Forecast Error Measures
   6.3 Charts for Forecast Models
7. Conclusion
Appendix
   A. References
   B. List of Excel files used for calculations


1. Abstract

In this paper we demonstrate the use of statistical concepts to understand stock market returns for fifty countries. The data are monthly closing prices of the MSCI country indices in US dollars. We computed basic statistics for the monthly returns of the fifty stock markets. We then used linear regression to relate the returns of Finland first to the stock markets of the world's seven largest economies, then to seven larger neighboring economies, and finally to eight other important economies. Lastly, we used time series forecasting to predict the returns of the Austrian stock index. Excel and PHStat were our tools of choice.

2. Introduction

Most of us have some relationship with the stock market. We hold stocks either directly or indirectly, through mutual funds in retirement plans. Stock markets are where investors invest in companies big and small. The stock market is a direct indicator of the economic environment, so understanding stock markets helps managers make investment and capital expenditure decisions. Typically stock brokers, stock investors, government officials and banks are interested in stock market data and analysis.

We had monthly MSCI index prices for fifty countries. Index prices are not normally distributed, whereas returns are approximately so; it is therefore not preferable to work directly with the price series when performing statistical analysis. The raw price series are converted into series of returns. Returns have the added benefit of being unit-free and currency-free, which allows comparisons across markets.
There are two methods for calculating returns from a series of prices: simple returns and continuously compounded returns.

Simple returns:

$$R_t = \frac{p_t - p_{t-1}}{p_{t-1}} \times 100\%$$

Continuously compounded returns:

$$r_t = 100\% \times \ln\left(\frac{p_t}{p_{t-1}}\right)$$

where:
$R_t$ denotes the simple return at time t
$r_t$ denotes the continuously compounded return at time t
$p_t$ denotes the asset price at time t
ln denotes the natural logarithm.

We used continuously compounded returns (the logarithmic transformation) for our analysis, with Excel and PHStat as our tools for the statistical analysis.
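As a minimal sketch of the two return definitions, the following Python functions compute simple and continuously compounded returns in percent. The price list is hypothetical and is not taken from the MSCI data; the function names are illustrative only.

```python
import math

def simple_returns(prices):
    """Simple returns in percent: 100 * (p_t - p_{t-1}) / p_{t-1}."""
    return [100.0 * (p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

def log_returns(prices):
    """Continuously compounded returns in percent: 100 * ln(p_t / p_{t-1})."""
    return [100.0 * math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Hypothetical month-end index levels (assumption, for illustration only).
prices = [100.0, 105.0, 99.75, 110.0]

print(simple_returns(prices))
print(log_returns(prices))

# Log returns are additive over time: their sum equals the log return
# over the whole period, 100 * ln(last / first).
print(sum(log_returns(prices)), 100.0 * math.log(prices[-1] / prices[0]))
```

The additivity shown in the last line is one practical reason for preferring continuously compounded returns when aggregating monthly figures.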

3. Basic Statistics

We found out which countries gave the highest average returns over the ten-year period. Colombia, Czech Republic, Austria and Denmark were the top four, with annual compounded returns of 18.68%, 18.22%, 17.98% and 14.89% for the nine-plus-year period from 10/31/1997 to 7/31/2007. While these four countries gave investors the biggest returns, Denmark and Austria showed the lowest volatility among them: their standard deviations were 17.44 and 23.99, compared with 32.16 and 40.72 for Czech Republic and Colombia. The chart below shows the average rate of return against the standard deviation.

Investing in the stock market always bears some risk, large or small, depending on the volatility of the stock price. A stock with large volatility may give high or negative returns depending on when the investor enters and exits the holding. We found that Russia, Turkey and Indonesia have the highest volatility, whereas the USA, United Kingdom and Canada are mostly stable (as of Sep 2007).

We also made an attempt to understand the statistical measure kurtosis. Higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly sized deviations. Kurtosis measures whether the data are peaked or flat relative to a normal distribution. Since kurtosis measures the shape of the distribution (the fatness of the tails), it focuses on how returns are arranged around the mean. A kurtosis coefficient of three indicates a normal distribution. Kurtosis of less than three indicates a low peak with a fat midrange on either side (platykurtic). Conversely, kurtosis greater than three indicates a sharp, high peak with a thin midrange and fat tails (leptokurtic). Put simply, kurtosis describes how bunched around the center or spread at the endpoints a frequency distribution is. Kurtosis is sometimes also called "the volatility of volatility."

The chart below shows kurtosis in increasing order against average monthly returns and their standard deviation. Russia has the largest standard deviation and kurtosis, meaning that the Russian stock market is the most volatile among the 50 countries.
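The measures discussed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the worksheet calculation: the kurtosis here is the plain moment-based coefficient (equal to 3 for a normal distribution, matching the convention in the text), whereas Excel's KURT function reports excess kurtosis with a small-sample correction, so the two will not agree exactly. The sample returns are hypothetical.

```python
import math

def describe(returns):
    """Mean, sample standard deviation and moment-based kurtosis."""
    n = len(returns)
    mean = sum(returns) / n
    # Sample variance with an n - 1 denominator.
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    # Second and fourth central moments (n denominator) for kurtosis.
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "kurtosis": m4 / m2 ** 2,  # 3 for a normal distribution
    }

# Hypothetical monthly returns in percent (assumption, not MSCI data).
returns = [-2.0, -1.0, 0.0, 1.0, 2.0]
stats = describe(returns)
print(stats)
```

A value well above 3 from `describe` would point to fat tails of the kind the text attributes to the Russian market.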

4. Correlation

The linear relationship between two variables is expressed through correlation. We used Excel to create a 50 x 50 correlation matrix, which is shown in the table. We found that Netherlands and France had a correlation of 0.886035, one of the highest. At the other extreme, Pakistan and Denmark had a correlation of -0.19975. Any investment portfolio requires diversification so that the risks are spread among unrelated investments, and correlation can be a good tool for finding diversification opportunities: a quick glance at the correlation matrix shows which markets move together. For example, investing in both France and Netherlands, with a correlation coefficient of 0.886, will lead to similar returns and risks and so offers little diversification.
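A correlation matrix of this kind can be sketched in plain Python as follows. The three return series are hypothetical placeholders, and the function names are illustrative; the paper itself used Excel.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(series):
    """Nested dict: matrix[a][b] is the correlation between series a and b."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}

# Hypothetical monthly returns for three markets (assumption, not MSCI data).
returns = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.1, 5.9, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(returns)
for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Pairs with coefficients near +1 (here A and B) behave like the France and Netherlands example, while low or negative coefficients (A and C) indicate candidates for diversification.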


5. Linear Regression Model

Introduction

We picked Finland for creating linear regression models. Finland ranks 33rd among the world's top 50 economies. The top 7 economies are:

United States
Japan
Germany
China
United Kingdom
France
Italy

(Source: The Economist Pocket World Figures, 2009 edition)

The multiple regression model with k independent variables is given as:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

In our case we have 7 independent variables, so the model to be developed is:

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_7 X_{7i}$$

Here $X_j$, for j = 1 to 7, are the monthly rates of return for the seven biggest economies of the world, and $\hat{Y}$ is the estimated rate of return for Finland, for which the linear model was to be created. We first used Microsoft Excel to compute the values of the eight regression coefficients.

Output of Microsoft Excel Multiple Regression Analysis

From above figure, the computed values of the regression coefficients are


Therefore the multiple regression equation is:

The sample Y intercept ($b_0$) estimates the return of the Finnish stock market when the returns of all seven other stock markets are zero. Because stock returns cannot all be zero at the same time, the value of $b_0$ has no practical interpretation. The slope for the US rate of return (-0.14045) indicates that for each one-unit increase in the US monthly rate of return, Finland's estimated rate of return decreases by 0.14045 units, holding the other six returns constant. The estimates of all $b_j$ allowed us to better understand the effect of the rates of return of the biggest seven economies on Finland. Regression coefficients in multiple regression are called net regression coefficients. They estimate the mean change in Y per unit change in a particular X, holding constant the effect of the other X variables.

Predicting the Dependent Variable Y

We used the multiple regression equation to predict the value of the dependent variable. We took the monthly rates of return for Feb 27, 1998 for all seven countries and found that the model gave a 95% confidence interval of 1.43428 to 8.31292 for the mean predicted return. The actual value of the dependent variable, Finland's monthly rate of return, was 9.80996, which lies above this interval. We used PHStat's confidence interval estimate and prediction interval function to arrive at the range. The table below shows the output for Finland's monthly return at the 95% confidence level on Feb 27, 1998, with the given rates of return for the seven countries:
Data
Confidence Level                  95%
USA given value                   6.287008
JAPAN given value                 1.1237
GERMANY given value               4.412191
CHINA given value                 30.57856
UNITED KINGDOM given value        5.309952
FRANCE given value                8.189638
ITALY given value                 6.306493

For Average Predicted Y (YHat)
Interval Half Width               3.439322
Confidence Interval Lower Limit   1.434281
Confidence Interval Upper Limit   8.312925
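The fit-then-predict workflow described above can be sketched without Excel. The following is a minimal illustration of ordinary least squares via the normal equations, solved by Gauss-Jordan elimination; it is not the PHStat procedure, it does not compute confidence intervals, and the small data set is hypothetical (constructed so that the true coefficients are known).

```python
def fit_ols(X, y):
    """Return [b0, b1, ..., bk] minimising the sum of squared residuals."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    # Normal equations (A'A) b = A'y stored as an augmented p x (p + 1) matrix.
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict(coeffs, x):
    """Evaluate b0 + b1*x1 + ... + bk*xk for one observation."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))

# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 (assumption).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [0.0, 1.0]]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in X]

coeffs = fit_ols(X, y)
print(coeffs)
print(predict(coeffs, [2.0, 2.0]))
```

With the seven country return series as the columns of `X` and Finland's returns as `y`, the same function would return the eight coefficients of the model; for real work a numerical library is preferable to hand-rolled elimination.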


Coefficient of Multiple Determination

The coefficient of multiple determination is equal to the regression sum of squares (SSR) divided by the total sum of squares (SST):

$$r^2 = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}} = \frac{SSR}{SST}$$

For Finland's monthly rate of return we have

$$r^2 = \frac{3299.203}{7011.618} = 0.47053$$

Excel produced the same result when we ran the regression analysis:

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.685955
R Square            0.470534
Adjusted R Square   0.43684
Standard Error      5.809409
Observations        118
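As a quick check of the arithmetic, the sketch below recomputes R Square from the sums of squares quoted above, and Adjusted R Square using the standard adjustment formula 1 - (1 - r^2)(n - 1)/(n - k - 1). The adjustment formula is not stated in the paper; it is assumed here because it is the usual definition, and the result should agree with the summary output.

```python
# Values quoted in the text and in the Excel summary output.
ssr = 3299.203   # regression sum of squares
sst = 7011.618   # total sum of squares
n = 118          # observations
k = 7            # independent variables

r2 = ssr / sst
# Standard adjusted r^2 (assumed formula, not given in the paper).
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(r2, 5), round(adjusted_r2, 5))
```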

The coefficient of multiple determination ($r^2$ = 0.47053) indicates that 47.05% of the variation in Finland's rate of return is explained by the rates of return of the seven biggest economies. One would have expected a stronger relationship, but this is not the case here. We also looked at the PHStat output, where $r^2$ was calculated seven times, removing one of the $X_j$ each time.

Condition                r^2 (variable removed)
All but Italy            0.47041
All but France           0.46662
All but United Kingdom   0.46382
All but China            0.47012
All but Germany          0.43121
All but Japan            0.44104
All but USA              0.46883


Test for the Significance of the Overall Multiple Regression Model

We tested the significance of the overall multiple regression model using the F-test. Here we tried to determine whether there is a significant relationship between the dependent variable and the entire set of independent variables. Since there is more than one independent variable, we used the following null and alternative hypotheses:

$H_0: \beta_1 = \beta_2 = \cdots = \beta_7 = 0$ (There is no linear relationship between the dependent variable and the independent variables.)

$H_1$: At least one $\beta_j \neq 0$, j = 1, 2, ..., 7 (There is a linear relationship between the dependent variable and at least one of the independent variables.)

The overall F test statistic is equal to the regression mean square (MSR) divided by the error mean square (MSE):

$$F = \frac{MSR}{MSE}$$

where:
F = test statistic from an F distribution with k and n - k - 1 degrees of freedom
k = number of independent variables in the regression model
n = number of observations used to create the regression model

ANOVA table for our model
             df    SS            MS            F             Significance F
Regression   7     3299.203009   471.3147155   13.96519865   7.54666E-13
Residual     110   3712.415413   33.74923103
Total        117   7011.618422

The decision rule is to reject $H_0$ at the $\alpha$ level of significance if $F > F_U$, the upper-tail critical value from an F distribution with k and n - k - 1 degrees of freedom; otherwise, do not reject $H_0$. Using a 0.05 level of significance, the critical value of the F distribution with 7 and 110 (118 - 7 - 1) degrees of freedom, found from F tables, is approximately 2.09. From the ANOVA table above, the F statistic is 13.9652. Because 13.9652 > 2.09, we rejected $H_0$ and found statistical evidence to conclude that at least one of the independent variables (the rates of return of the seven biggest economies) is related to Finland's rate of return.

Figure showing Significance of the Overall Multiple Regression Model F-Test
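The F statistic can be reproduced from the ANOVA sums of squares, as in the sketch below. It also shows the equivalent form in terms of the coefficient of determination, F = (r^2 / k) / ((1 - r^2) / (n - k - 1)), which is a standard identity rather than something stated in the paper. The critical value itself still has to come from F tables or statistical software; this sketch does not compute it.

```python
def overall_f(ssr, sse, n, k):
    """Return (MSR, MSE, F) for a regression with k predictors and n observations."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr, mse, msr / mse

def f_from_r2(r2, n, k):
    """Same F statistic expressed through the coefficient of determination."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Sums of squares from the ANOVA table for the seven-economy model.
ssr, sse, sst = 3299.203009, 3712.415413, 7011.618422
msr, mse, f_stat = overall_f(ssr, sse, n=118, k=7)

print(msr, mse, f_stat)
print(f_from_r2(ssr / sst, n=118, k=7))
```

Both lines should agree with the MS and F columns of the ANOVA table, with 7 and 110 degrees of freedom.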


Residual Analysis

We evaluated the appropriateness of the multiple linear regression model using residual analysis. Using Excel, we created the seven residual plots, along with the residuals against the predicted Y. In all of these charts the pattern is random, so the use of linear regression was appropriate in this case.


Inferences Concerning the Population Regression Coefficients

To determine whether an independent variable (the monthly rate of return of one of the biggest seven economies) has a significant linear effect on Y (Finland's rate of return), the null and alternative hypotheses are:

$H_0: \beta_j = 0$ (There is no linear relationship)
$H_1: \beta_j \neq 0$ (There is a linear relationship)

The t-statistic equals the difference between the sample slope and the hypothesized value of the population slope, divided by the standard error of the slope:

$$t = \frac{b_j - \beta_j}{S_{b_j}}$$

where
$b_j$ = slope of variable j with Y, holding constant the effects of the other independent variables
$S_{b_j}$ = standard error of the regression coefficient $b_j$
t = test statistic for a t distribution with n - k - 1 degrees of freedom
k = number of independent variables
$\beta_j$ = hypothesized value of the population slope for variable j, holding constant the effects of the other independent variables.


The table below summarizes our findings. We used a 95% confidence level; when the p-value was greater than 0.05, the null hypothesis was not rejected.

Country          t Stat    Critical t   t-stat in area of nonrejection   Null Hypothesis   p-value    p-value > 0.05
USA              0.59559   1.9799       Yes                              Not rejected      0.552673   Yes
JAPAN            2.47559   1.9799       No                               Rejected          0.014826   No
GERMANY          2.85814   1.9799       No                               Rejected          0.005098   No
CHINA            0.27974   1.9799       Yes                              Not rejected      0.780206   Yes
UNITED KINGDOM   1.18083   1.9799       Yes                              Not rejected      0.240219   Yes
FRANCE           0.90193   1.9799       Yes                              Not rejected      0.369066   Yes
ITALY            0.15824   1.9799       Yes                              Not rejected      0.874559   Yes

We found that only Japan and Germany make a statistically significant contribution to the Y value, Finland's rate of return.

We decided to perform PHStat Stepwise Regression to confirm the findings above. PHStat's forward selection procedure selected only Germany and Japan as significant independent variables for the best-fitting model. This confirmed what we found earlier. So, with the coefficients from the forward selection output, the linear equation reduced to:

$$\hat{Y} = -0.11766 + 0.66037\,X_{\text{Germany}} + 0.35160\,X_{\text{Japan}}$$

PHStat Stepwise Regression Analysis Table: Finland and the 7 biggest economies

Table of Results for Forward Selection

GERMANY entered.

             df    SS            MS            F             Significance F
Regression   1     2745.549597   2745.549597   74.65509029   3.55615E-14
Residual     116   4266.068825   36.77645538
Total        117   7011.618422

             Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept    -0.034545387   0.562181166      -0.061448851   0.951107497   -1.148015985   1.078925211
GERMANY      0.730094046    0.084498518      8.640317719    3.55615E-14   0.562734089    0.897454003

JAPAN entered.

             df    SS            MS            F             Significance F
Regression   2     3150.704997   1575.352499   46.92297325   1.2593E-15
Residual     115   3860.913425   33.57316021
Total        117   7011.618422

             Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept    -0.117655288   0.537672497      -0.218823333   0.82717555    -1.18268099    0.947370414
GERMANY      0.66036799     0.083192301      7.937849769    1.51106E-12   0.495580058    0.825155923
JAPAN        0.351600388    0.101212614      3.473879136    0.000724094   0.151117685    0.55208309

No other variables could be entered into the model. Stepwise ends.
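The t statistics and 95% limits in the forward selection output follow directly from each coefficient and its standard error. The sketch below reproduces them for the final two-variable model; the critical t value of 1.9808 for 115 degrees of freedom is an approximate table value supplied here as an assumption, not a figure quoted in the paper.

```python
def t_statistic(b, standard_error, hypothesized=0.0):
    """t = (b_j - beta_j) / S_bj, with beta_j = 0 under the null hypothesis."""
    return (b - hypothesized) / standard_error

def confidence_interval(b, standard_error, t_critical):
    """Two-sided interval b_j +/- t_critical * S_bj."""
    half_width = t_critical * standard_error
    return b - half_width, b + half_width

# Coefficients and standard errors from the final forward selection step.
germany = (0.66036799, 0.083192301)
japan = (0.351600388, 0.101212614)

# Approximate two-tailed 5% critical t for 115 degrees of freedom (assumed).
T_CRITICAL = 1.9808

t_germany = t_statistic(*germany)
t_japan = t_statistic(*japan)
ci_germany = confidence_interval(*germany, T_CRITICAL)

print(t_germany, t_japan)
print(ci_germany)
```

Both t statistics exceed the critical value, which is why Germany and Japan stay in the model.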

Line Plots

We created line plots for Finland against each of the independent variables:


Further Analysis

At this point we wondered whether Finland's stock market was more closely related to the bigger economies in its neighborhood. We picked the following countries for further analysis:

Country       Economic Rank
Russia        11
Netherlands   16
Belgium       18
Sweden        19
Poland        24
Norway        25
Denmark       28

We obtained following Excel output:

Here $r^2$ = 0.596 told us that Finland's stock market has a stronger linear relationship with its seven neighboring countries than with the biggest seven economies of the world. Again, with a 0.05 level of significance, the critical value of the F distribution with 7 and 110 (118 - 7 - 1) degrees of freedom, found from F tables, is approximately 2.09. The F statistic is 23.1811. Because 23.1811 > 2.09, we rejected $H_0$ and found enough statistical evidence to conclude that at least one of the independent variables (the rates of return of the seven neighboring economies) is related to Finland's rate of return.

Hence the multiple linear regression equation is:


Again we created the t-test table and found that only Sweden and Poland make a statistically significant contribution to the Y value, Finland's rate of return.

Country       t Stat       Critical t   t-stat in area of nonrejection   Null Hypothesis   p-value    p-value > 0.05
RUSSIA        0.50122106   1.9799       Yes                              Not rejected      0.617217   Yes
NETHERLANDS   -0.1346789   1.9799       Yes                              Not rejected      0.893112   Yes
BELGIUM       0.87458724   1.9799       Yes                              Not rejected      0.383704   Yes
SWEDEN        4.76688076   1.9799       No                               Rejected          5.77E-06   No
POLAND        4.229502     1.9799       No                               Rejected          4.86E-05   No
NORWAY        -0.3460091   1.9799       Yes                              Not rejected      0.729997   Yes
DENMARK       -0.672525    1.9799       Yes                              Not rejected      0.50266    Yes

With this new insight, we changed the linear equation to:

Once again we performed PHStat Stepwise Regression to confirm the findings, and it gave the same result: Finland is most closely related to Sweden and Poland.


The findings above made us curious. We wanted to see whether Finland's rate of return had any linear relationship with other countries in Asia Pacific and Latin America. We decided to keep the four countries that showed a linear relationship in our earlier analysis (Germany, Japan, Poland and Sweden) and added eight others. These countries were picked:

Argentina, Australia, Brazil, China, Germany, India, Indonesia, Japan, Mexico, Poland, Sweden, Singapore

We obtained following Excel output:


Here $r^2$ = 0.62 describes the linear relationship between Finland's stock market and the twelve countries in the list. Again with a 0.05 level of significance, the F statistic of 14.715 is well above the critical value (approximately 1.85 for 12 and 105 degrees of freedom), so we rejected $H_0$ and found statistical evidence to conclude that at least one of the independent variables (the rates of return of the twelve countries) is related to Finland's rate of return.


We created the t-test table and found that only Sweden, Poland, China and Australia had a significant linear relationship to the Y value, Finland's rate of return.

Country     t Stat     Critical t   t-stat in area of nonrejection   Null Hypothesis   p-value       p-value > 0.05
ARGENTINA   -0.52605   -1.9799      Yes                              Not rejected      0.59996518    Yes
AUSTRALIA   2.03623    1.9799       No                               Rejected          0.044244445   No
BRAZIL      0.627811   1.9799       Yes                              Not rejected      0.531490492   Yes
CHINA       -2.08348   -1.9799      No                               Rejected          0.039635123   No
GERMANY     0.649476   1.9799       Yes                              Not rejected      0.517449239   Yes
INDIA       0.892593   1.9799       Yes                              Not rejected      0.374116105   Yes
INDONESIA   0.109509   1.9799       Yes                              Not rejected      0.913007487   Yes
JAPAN       0.422115   1.9799       Yes                              Not rejected      0.673804307   Yes
MEXICO      0.127667   1.9799       Yes                              Not rejected      0.89865689    Yes
POLAND      3.339558   1.9799       No                               Rejected          0.001162327   No
SWEDEN      3.297052   1.9799       No                               Rejected          0.001333794   No
SINGAPORE   0.282879   1.9799       Yes                              Not rejected      0.777826147   Yes


Again PHStat Stepwise Regression confirmed the findings: Finland is related linearly to Sweden, Poland, China and Australia.

Multiple Linear Regression Conclusions

We found that regression can be a very good tool for modeling stock market returns and finding relationships among the returns of different markets. With any statistical analysis there is always going to be uncertainty, and this needs to be kept in mind while making investing decisions. We were able to find that Finland's rate of return was related to the rates of return of these countries: Sweden, Poland, China and Australia. Another way to understand this is that these five countries, Finland included, present similar investment risks.


6. Forecasting

We applied forecasting techniques to Austria's stock index. First we plotted the monthly rate of return against time.

Here we found that the monthly returns are non-seasonal and have no consistent upward or downward trend. In this case, exponential smoothing and moving average models were the most appropriate for forecasting purposes.


Models

We used the following forecasting techniques to create our forecasting models:

1. First-order naive
2. Second-order naive
3. 3-period moving average
4. 4-period moving average
5. 5-period moving average
6. Exponential smoothing forecast with a smoothing constant of 0.758
7. Exponential smoothing forecast with a smoothing constant of 0.2
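Three of these model families can be sketched as follows, using their textbook definitions, which are assumed here because the paper's own formulas did not survive: the first-order naive forecast repeats the previous actual value, the k-period moving average forecasts the mean of the previous k actual values, and exponential smoothing uses F[t+1] = w * Y[t] + (1 - w) * F[t], seeded with the first actual value. The second-order naive model is omitted because its exact definition cannot be recovered from the text. The series is hypothetical.

```python
def naive_forecast(y):
    """First-order naive: the forecast for period t is the actual value at t - 1."""
    return [None] + y[:-1]

def moving_average_forecast(y, k):
    """k-period moving average: the forecast for t is the mean of the k prior values."""
    return [None] * k + [sum(y[t - k:t]) / k for t in range(k, len(y))]

def exponential_smoothing_forecast(y, w):
    """F[t+1] = w * Y[t] + (1 - w) * F[t]; the first forecast is seeded with y[0]."""
    forecasts = [None]
    current = y[0]  # seeding choice is an assumption
    for t in range(1, len(y)):
        forecasts.append(current)
        current = w * y[t] + (1 - w) * current
    return forecasts

# Hypothetical series (assumption, not the Austrian index data).
series = [2.0, 4.0, 6.0, 8.0]

print(naive_forecast(series))
print(moving_average_forecast(series, 3))
print(exponential_smoothing_forecast(series, 0.5))
```

A smoothing constant close to 1 (such as 0.758) makes the forecast track recent values closely, while a small constant (such as 0.2) averages over a longer history.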


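Forecast Error Measures

We created these seven models using Excel. After creating the forecasted series we calculated the forecast error measures, where $e_t$ is the forecast error in period t. These are given as:

Bias

Average error:

$$\bar{e} = \frac{1}{n}\sum_{t=1}^{n} e_t$$

Variability

Mean squared error:

$$MSE = \frac{1}{n-1}\sum_{t=1}^{n} e_t^2$$

Standard deviation:

$$SE = \sqrt{MSE}$$

Mean absolute error:

$$MAD = \frac{1}{n-1}\sum_{t=1}^{n} |e_t|$$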
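The error measures can be computed as in the sketch below. Two things are assumptions rather than facts from the paper: the error is taken as actual minus forecast, and the denominators are n for the average error and n - 1 for MSE and MAD, which is how the paper's formulas appear to read. The series and forecasts are hypothetical.

```python
import math

def error_measures(actual, forecast):
    """Average error, MSE, SE and MAD over periods that have a forecast."""
    # Assumed sign convention: error = actual - forecast.
    errors = [a - f for a, f in zip(actual, forecast) if f is not None]
    n = len(errors)
    # Assumed denominators: n for bias, n - 1 for MSE and MAD.
    mse = sum(e * e for e in errors) / (n - 1)
    return {
        "average_error": sum(errors) / n,
        "mse": mse,
        "se": math.sqrt(mse),
        "mad": sum(abs(e) for e in errors) / (n - 1),
    }

# Hypothetical actual values and first-order naive forecasts (assumption).
actual = [3.0, 5.0, 4.0, 6.0]
forecast = [None, 3.0, 5.0, 4.0]

measures = error_measures(actual, forecast)
print(measures)
```

A model with an average error near zero is unbiased, while MSE, SE and MAD compare how widely the forecasts miss.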

We calculated the error measures, and here they are:

                FNAIV      2nd NAIV   F_MA_3     F_MA_4     F_MA_5     EXP_0.758   Exp_0.2
                Error      Error      Error      Error      Error      Error       Error
Average Error   -0.11329   -0.08389   -0.08465   -0.11619   -0.25684   -0.00982    0.237657
MSE             91.58189   91.62089   63.3329    58.75387   57.74957   73.89865    54.27465
SE              9.569843   9.57188    7.958197   7.665107   7.599314   8.596432    7.367133
MAD             7.172345   7.077884   5.530948   5.438287   5.551451   6.166996    5.396453

We found that the exponential smoothing forecast with a smoothing constant of 0.758 had the least bias, with an average error of -0.00982, whereas the exponential smoothing forecast with a smoothing constant of 0.2 had the least variability (MSE of 54.27465). Among the moving average models, the mean absolute error (MAD) was lowest for the 4-period model (5.438287); the exponential smoothing forecast with a smoothing constant of 0.2 had a MAD close to it (5.396453).

We created charts of actual versus forecasted values for all these models to show the different forecasting models visually.

Charts for Forecast Models

First-Order Naive Forecast Model Chart (Blue Actual, Red Forecasted)

Second-Order Naive Forecast Model Chart (Blue Actual, Red Forecasted)

3-Period Moving Averages Forecast Model Chart (Blue Actual, Red Forecasted)

4-Period Moving Averages Forecast Model Chart (Blue Actual, Red Forecasted)

5-Period Moving Averages Forecast Model Chart (Blue Actual, Red Forecasted)

Exponential Smoothing Forecast Chart (smoothing constant 0.758, Blue Actual, Red Forecasted)

Exponential Smoothing Forecast Chart (smoothing constant 0.2, Blue Actual, Red Forecasted)

7. Conclusion

In this paper we used different statistical techniques to analyze international stock market returns. Such a study could be far more exhaustive, but within our limited scope we demonstrated the use of basic statistics, linear regression and time series forecasting.


Appendix

A. References

1. Levine, Stephan, Krehbiel, Berenson, Statistics for Managers Using Microsoft Excel, 5th Edition, 2008.
2. Anderson, Sweeney, Williams, Essentials of Modern Business Statistics, 4th Edition, 2009.
3. Statistical terms at http://en.wikipedia.org/
4. Class notes.

B. List of Excel files used for calculations

1. country_data_in_pc.xlsm is the main Excel file, with the following worksheets:
   a. The base data for the fifty countries' monthly rates of return is in the HistoryIndex worksheet.
   b. The MarketReturns worksheet transforms the HistoryIndex worksheet to log returns in percentages using the formula =100*LN(HistoryIndex!B11/HistoryIndex!B10). It also has the following basic statistics for each country: Monthly Average Return, Monthly Variance, Monthly Std. Deviation, Yearly Avg Return, Yearly Std. Dev, Coefficient of Variance, Skewness, Kurtosis.
   c. The CorrelationMatrix worksheet has the 50 x 50 correlation matrix table for the 50 countries.
   d. The Histograms worksheet has histograms of the rates of return for all countries, including their frequency tables.
   e. The BasicStatsUsingExcel worksheet again contains basic statistics, but calculated using the Data Analysis Descriptive Statistics tool instead of the functions and calculations used for each statistical measure in b.
   f. The line diagram of average yearly rates of return with standard deviation is in the YRLY_RETURN_CHART_BY_COUNTRIES worksheet.
   g. The Kurtosis_Chart worksheet has the monthly kurtosis measure plotted with monthly average returns and standard deviation.
   h. The list of all countries in one column is in the CountryNames worksheet.
2. Correlationmatrix.xlsx has the 50 x 50 correlation matrix table for the 50 countries (CorrelationMatrix).
3. Finland_n_other_7_big_economies.xlsx has Finland's regression calculation, along with residual plots and line diagrams, in the first worksheet, Finland_Regression_Model. The other worksheets contain output from PHStat.
4. finland_vs_7_Neighboring _big_countries.xlsx has Finland's regression calculations with the seven biggest neighboring economies. The MR worksheet has the linear regression model. The other worksheets contain output from PHStat. Note that the Stepwise worksheet has the PHStat Stepwise Regression output.
5. Finland_and_other_tweleve_countries.xlsx has Finland's regression calculations with 12 other major economies. The MR worksheet has the linear regression model. The other worksheets contain output from PHStat. Note that the Stepwise worksheet has the PHStat Stepwise Regression output.
6. Austria_time_forecasting.xlsm has the time series modeling output. These are the worksheets in the Excel file:
   a. The Main worksheet has the basic time series and columns showing all time series models with values and errors.
   b. The Basic Charts worksheet has Austria's stock price and returns plotted against time.
   c. Worksheets FNAIV, FNAIV, F_MA_3, F_MA_4, F_MA_5, EXP_0.758 and EXP_0.2 are used to calculate Avg Error, MSE, SE and MAD.
