
Executive Summary

Let us start with explaining the concept of Data Analysis. Data Analysis is a field of applied
statistics which transforms numbers into meaningful information for decision makers.
Decision makers rely on Data Analysis as it reduces the degree of uncertainty which is
inherent in any decision making process.

There are three broad divisions of Data Analysis. The first is known as Descriptive Analysis,
the second is the Inferential Analysis and the third is the Predictive Analysis. According to
some Data Scientists, there is another division of Data Analysis which is known as the
Prescriptive Analysis. But, in most of the situations, the Prescriptive Analysis is included as a
sub part of the Predictive Analysis.

Descriptive Analysis, the first division of Data Analysis, involves describing past or historical
data. This can be done in two different ways: a) Data Representation or Visualization and b)
Numerical Data Descriptors.

a) Data Representation or Visualization is representing collected past or historical
data with the help of suitable tables and graphs.
b) Numerical Data Descriptors involve describing data with the measurement of
Central Tendency and Dispersion.

Inferential Analysis is drawing inferences or conclusions regarding a population, based on
analysis or tests done on samples drawn from that population.

Predictive Analysis is the process of predicting the future value of a variable on the basis of
some given values of another variable or variables. There are two ways of executing
Predictive Analysis: prediction through Regression and prediction through
Forecasting. While Regression is prediction with Cross Sectional Data, Forecasting is
prediction with Time Series Data. Moreover, there are generally four different Regression
Models: a) Simple Linear Regression Model (SLRM), b) Simple Non-Linear Regression
Model (SNLRM), c) Multiple Linear Regression Model (MLRM) and d) Multiple Non-Linear
Regression Model (MNLRM).

A Simple Linear Regression Model is the method of prediction with one predicted variable and
one predictor variable, where the relationship between the two can be expressed by a linear
equation. A Simple Non-Linear Regression Model is the method of prediction with one predicted
and one predictor variable, but the relationship between the variables is expressed through a
nonlinear equation. Multiple models involve one predicted variable but more than one
predictor variable. Regression is generally preceded by measuring the degree of association
between the predicted and predictor variables with the coefficient of correlation.
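The Simple Linear Regression Model described above can be illustrated with a short least-squares fit. This is only a minimal sketch: the function name fit_simple_linear and the toy data are hypothetical and are not part of the assignment's data.

```python
from statistics import mean


def fit_simple_linear(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of the predictor
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    # Sum of cross-products of deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data lying exactly on the line y = 1 + 2x
toy_x = [1, 2, 3, 4, 5]
toy_y = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_linear(toy_x, toy_y)
print(f"y = {b0:.2f} + {b1:.2f} * x")
```

With one predictor, the slope is the ratio of the cross-product sum to the predictor's sum of squares; the multiple models extend the same idea to several predictors.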

In this assignment, referring to Option B, we have collected data on GDP, population,
unemployment rate, money supply (M1), Consumer Price Index, and Federal Funds Effective
Rate (representing the prime interest rate) from the FRED Economic Data of the Federal
Reserve Bank of St. Louis. The variables have been represented by appropriate graphical
representations. A detailed Descriptive Analysis of each variable has been done and explained
in the assignment. Before construction of the regression model, a correlation matrix between
the dependent variable GDP and the independent variables (population, unemployment rate,
money supply, Consumer Price Index and Federal Funds Effective Rate) has been constructed.
The independent variables have been ranked by their degree of association with the dependent
variable, and instances of multicollinearity have been identified from the correlation
matrix. Accordingly, two regression models have been constructed and the better fitting
model has been identified. The assignment ends with the conclusions and the findings.

Objectives of the Research

This assignment has the following objectives:

i) To collect appropriate data on GDP, population, unemployment rate, money
supply, Consumer Price Index and prime interest rate
ii) To represent the collected data in appropriate tabulation and graphical
representation
iii) To calculate the numerical descriptive measures of the collected data
iv) To measure the degree of association between GDP, population, unemployment
rate, money supply, Consumer Price Index and prime interest rate
v) To develop a regression model with GDP as the dependent variable and
population, unemployment rate, money supply, Consumer Price Index and the
prime interest rate as independent variables
vi) To draw conclusions and make recommendations on the basis of analyzing the
collected data and the findings

Data Collection

Annual values from 1975 to 2022 have been collected for six variables from the FRED Economic
Data of the Federal Reserve Bank of St. Louis. The variables have been labelled in the MS
Excel file as follows:

GDP: GDP
POP: population, in thousands
UNRATE: unemployment rate, in percentage
M1: money supply M1, in billions of dollars
CPI: sticky price consumer index
FFER: Federal Funds Effective Rate, in percentage

This is the fulfilment of the objective 1 stated above.

Data Representation through Tables and Charts

1. Name of the variable: GDP

[Chart: GDP by year, 1975 to 2022]

The above clustered column chart shows the GDP values as compared with time period
captured in years.

GDP
Mean                  10492.09
Standard Error        947.6516
Median                9346.995
Mode                  #N/A
Standard Deviation    6565.523
Sample Variance       43106094
Kurtosis              -0.8619
Skewness              0.470668
Range                 23777.81
Minimum               1684.91
Maximum               25462.72
Sum                   503620.6
Count                 48

The above table gives the numerical descriptive measures of the variable GDP. The mean is
10492.09 and the median value is 9346.995. The GDP data records a standard deviation of
6565.523 with a small positive skewness of 0.4707.

2. Name of the variable: Population

[Chart: Population (POP) by observation number]

The above clustered column chart shows the Population values against the time period, with
the years shown as observation numbers.

POP
Mean                  276680.021
Standard Error        5444.99129
Median                277741
Mode                  #N/A
Standard Deviation    37724.0063
Sample Variance       1423100648
Kurtosis              -1.3694262
Skewness              -0.0251117
Range                 117614
Minimum               215981
Maximum               333595
Sum                   13280641
Count                 48

The above table gives the numerical descriptive measures of the variable – Population. The
mean is 276680.021 and the median value is 277741. The Population data records a standard
deviation of 37724.0063 with a very small negative value of skewness.

3. Name of the variable: Unemployment Rate (UNRATE)

[Chart: UNRATE by year]

The above scatter plot line chart shows the Unemployment rate values as compared with time
period captured in years.

UNRATE
Mean                  6.2625
Standard Error        0.238524
Median                5.95
Mode                  6.1
Standard Deviation    1.652545
Sample Variance       2.730904
Kurtosis              -0.57538
Skewness              0.45808
Range                 6.1
Minimum               3.6
Maximum               9.7
Sum                   300.6
Count                 48

The above table gives the numerical descriptive measures of the variable Unemployment
Rate. The mean is 6.2625 and the median value is 5.95. The Unemployment rate data records
a standard deviation of 1.65 with a small positive value of skewness.
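The UNRATE measures in the table above can be reproduced from the appendix data. The following is a minimal sketch using Python's standard statistics module; the helper name describe is hypothetical, and the skewness line assumes the adjusted sample formula that spreadsheet SKEW functions use.

```python
import math
import statistics

# UNRATE values for 1975 to 2022, copied from the appendix table
unrate = [
    8.5, 7.7, 7.1, 6.1, 5.9, 7.2, 7.6, 9.7, 9.6, 7.5, 7.2, 7.0,
    6.2, 5.5, 5.3, 5.6, 6.9, 7.5, 6.9, 6.1, 5.6, 5.4, 4.9, 4.5,
    4.2, 4.0, 4.7, 5.8, 6.0, 5.5, 5.1, 4.6, 4.6, 5.8, 9.3, 9.6,
    8.9, 8.1, 7.4, 6.2, 5.3, 4.9, 4.4, 3.9, 3.7, 8.1, 5.4, 3.6,
]


def describe(values):
    """Return the main measures of central tendency and dispersion."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    # Adjusted sample skewness (assumed to match the spreadsheet SKEW formula)
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    return {
        "count": n,
        "mean": mean,
        "median": statistics.median(values),
        "std_dev": sd,
        "std_error": sd / math.sqrt(n),  # standard error of the mean
        "range": max(values) - min(values),
        "skewness": skew,
    }


for name, value in describe(unrate).items():
    print(f"{name}: {value:.4f}")
```

The standard error reported in each table is the standard deviation divided by the square root of the count, which is why it shrinks as more observations are collected.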

4. Name of the variable: Money Supply – M1

[Chart: M1 by year, 1975 to 2022]

The above clustered bar chart shows the M1 values as compared with time period captured in
years.

M1
Mean                  2343.04
Standard Error        600.0716
Median                1124.05
Mode                  #N/A
Standard Deviation    4157.418
Sample Variance       17284124
Kurtosis              13.18942
Skewness              3.661078
Range                 20129.5
Minimum               281.4
Maximum               20410.9
Sum                   112465.9
Count                 48

The above table gives the numerical descriptive measures of the variable – M1. The mean is
2343.04 and the median value is 1124.05. The M1 data records a standard deviation of
4157.418 with a relatively large positive value of skewness.

5. Name of the variable: Consumer Price Index (CPI)

[Chart: CPI by year, 1975 to 2022]

The above stacked area chart shows the CPI values as compared with time period captured in
years.

CPI
Mean                  4.143333
Standard Error        0.395633
Median                3.16
Mode                  2.76
Standard Deviation    2.741024
Sample Variance       7.513214
Kurtosis              3.125893
Skewness              1.788673
Range                 13.05
Minimum               0.84
Maximum               13.89
Sum                   198.88
Count                 48

The above table gives the numerical descriptive measures of the variable CPI. The mean is
4.14 and the median value is 3.16. The CPI data records a standard deviation of 2.74 with a
fairly large positive skewness of 1.79.

6. Name of the variable: Federal Funds Effective Rate (FFER)

[Chart: FFER by year, 1975 to 2022]

The above line area chart shows the FFER values as compared with time period captured in
years.

FFER
Mean                  5.539375
Standard Error        0.222259
Median                5.3
Mode                  5.3
Standard Deviation    1.539855
Sample Variance       2.371155
Kurtosis              0.51501
Skewness              0.946962
Range                 6.33
Minimum               3.27
Maximum               9.6
Sum                   265.89
Count                 48

The above table gives the numerical descriptive measures of the variable FFER. The mean
is 5.54 and the median value is 5.3. The FFER data records a standard deviation of 1.54 with
a positive skewness of 0.95.

This is the fulfilment of the objectives 2 and 3 stated above.

Data Analysis: In the following sections we present the Data Analysis.

Coefficient of Correlation

The coefficient of correlation measures the degree of association between variables. It is
denoted by r. The value of r is interpreted as follows:

i) The value of r ranges between -1 and +1.
ii) If the value of r is positive, then the values of the variables move in the same
direction: if the value of one variable increases then the value of the other
increases, and vice versa. This is captured from the collected past data.
iii) If the value of r is negative, then the values of the variables move in the opposite
direction: if the value of one variable increases then the value of the other
decreases, and vice versa. This is captured from the collected past data.
iv) If the value of r is 0, then there is no linear relationship between the variables.
v) If the value of r is -1, then the variables are perfectly negatively correlated.
vi) If the value of r is +1, then the variables are perfectly positively correlated.
vii) If the value of r ranges between -1 and -0.5, then the variables are strongly
negatively correlated.
viii) If the value of r ranges between -0.5 and 0, then the variables are weakly
negatively correlated.
ix) If the value of r ranges between 0 and +0.5, then the variables are weakly
positively correlated.
x) If the value of r ranges between +0.5 and +1, then the variables are strongly
positively correlated.
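The interpretation rules above can be sketched in a few lines of Python. The function names pearson_r and describe_strength and the toy data are hypothetical; the sketch also assumes that a value of exactly 0.5 in absolute terms counts as strong, since the ranges listed above meet at that point.

```python
import math


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r):
    """Label r using the rules listed above (|r| >= 0.5 treated as strong)."""
    if r == 0:
        return "no linear relationship"
    direction = "positively" if r > 0 else "negatively"
    if abs(r) >= 1:
        strength = "perfectly"
    elif abs(r) >= 0.5:
        strength = "strongly"
    else:
        strength = "weakly"
    return f"{strength} {direction} correlated"


# Hypothetical toy data: y rises as x rises
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])
print(round(r, 4), describe_strength(r))
```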

We now proceed to measure the coefficient of correlation of the variables – GDP, Population,
Unemployment Rate, Money Supply – M1, Consumer Price Index, Federal Funds Effective
Rate.

          GDP             POP             UNRATE          M1              CPI             FFER
GDP       1
POP       0.980345297     1
UNRATE    -0.358531024    -0.352039937    1
M1        0.677178258     0.553873993     -0.216450151    1
CPI       -0.642673888    -0.737430946    0.231142901     -0.18421872     1
FFER      0.188727824     0.251585786     0.236428786     -0.088051548    -0.247568867    1

The above table is the correlation coefficient matrix. This matrix is applicable for multiple
regression models. The correlation coefficient matrix for multiple regression models gives us
two important dimensions.

a) Ranking of the independent variables on the basis of their influence on the dependent
variable.
b) High degree of association measured by correlation coefficient between independent
variables (the case of multicollinearity).

By applying the above two concepts the results for the present problem are presented below:

a) Ranking of the independent variables on the basis of their degree of association with
the dependent variable, GDP, as given below:
i) Population: The value of r between GDP and Population is +0.9803. This
means that as the population increases, GDP also increases.
ii) Money Supply – M1: The value of r between GDP and M1 is +0.6772. This
shows that GDP rises with the rise in M1.
iii) Consumer Price Index – CPI: The value of r between GDP and CPI is
-0.6427. This shows that GDP increases with the fall in CPI.
iv) Unemployment rate – UNRATE: The value of r between GDP and UNRATE
is -0.3585. This means that GDP increases with the decrease in
unemployment rate.
v) Federal Funds Effective Rate – FFER: The value of r between GDP and FFER is
+0.1887. This shows a weak positive association between GDP and FFER.

b) Instances of multicollinearity:
i) The value of the correlation coefficient is high between the two independent
variables population and Consumer Price Index (CPI): r = -0.7374. So these
two variables are not taken together in the predictive model for GDP.
ii) The value of the correlation coefficient is high between the two independent
variables population and money supply (M1): r = +0.5539. So these two
variables are not taken together in the predictive model for GDP.
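The ranking and the multicollinearity screen above can be reproduced directly from the correlation matrix. This is a sketch only: the variable names are hypothetical, and the 0.5 cut-off is an assumption taken from the strong/weak boundary used in the interpretation of r.

```python
# Pairwise correlation coefficients copied from the correlation matrix above
corr = {
    ("GDP", "POP"): 0.980345297,
    ("GDP", "UNRATE"): -0.358531024,
    ("GDP", "M1"): 0.677178258,
    ("GDP", "CPI"): -0.642673888,
    ("GDP", "FFER"): 0.188727824,
    ("POP", "UNRATE"): -0.352039937,
    ("POP", "M1"): 0.553873993,
    ("POP", "CPI"): -0.737430946,
    ("POP", "FFER"): 0.251585786,
    ("UNRATE", "M1"): -0.216450151,
    ("UNRATE", "CPI"): 0.231142901,
    ("UNRATE", "FFER"): 0.236428786,
    ("M1", "CPI"): -0.18421872,
    ("M1", "FFER"): -0.088051548,
    ("CPI", "FFER"): -0.247568867,
}

DEPENDENT = "GDP"
THRESHOLD = 0.5  # assumed cut-off for a "high" correlation between predictors

# Rank predictors by the absolute value of their correlation with GDP
predictors = [b for (a, b) in corr if a == DEPENDENT]
ranked = sorted(predictors, key=lambda p: abs(corr[(DEPENDENT, p)]), reverse=True)

# Flag predictor pairs whose absolute correlation exceeds the cut-off
flagged = [
    pair for pair, r in corr.items()
    if DEPENDENT not in pair and abs(r) > THRESHOLD
]

print("Ranking:", ranked)
print("Highly correlated predictor pairs:", flagged)
```

Ranking uses the absolute value of r because a strong negative association is as informative for prediction as a strong positive one.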

This is the fulfilment of the objective 4 stated above.

The Predictive Model: Regression between GDP and the independent variables:

Since the regression model between GDP and the independent variables involves one
dependent variable and more than one independent variable, we propose MLRM, the
multiple linear regression model. Moreover, since there are high correlations between the
independent variables CPI and Population, and between M1 and Population, we do not take
these variables together in the same regression model. Therefore, we propose two regression
models as described below:

a) Regression Model 1: Dependent Variable – GDP; Independent Variables –
Population, Unemployment rate, Federal Funds Effective Rate.
b) Regression Model 2: Dependent Variable – GDP; Independent Variables –
Unemployment rate, money supply – M1, consumer price index – CPI, Federal Funds
Effective Rate – FFER.

We accept the Regression Model which has the better fit, measured through Adjusted R
Square.

Regression Model 1:

The proposed regression equation is:

Y = b0 + b1(Population; POP) + b2(Unemployment rate; UNRATE) + b3(Federal Funds Effective Rate; FFER) + error

GDP = b0 + b1(Population; POP) + b2(Unemployment rate; UNRATE) + b3(Federal Funds Effective Rate; FFER) + error

b0 is the intercept coefficient, b1 is the population coefficient, b2 is the unemployment rate
coefficient, b3 is the Federal Funds Effective Rate coefficient.

The regression results are shown in the following table:

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.98219956
R Square              0.964715976
Adjusted R Square     0.962310247
Standard Error        1274.620741
Observations          48

ANOVA
              df    SS             MS          F              Significance F
Regression    3     1954501467     6.52E+08    401.0077662    5.89145E-32
Residual      44    71484953.5     1624658
Total         47    2025986421

             Coefficients     Standard Error    t Stat      P-value        Lower 95%       Upper 95%
Intercept    -36335.21372     1873.761899       -19.3916    3.50258E-23    -40111.53269    -32558.89475
POP          0.173996859      0.005663338       30.72338    2.28677E-31    0.162583152     0.185410566
UNRATE       34.78511547      128.7742474       0.270125    0.788327234    -224.7423272    294.3125581
FFER         -276.563282      133.6502722       -2.06931    0.044421726    -545.9177068    -7.208857156
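The summary figures in the output above follow from the ANOVA sums of squares. The sketch below recomputes R Square, the F statistic and the regression standard error for Regression Model 1; the function name anova_summary is hypothetical, and small differences in the last digits come from the rounded sums of squares shown in the table.

```python
import math

# Values copied from the Regression Model 1 ANOVA table
SS_REGRESSION = 1954501467.0
SS_RESIDUAL = 71484953.5
DF_REGRESSION = 3
DF_RESIDUAL = 44


def anova_summary(ss_reg, ss_res, df_reg, df_res):
    """Derive R Square, F and the regression standard error from ANOVA inputs."""
    ss_total = ss_reg + ss_res
    ms_reg = ss_reg / df_reg  # mean square for regression
    ms_res = ss_res / df_res  # mean square for residual
    return {
        "r_square": ss_reg / ss_total,
        "f_stat": ms_reg / ms_res,
        "standard_error": math.sqrt(ms_res),
    }


summary = anova_summary(SS_REGRESSION, SS_RESIDUAL, DF_REGRESSION, DF_RESIDUAL)
for name, value in summary.items():
    print(f"{name}: {value:.6f}")
```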

The value of b0 = -36335.2137, b1 = 0.17400, b2 = 34.78512, and b3 = -276.5633.

Therefore, the regression equation is

GDP = -36335.2137 + 0.17400(Population; POP) + 34.78512(Unemployment rate; UNRATE) - 276.5633(Federal Funds Effective Rate; FFER)
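The fitted equation above can be evaluated for any row of the data. The following is a minimal sketch: the function name predict_gdp_model1 is hypothetical, the coefficients are taken at full precision from the regression output, and the 2022 row of the appendix is used purely as an illustration.

```python
# Coefficients copied from the Regression Model 1 output table
B0 = -36335.21372
B_POP = 0.173996859
B_UNRATE = 34.78511547
B_FFER = -276.563282


def predict_gdp_model1(pop, unrate, ffer):
    """Evaluate GDP = b0 + b1*POP + b2*UNRATE + b3*FFER for one observation."""
    return B0 + B_POP * pop + B_UNRATE * unrate + B_FFER * ffer


# 2022 row of the appendix: POP = 333595, UNRATE = 3.6, FFER = 5.45, GDP = 25462.72
predicted = predict_gdp_model1(333595, 3.6, 5.45)
residual = 25462.72 - predicted  # actual minus predicted

print(f"Predicted GDP: {predicted:.2f}")
print(f"Residual: {residual:.2f}")
```

The residual is the part of the observed GDP that the equation does not account for; the regression standard error summarises the typical size of these residuals over all 48 observations.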

The value of Adjusted R Square is 0.9623. This means that about 96.23% of the variation in
GDP can be explained by the variations in Population, Unemployment rate and the Federal
Funds Effective Rate, after adjusting for the number of independent variables.

Regression Model 2:

The proposed regression equation is:

Y = b0 + b1(Unemployment rate; UNRATE) + b2(Money Supply; M1) + b3(Consumer Price Index; CPI) + b4(Federal Funds Effective Rate; FFER) + error

GDP = b0 + b1(Unemployment rate; UNRATE) + b2(Money Supply; M1) + b3(Consumer Price Index; CPI) + b4(Federal Funds Effective Rate; FFER) + error

b0 is the intercept coefficient, b1 is the unemployment rate coefficient, b2 is the money
supply coefficient, b3 is the consumer price index coefficient and b4 is the Federal Funds
Effective Rate coefficient.

The regression results are shown in the following table:

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.878834988
R Square              0.772350937
Adjusted R Square     0.75117428
Standard Error        3275.042734
Observations          48

ANOVA
              df    SS             MS          F              Significance F
Regression    4     1564772510     3.91E+08    36.47180641    2.67218E-13
Residual      43    461213911.1    10725905
Total         47    2025986421

             Coefficients     Standard Error    t Stat      P-value        Lower 95%       Upper 95%
Intercept    13198.87392      2573.136667       5.129488    6.61687E-06    8009.649271     18388.09856
UNRATE       -668.6147783     316.1074389       -2.11515    0.040250045    -1306.106184    -31.12337222
M1           0.901780569      0.119368967       7.554565    2.04422E-09    0.661050105     1.142511033
CPI          -1096.096837     191.5968606       -5.72085    9.30019E-07    -1482.488731    -709.7049431
FFER         705.676709       338.329473        2.085768    0.042968562    23.37030012     1387.983118

The value of b0 = 13198.8739, b1 = -668.6148, b2 = 0.9018, b3 = -1096.0968 and b4 =
705.6767.

Therefore, the regression equation is

GDP = 13198.8739 - 668.6148(Unemployment rate; UNRATE) + 0.9018(Money Supply; M1) - 1096.0968(Consumer Price Index; CPI) + 705.6767(Federal Funds Effective Rate; FFER)

The value of Adjusted R Square is 0.7512. This means that about 75.12% of the variation in
GDP can be explained by the variations in Unemployment rate, Money supply, Consumer Price
Index and the Federal Funds Effective Rate, after adjusting for the number of independent
variables.

Conclusion on the regression models:

Comparing the Adjusted R Square values of the two Regression Models, we conclude that the
Regression Model 1 is a better predictive model, as it has higher value of Adjusted R Square.

So, the accepted regression model has GDP as the dependent variable and Population,
Unemployment rate and Federal Fund Effective Rate as the independent variables.
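The comparison above rests on Adjusted R Square, which penalises R Square for the number of independent variables. The sketch below recomputes it for both models from the R Square values and observation count in the two regression outputs; the function and variable names are hypothetical.

```python
def adjusted_r_square(r_square, n_obs, n_predictors):
    """Adjusted R Square = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    denominator = n_obs - n_predictors - 1
    if denominator <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_square) * (n_obs - 1) / denominator


N_OBS = 48  # annual observations, 1975 to 2022

# R Square and number of independent variables from the two regression outputs
models = {
    "Model 1": {"r_square": 0.964715976, "n_predictors": 3},
    "Model 2": {"r_square": 0.772350937, "n_predictors": 4},
}

adjusted = {
    name: adjusted_r_square(m["r_square"], N_OBS, m["n_predictors"])
    for name, m in models.items()
}
best = max(adjusted, key=adjusted.get)

for name, value in adjusted.items():
    print(f"{name}: Adjusted R Square = {value:.6f}")
print("Better fit:", best)
```

Because the two models use different numbers of independent variables, comparing Adjusted R Square rather than raw R Square keeps the comparison fair.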

This is the fulfilment of the objective 5 stated above.

Conclusion and Recommendations:

Data on GDP, Population, Unemployment rate, Money supply, Consumer Price Index and
Federal Funds Effective Rate were collected from the FRED Economic Data of the Federal
Reserve Bank of St. Louis for 1975 to 2022. Descriptive Analysis and Predictive modeling
were carried out on the data. The correlation matrix between the variables mentioned
above showed instances of multicollinearity. Hence, two regression models were developed
and compared on the basis of the value of Adjusted R Square. The model
with GDP as the dependent variable and Population, Unemployment rate and Federal Funds
Effective Rate as the independent variables was accepted as the better fit for the
collected data.

This is the fulfilment of the objective 6 stated above.

Appendix: The appendix below is the data collected for analysis


Year GDP POP UNRATE M1 CPI FFER
1975 1684.91 215981 8.5 281.4 9.96 5.82
1976 1873.41 218086 7.7 297 7.07 4.93
1977 2081.82 220289 7.1 319.8 6.71 5
1978 2351.61 222629 6.1 346 8.1 4.31
1979 2627.33 225106 5.9 373 10.59 4.58
1980 2857.31 227726 7.2 395.7 13.89 4.78
1981 3207.04 230009 7.6 424.7 10.79 5.46
1982 3343.79 232218 9.7 452.7 7.68 4.94
1983 3634.03 234333 9.6 502.7 3.71 3.79
1984 4037.61 236394 7.5 538.6 4.83 3.51
1985 4338.98 238506 7.2 587 4.93 3.27
1986 4579.63 240682 7 666.1 5 3.32
1987 4855.23 242843 6.2 743.5 4.31 4.83
1988 5236.44 245061 5.5 774.3 4.58 4.82
1989 5641.58 248569 5.3 809.6 4.78 6.06
1990 5963.15 250181 5.6 810.9 5.46 7.72
1991 6158.13 253530 6.9 859.1 4.94 5.3
1992 6520.33 256922 7.5 965.2 3.79 5.6
1993 6858.68 260282 6.9 1077.8 3.51 6.9
1994 7287.24 263455 6.1 1145.1 3.27 7.5
1995 7639.75 266588 5.6 1143.3 3.32 6.9
1996 8073.12 269715 5.4 1107.4 3.09 6.1
1997 8577.55 272818 4.9 1070 2.76 5.6
1998 9062.82 276154 4.5 1080.3 2.61 5.4
1999 9631.17 279328 4.2 1101.8 2.18 4.9
2000 10250.95 282398 4 1103.9 2.69 4.5
2001 10581.93 285225 4.7 1140.7 3.23 4.2
2002 10929.11 287955 5.8 1196.6 3.24 6
2003 11456.45 290626 6 1273.8 2.33 5.5
2004 12217.19 294561 5.5 1343.5 2.22 5.1
2005 13097.19 295993 5.1 1372 2.31 4.6
2006 13815.58 298818 4.6 1375.1 2.93 4.6
2007 14474.23 301696 4.6 1374.5 2.76 5.8
2008 14769.86 304543 5.8 1433.8 2.84 9.3
2009 14478.07 307240 9.3 1637.7 1.99 9.6
2010 15048.97 309839 9.6 1740.9 0.84 8.9
2011 15599.73 312295 8.9 2006.4 1.47 8.1
2012 16253.97 314725 8.1 2316.8 2.17 7.4
2013 16843.19 317099 7.4 2549.4 1.94 6.2
2014 17550.69 319600 6.2 2814 1.98 5.3
2015 18206.02 322113 5.3 3020.2 2.17 4.9
2016 18695.11 324609 4.9 3245.9 2.57 4.4
2017 19477.34 326860 4.4 3521.5 2.26 3.9
2018 20533.06 328795 3.9 3684 2.42 3.7
2019 21380.98 330513 3.7 3845.3 2.53 8.1
2020 21060.47 331787 8.1 12838.4 2.25 5.4
2021 23315.08 332351 5.4 19347.6 2.43 3.6
2022 25462.72 333595 3.6 20410.9 5.45 5.45
