RMIT

ECON 1030 Business Statistics

Assessment 2 Semester 1

Statistical Analysis

Students:
Yuh Yih Celine Leong s3722343
Shin Yi Tay s3701946
Anjie Zhang s3615077

Tutor:
Nithya Tharmaseelan

Tutorial time:
11:30 to 12:30 Thursday 080.08.009

Downloaded by Chen Nguy?n (khaihung1210@gmail.com)



STATISTICAL MODELLING AND ANALYSIS OF DISTRIBUTED LEDGER TECHNOLOGY’S INVESTMENT

Executive Summary

Distributed Ledger Technology (DLT) has captured the imagination of entrepreneurs and
policy makers. According to some, blockchain technology has the potential to revolutionise
transactions by protecting the integrity of ownership. One of the leading companies in
blockchain technology is DigitalX. This report analyses the recent share price performance of
DigitalX with general descriptions. It also evaluates a histogram of the weekly returns to
explain its shape and detect outliers. Additionally, the report explains the linear regression
model by examining its goodness of fit, coefficients and residual analysis, and proposes two
factors that could improve the model. Finally, the report applies SWOT analysis and a summary
of statistical facts to give a recommendation on investing in DigitalX.

Introduction

Distributed Ledger Technology (DLT) is a digital system in which transactions are recorded,
shared and synchronised across the participants’ respective electronic ledgers. Blockchain is
one type of distributed ledger, which can fundamentally change traditional practice in finance
by making it more efficient, resilient and reliable. DigitalX is a blockchain-based software
company through which traders can buy or sell cryptocurrency. This report uses two variables:
DLT weekly return (response variable) and ASX Index weekly return (explanatory variable). It
investigates the relationship between the DLT weekly return and the ASX Index weekly return.
Based on the data analysis reported, prospective investors will be able to decide whether to
invest in DigitalX.


Part A

(1)
[Figure: DLT weekly closing price. Line chart of DLT Closing Price; y-axis: Price (0 to 0.4); x-axis: date (12/28/14 to 2/28/18).]
General description
The line graph illustrates the weekly closing price of Distributed Ledger Technology (DLT)
between 28 December 2014 and 8 March 2018. Over this period of just over three years, the
closing price increased from $0.145 on 4 January 2015 to $0.24 on 4 March 2018.

Nevertheless, the graph shows considerable volatility during the period. DLT’s weekly closing
price rose gradually overall, apart from a surge between May 2017 and January 2018. From the
end of 2014, the price increased by $0.12, from the initial $0.14 (27 December 2014) to
approximately $0.26 in June 2015. The price then remained fairly constant, with minimal
variation, between 15 July 2015 and 31 January 2016. According to the graph, the closing price
fluctuated by about $0.15 between December 2014 and July 2016. On 21 May 2017, the company
recorded its lowest weekly closing price of $0.022. The steep slope of the graph then shows a
rapid increase to a peak weekly closing price of $0.38 on 7 January 2018. This is notable
because the price took only about half a year to climb from its lowest point to its peak.
Finally, after reaching the peak, the closing price declined significantly until February 2018.

Overall, the graph shows fluctuation in each time period and an increase in DLT’s weekly
closing price from 28 December 2014 to 8 March 2018.


(2)
[Figure: Histogram of DLT weekly return. x-axis: Weekly return; y-axis: Frequency.]

The histogram of DLT’s weekly returns appears to be positively skewed (right-skewed): most of
the data lie on the left-hand side of the histogram, while the right-hand side is more spread
out. Hence the distribution is not normal. Furthermore, four outliers are identified: 65.22,
66.67, 80.95 and 95.45, with Z values of 3.55, 3.63, 4.43 and 5.24 respectively. By the outlier
rule, an observation is considered an outlier if its Z value is less than -3 or more than 3.
Therefore, these four values are outliers.

Calculation of four outliers:

Weekly return    Z value
95.45            5.24
65.22            3.55
66.67            3.63
80.95            4.43

Z value = (Weekly return - Mean) / Standard deviation

Z value (95.45) = (95.45 - 1.66) / 17.9 = 5.24
Since 5.24 > 3, 95.45 is an outlier.

Z value (65.22) = (65.22 - 1.66) / 17.9 = 3.55
Since 3.55 > 3, 65.22 is an outlier.

Z value (66.67) = (66.67 - 1.66) / 17.9 = 3.63
Since 3.63 > 3, 66.67 is an outlier.

Z value (80.95) = (80.95 - 1.66) / 17.9 = 4.43
Since 4.43 > 3, 80.95 is an outlier.
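
The same check can be sketched in Python. This is a minimal illustration that assumes the mean (1.66) and standard deviation (17.9) reported above; the function names `z_score` and `is_outlier` are our own and are not part of the original spreadsheet analysis.

```python
def z_score(value, mean, std_dev):
    """Standardise a value: (value - mean) / standard deviation."""
    return (value - mean) / std_dev


def is_outlier(value, mean, std_dev, threshold=3.0):
    """Apply the outlier rule used in the text: Z < -3 or Z > 3."""
    return abs(z_score(value, mean, std_dev)) > threshold


# Summary statistics reported for DLT weekly returns.
MEAN = 1.66
STD_DEV = 17.9

# The four weekly returns flagged in the text.
for weekly_return in [95.45, 65.22, 66.67, 80.95]:
    z = z_score(weekly_return, MEAN, STD_DEV)
    print(weekly_return, round(z, 2), is_outlier(weekly_return, MEAN, STD_DEV))
```

The threshold is a parameter so that the same function also covers negative outliers, since it compares the absolute Z value with 3.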


(3)
Location (Arithmetic mean, median and mode)

Arithmetic mean (1.66)
The average DLT weekly return is 1.66.

Median (0)
The median DLT weekly return is 0. Therefore, 50% of weekly returns are 0 or below and 50% of
weekly returns are 0 or above.

Mode (0)
Because 0 appears 14 times, more than any other value, the mode is 0. The most common DLT
weekly return is therefore 0.

Shape (Skewness)

Skewness (1.95)
The skewness is 1.95, which is greater than zero, and the median is less than the mean.
Therefore, the DLT weekly return data are positively (right) skewed.

Data analysis of DLT weekly return
Mean                  1.66
Standard Error        1.39
Median                0
Mode                  0
Standard Deviation    17.90
Sample Variance       320.35
Kurtosis              7.33
Skewness              1.95
Range                 131.88
Minimum               -36.43
Maximum               95.45
Sum                   277.85
Count                 167.00
First quartile        -8.07
Third quartile        7.19
Interquartile         15.26

Spread (Range, Interquartile Range, Sample Variance and Sample Standard Deviation)

Range (131.88)
The range, the difference between the minimum (-36.43) and maximum (95.45) DLT weekly
return, is 131.88.

Interquartile range (15.26)
The interquartile range, the difference between the first quartile (-8.07) and the third
quartile (7.19) of DLT weekly return, is 15.26.

Sample variance (S² = 320.35)
Sample variance is a measure of variation based on squared deviations from the mean. The
variance of the DLT weekly returns is approximately 320.35.

Standard deviation (S = 17.9)
Standard deviation is the square root of the variance, so it is also based on squared
deviations from the mean. The standard deviation of the DLT weekly returns is
approximately 17.9.
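
The spread measures above can be cross-checked from the reported summary statistics. The sketch below is illustrative only: the function name `spread_summary` is hypothetical, and the standard error formula (standard deviation divided by the square root of the count) is the usual definition, which we assume is what the spreadsheet output reports.

```python
import math


def spread_summary(minimum, maximum, q1, q3, sample_variance, n):
    """Recompute spread measures from reported summary statistics."""
    std_dev = math.sqrt(sample_variance)  # S is the square root of S^2
    return {
        "range": maximum - minimum,
        "iqr": q3 - q1,
        "std_dev": std_dev,
        "std_error": std_dev / math.sqrt(n),  # assumed definition: S / sqrt(n)
    }


# Values reported in the DLT weekly return summary table.
dlt = spread_summary(minimum=-36.43, maximum=95.45, q1=-8.07, q3=7.19,
                     sample_variance=320.35, n=167)
for name, value in dlt.items():
    print(name, round(value, 2))
```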

(4)
\[ \text{Empirical probability of a loss} = \frac{\text{Number of losses}}{\text{Total number of weekly returns}} = \frac{75}{167} \approx 0.4491 \]

Therefore, approximately 44.9% of DLT weekly returns are less than zero, which means that
they represent a loss.
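
As a minimal sketch of the calculation above, the empirical probability of a loss is the share of weekly returns below zero. The function name `empirical_loss_probability` and the sample list are hypothetical; the sample values are not the report's data.

```python
def empirical_loss_probability(returns):
    """Share of returns that are strictly below zero."""
    if not returns:
        raise ValueError("returns must not be empty")
    losses = sum(1 for r in returns if r < 0)
    return losses / len(returns)


# Hypothetical weekly returns, for illustration only.
sample_returns = [-2.5, 0.0, 1.2, -0.4, 3.1]
print(empirical_loss_probability(sample_returns))

# Counts reported in the text: 75 losses out of 167 weekly returns.
print(round(75 / 167, 4))
```

A return of exactly zero is not counted as a loss here, which matches the wording "less than zero" in the text.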


(5)

[Figure: ASX Index. Line chart of the ASX Index; y-axis: Price (0 to 7000); x-axis: date (12/28/14 to 2/28/18).]
General description
The line graph displays the weekly closing price of the ASX Index between 28 December 2014
and 28 February 2018.

The ASX Index remained relatively stable over the period covered by the data, and its
volatility appears to be relatively low. The index fluctuated mostly between 5000 and 6000
from 28 December 2014 to 28 February 2018. It increased slightly over the period, from
$5435.9 to $5933.4. The highest value, $6122.30, was recorded on 31 December 2017, whereas
the lowest, $4765.3, was recorded on 2 February 2016. The difference between the two is
$1357. Although this is a significant change in price, it appears reasonable on the graph
given the time frame: the index took approximately two years to rise from its lowest to its
highest value, which represents smooth and gradual growth.

In conclusion, the ASX Index was relatively stable over the period, and it increased over
the time covered by the data.


[Figure: Histogram of ASX Index weekly return. x-axis: Weekly return; y-axis: Frequency.]

Descriptions:
Based on the histogram, the distribution is negatively skewed (left-skewed): most of the data
lie on the right-hand side of the histogram and the left-hand side appears more spread out.
Therefore, it is not a normal distribution. In addition, there is one outlier in the ASX Index
weekly returns, -5.76, whose Z value is -3.17. By the outlier rule, an observation is
considered an outlier if its Z value is less than -3 or more than 3. Therefore, -5.76 is an
outlier.

Calculation of outlier:

Z value (-5.76) = (Weekly return - Mean) / Standard deviation = (-5.76 - 0.07) / 1.84 = -3.17

Since -3.17 < -3, -5.76 is an outlier.


Location (Arithmetic mean, median and mode)

Arithmetic mean (0.07)
The average ASX Index weekly return is 0.07.

Median (0.07)
The median ASX Index weekly return is 0.07. Therefore, 50% of weekly returns are 0.07 or
below and 50% of weekly returns are 0.07 or above.

Mode (N/A)
No ASX Index weekly return value occurs more than once, so there is no mode.

Shape (Skewness)

Skewness (-0.16)
The skewness is -0.16, which is less than zero. The median (0.068) and the mean (0.069) are
almost equal, so the skew is slight. Therefore, the ASX weekly return data are slightly
negatively (left) skewed.

Data analysis of ASX weekly return
Mean                  0.069340613
Standard Error        0.142516752
Median                0.068056146
Mode                  #N/A
Standard Deviation    1.841722319
Sample Variance       3.391941099
Kurtosis              0.697745051
Skewness              -0.161031135
Range                 10.26819213
Minimum               -5.761062381
Maximum               4.507129751
Sum                   11.57988235
Count                 167
First quartile        -0.993257808
Third quartile        1.165977962
Interquartile         2.15923577

Spread (Range, interquartile range, sample variance and standard deviation)

Range (10.27)
The range, the difference between the minimum (-5.76) and maximum (4.51) ASX Index weekly
return, is 10.27.

Interquartile range (2.16)
The interquartile range, the difference between the first quartile (-0.99) and the third
quartile (1.17) of ASX Index weekly return, is 2.16.

Sample variance (S² = 3.39)
Sample variance is a measure of variation based on squared deviations from the mean. The
variance of the ASX Index weekly returns is approximately 3.39.

Standard deviation (S = 1.84)
Standard deviation is the square root of the variance, so it is also based on squared
deviations from the mean. The standard deviation of the ASX Index weekly returns is
approximately 1.84.

\[ \text{Empirical probability of a loss} = \frac{\text{Number of losses}}{\text{Total number of weekly returns}} = \frac{81}{167} \approx 0.4850 \]

Therefore, approximately 48.5% of ASX Index weekly returns are less than zero, which means
that they represent a loss.


Part B

(1)

Goodness of Fit
R Square
\[ R^2 = \frac{SSR}{SST} = \frac{0.0138}{53177.4526} \approx 2.58638 \times 10^{-7} = 0.000000258638 \approx 0 \]

Interpretation:
The coefficient of determination (R² ≈ 0) indicates that approximately 0% of the variation in
the returns of the DLT closing price is explained by the variation in the ASX 200 Index
returns. This suggests a poor fit.

Adjusted R Squared
\[ R^2_{adj} = 1 - \left[ (1 - R^2) \frac{n-1}{n-k-1} \right] = -0.006060346 \]

Interpretation:
The adjusted R² is -0.00606 (about -0.606%). A negative value means that, after adjusting for
the sample size and the number of explanatory variables, the regression of DLT closing price
returns on ASX Index returns explains essentially none of the variation in DLT closing price
returns. This suggests a poor fit.
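
The two goodness-of-fit formulas above can be sketched as follows. This is an illustration under stated assumptions: n = 167 observations and k = 1 explanatory variable, with the R² value taken as reported; the function names are our own.

```python
def r_squared(ssr, sst):
    """Coefficient of determination: regression sum of squares over total."""
    return ssr / sst


def adjusted_r_squared(r2, n, k):
    """Adjust R^2 for sample size n and number of explanatory variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R^2 as reported in the text; n and k follow from the data set (167 weeks, 1 predictor).
reported_r2 = 2.58638e-7
print(adjusted_r_squared(reported_r2, n=167, k=1))
```

Because R² is so close to zero, the penalty term (n - 1)/(n - k - 1) pushes the adjusted value slightly below zero.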


[Figure: Linear regression. Scatter plot of DLT weekly returns (y-axis) against ASX weekly returns (x-axis) with fitted line Y = 0.0049X + 1.6634, R² = 2.6E-07.]

Coefficients
Prediction line
\[ \hat{Y}_i = 1.663 + 0.005 X_i \]
Predicted DLT weekly return = 1.663 + 0.005 × (ASX 200 Index weekly return)

Intercept (1.663)
If the value of the independent variable (ASX Index weekly return) is zero, the estimated
average value of the dependent variable (DLT weekly return) is 1.663.

Slope coefficient (0.005)
For every one-unit increase in the ASX 200 Index weekly return, the DLT weekly return is
estimated to increase on average by 0.005.
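
The prediction line can be applied directly, as in the short sketch below. The coefficients are the rounded values from the text; the function name `predict_dlt_return` and the example input are hypothetical.

```python
# Rounded coefficients of the fitted prediction line reported above.
INTERCEPT = 1.663
SLOPE = 0.005


def predict_dlt_return(asx_return):
    """Predicted DLT weekly return for a given ASX Index weekly return."""
    return INTERCEPT + SLOPE * asx_return


# Hypothetical input: an ASX weekly return of 2.
print(predict_dlt_return(2.0))
```

Since the slope is so small, the prediction stays close to the intercept for any ASX weekly return in the observed range.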


Inference about the slope: t test (95% confidence level)

H0 : β1 = 0 (no linear relationship)
H1 : β1 ≠ 0 (a linear relationship exists)

Level of significance
α = 0.05 with a two-tailed test; therefore, each tail has an area of 0.025.

Critical value: t(0.025, 167 − 2) ≈ 1.974

Decision rule
Do not reject H0 if the test statistic t lies between -1.974 and 1.974.

Test statistic:
\[ t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.005 - 0}{0.757} = 0.007 \]

Conclusion
Because 0.007 lies between -1.974 and 1.974, do not reject H0. There is insufficient evidence
of a linear relationship between the dependent variable, DLT’s weekly returns, and the
independent variable, ASX’s weekly returns.

Inference about the slope: P-value approach (95% confidence level)

H0 : β1 = 0 (no linear relationship)
H1 : β1 ≠ 0 (a linear relationship exists)

Level of significance: α = 0.05

Decision rule:
Do not reject H0 if the P-value is greater than α.

P-value = 0.995

Conclusion:
Because the P-value (0.995) is greater than 0.05, do not reject H0. There is insufficient
evidence of a linear relationship between the dependent variable, DLT’s weekly returns, and
the independent variable, ASX’s weekly returns.

Confidence Interval Estimate for the slope (95% confidence level)

\[ b_1 \pm t_{n-2} S_{b_1} \] (Lower 95%: -1.48884, upper 95%: 1.49873)

At the 95% level of confidence, the confidence interval for the slope is (-1.48884, 1.49873):
we are 95% confident that for every one-unit increase in the ASX 200 Index weekly return, the
DLT weekly return changes on average by between -1.49 and 1.50. Because the 95% confidence
interval includes 0, the slope is not significantly different from zero.

Conclusion:
There is no significant relationship between dependent variable DLT’s weekly returns and
independent variable ASX’s weekly returns.
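
The three approaches to inference about the slope share the same ingredients, which the sketch below puts together. It is illustrative only: the function name `slope_inference` is hypothetical, and the critical value of 1.974 passed in is our assumed approximation of the two-tailed 5% t value for 165 degrees of freedom. Because the inputs are rounded, the interval it prints will differ slightly from the spreadsheet output.

```python
def slope_inference(b1, se_b1, t_crit, beta1_null=0.0):
    """t statistic, test decision and confidence interval for a slope."""
    t_stat = (b1 - beta1_null) / se_b1
    margin = t_crit * se_b1
    return {
        "t_stat": t_stat,
        "reject_h0": abs(t_stat) > t_crit,
        "conf_int": (b1 - margin, b1 + margin),
    }


# Rounded slope and standard error from the text; t_crit is an assumed value.
result = slope_inference(b1=0.005, se_b1=0.757, t_crit=1.974)
print(round(result["t_stat"], 3), result["reject_h0"], result["conf_int"])
```

The test decision and the confidence interval always agree: H0 is rejected exactly when the interval excludes the hypothesised slope.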


[Figure: Residual plot. Residuals (y-axis) plotted against ASX weekly returns (x-axis).]

Formula of residual:
\[ e_i = Y_i - \hat{Y}_i \]

Residual Analysis
Residual analysis is used to evaluate the assumptions of regression and thus determine whether
the selected regression model is appropriate. There are four assumptions of regression (known
as LINE): linearity, independence, normality and equal variance.
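
The residual formula can be sketched as follows, assuming the rounded coefficients of the prediction line (intercept 1.663, slope 0.005). The function name `residuals` and the sample observations are hypothetical and are not the report's data.

```python
def residuals(x_values, y_values, intercept, slope):
    """Observed value minus predicted value for each observation."""
    if len(x_values) != len(y_values):
        raise ValueError("x_values and y_values must have the same length")
    return [y - (intercept + slope * x) for x, y in zip(x_values, y_values)]


# Hypothetical (ASX return, DLT return) pairs, for illustration only.
asx_returns = [0.0, 2.0, -1.0]
dlt_returns = [1.663, 3.0, -5.0]
print(residuals(asx_returns, dlt_returns, intercept=1.663, slope=0.005))
```

Plotting these residuals against the ASX weekly returns gives the residual plot used to assess the assumptions.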

Linearity
Linearity concerns whether the relationship between the variables is linear. To assess
linearity, the residuals are plotted against the independent variable, ASX weekly returns.
There is no apparent pattern or relationship between the residuals and the independent
variable: the residuals appear to be evenly spread above and below 0 for the different values
of the independent variable. In short, the linear model is appropriate for the DLT weekly
returns.

Independence
Independence of errors requires that the errors are independent of one another. This
assumption matters here because, for data collected over time, the errors for a specific time
period are often correlated with those of the previous time period. Based on the residual
plot above, the residuals show no apparent pattern, which suggests that the errors are
independent.


Normality
Normality requires that the errors are normally distributed at each value of X. The points on
the normal probability plot do not follow a straight line. Therefore, the errors are not
normally distributed.

[Figure: Normal probability plot. x-axis: Sample Percentile; y-axis: Percent.]

Equal Variance
Equal variance requires that the variance of the errors is constant for all values of X. The
variability of the residuals in the residual plot appears to be constant across the values of
X. Therefore, the equal variance assumption is satisfied.


(2)
There are various ways to improve the regression model, mainly to enhance its accuracy. Two
factors that can be used to improve it are adding another independent variable and using
exploratory data analysis.

One factor that could improve the regression model is adding another independent variable.
Adding a relevant independent variable can increase the adjusted R Square value, although the
adjusted R Square rises only if the new variable explains enough additional variation. A
higher adjusted R Square is important because it measures the proportion of the variation in
the response that the model explains, adjusted for the number of independent variables.
Generally, the higher the adjusted R Square value, the better the model fits the data. An
example of another independent variable that the regression model could adopt is the
company’s market share. Adding this variable leads to a multiple regression equation that
shows the change in Y (DLT’s weekly return) for a unit change in each independent (X)
variable (ASX weekly index return and DLT’s market share). This is important because it
estimates the association between each independent variable and the outcome while the other
variable is held constant. The R Square value then gives the percentage of the variation in Y
that is explained by the independent variables. The company’s market share can be determined
by dividing the company’s total sales by the industry’s total sales over a fiscal period. The
current model’s statistics, for comparison, are in the regression summary output in Part B,
question one. The influence is most likely to be positive, as the additional variable should
strengthen the fit of the model, that is, the closeness of the data to the fitted regression
line (goodness of fit).

Another factor could be using exploratory data analysis. This is an approach to data analysis
that looks for information in the data provided in order to generate ideas and gain insight
into different aspects of the data. It helps to better understand the relationship between
the dependent variable (Y, DLT’s weekly return) and the independent variable or variables (X,
ASX weekly index return). This factor is important because it helps assess the strength of
their relationship and so determine whether the model is suitable. The required information
can also be sourced from the summary output table in Part B, question one. The presence of
outliers or extreme values in the data has a significant impact on the regression model, so
it is crucial that they are treated. Treating them helps control the variation in the
predicted values caused by the outliers. This is likely to improve the regression model, as
it keeps the model more stable for future predictions. The influence is most likely to be
positive because the analysis reveals patterns in the data and in the regression model, which
indicates whether the model should be improved.


Part C
Ministerial brief
Purpose
The purpose of this ministerial brief is to investigate the benefits of Distributed Ledger
Technology (DLT) by applying SWOT analysis and statistical facts.

Analytical overview (SWOT)


DLT’s strengths include the transparency and speed of its transaction process and its low
cost, which result from the use of blockchain technology. A weakness is the lack of
technological maturity: further research and development is needed to fix its initial defects
and to target specific problems. The opportunities include the social aspect, where more
people are willing to accept and adopt blockchain technology, and the technological aspect,
where there is a high possibility of blockchain creating a new form of programmable money.
The threats include the political aspect: regulatory status is uncertain because modern
currencies are regulated by their governments, and blockchain technology will face hurdles
created by pre-existing financial institutions if this remains unsettled. The technological
aspect is also a threat, as security remains in doubt: data circulates and could easily be
leaked if the encryption code is exposed.

Summary of facts
Based on the statistical analysis, DigitalX’s weekly returns have no significant linear
relationship with ASX movements, which means that changes in the dependent variable,
DigitalX, cannot be explained by changes in the independent variable, ASX. In addition, the
empirical probability of a loss in DLT’s weekly returns is relatively high.

Conclusion
Overall, because of the concerns about this infant blockchain technology and the results of
the data analysis, it is not a sensible decision to invest in DigitalX.

Conclusion

In conclusion, the analysis of this company uses the DLT closing price graph to observe the
overall trend and a histogram of weekly returns to identify the skewed distribution and the
outliers. Descriptive analysis is applied to explain the location, shape and spread of the
data, as well as the relatively high empirical probability of losses. In addition, ASX
returns are used to develop a linear regression model with DLT weekly returns, which finds no
linear relationship between the dependent variable (weekly returns of DigitalX) and the
independent variable (weekly returns of ASX). Two factors, adding independent variables and
using exploratory analysis, are proposed to improve the regression model. Finally, SWOT
analysis shows that there are many concerns about this infant blockchain technology.
Therefore, having analysed various aspects of this company, we conclude that individuals
should not invest in DigitalX.


Appendices

References

Berenson, Jayne, Levine, O’Brien, Szabat & Watson, Basic Business Statistics, 4th edn,
Pearson, Australia.

Hanson RT, Reeson A, Staples M 2017, Distributed Ledger: Scenarios for the Australian
economy over the coming decades, viewed 24 May 2018,
<https://publications.csiro.au/rpr/download?pid=csiro:EP175257&dsid=DS1>.

Williams, S 2017, “5 Big Advantages of Blockchain, and 1 Reason to be very worried”, The
Motley Fool, viewed 24 May 2018,
<https://www.fool.com/investing/2017/12/11/5-big-advantages-of-blockchain-and-1-reason-to-be.aspx>.
