Final Statistic Report
RMIT
Assessment 2 Semester 1
Statistical Analysis
Students:
Yuh Yih Celine Leong s3722343
Shin Yi Tay s3701946
Anjie Zhang s3615077
Tutor:
Nithya Tharmaseelan
Tutorial time:
11:30 to 12:30 Thursday 080.08.009
Executive Summary
Distributed Ledger Technology (DLT) has captured the imagination of entrepreneurs and
policy makers. According to some, blockchain technology has the potential to revolutionise
transactions by protecting the integrity of ownership. One of the leading companies in
blockchain technology is DigitalX. This report analyses the recent share price performance of
DigitalX and gives a general description of it. It also evaluates a histogram of the weekly
returns to explain its shape and detect outliers. Additionally, the report explains the linear
regression model by examining goodness of fit, coefficients and residual analysis, and
proposes two factors that could improve the regression model. Finally, the report applies
SWOT analysis and a summary of statistical facts to give a recommendation on investment in
DigitalX.
Introduction
In today's advanced world, Distributed Ledger Technology (DLT) is a digital system in which
transactions are recorded, shared and synchronised across participants' respective electronic
ledgers. Blockchain is one type of distributed ledger, and it can fundamentally change
traditional practice in finance by making it more efficient, resilient and reliable. DigitalX is a
blockchain-based software company through which traders can buy or sell cryptocurrency.
This report uses two variables: DLT weekly return (response variable) and ASX Index weekly
return (explanatory variable). It investigates the relationship between the two. Based on the
data analysis reported, prospective investors will be able to decide whether to invest in
DigitalX, using the supporting reasons given.
Part A
(1)
[Figure: DLT weekly closing price. Line chart of the DLT closing price (vertical axis, 0 to 0.4)
against date (horizontal axis, 28 December 2014 to 28 February 2018).]
General description
The line graph illustrates the weekly closing price of Distributed Ledger Technology (DLT)
between 28 December 2014 and 8 March 2018. Over this period of just over three years, the
closing price increased from $0.145 on 4 January 2015 to $0.24 on 4 March 2018.
Nevertheless, the graph shows considerable volatility within the period.
DLT's weekly closing price rose gradually overall, apart from a surge between May 2017 and
January 2018. The graph shows an increase of $0.12, from the initial $0.14 (27 December
2014) to approximately $0.26 in June 2015. The price then stayed roughly constant, with
minimal variation, between 15 July 2015 and 31 January 2016. Between December 2014 and
July 2016 the closing price fluctuated within a band of $0.15. On 21 May 2017 the company
recorded its lowest weekly closing price, $0.022. The steep slope that follows shows a rapid
increase to the peak weekly closing price of $0.38 on 7 January 2018. This is notable because
the price climbed from its lowest point to its peak in under eight months. Finally, after
reaching the peak, the closing price declined significantly until February 2018.
Overall, the graph shows fluctuation within each sub-period and a net increase in DLT's
weekly closing price from 28 December 2014 to 8 March 2018.
(2)
[Figure: DLT weekly return. Histogram with frequency (0 to 70) on the vertical axis and weekly
return on the horizontal axis.]
The histogram of DLT's weekly returns is positively skewed (right-skewed): most of the data
lies on the left-hand side of the histogram, and the right-hand side is more spread out. Hence
it is not a normal distribution.
Furthermore, calculation identifies four outliers: 65.22, 66.67, 80.95 and 95.45. Their Z values
are 3.55, 3.63, 4.43 and 5.24 respectively. By the outlier rule, an observation whose Z value is
less than -3 or more than 3 is considered an outlier. Therefore, these four values are outliers.
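The Z values quoted above can be reproduced with a short check. This is a minimal sketch that assumes the mean (1.66) and sample standard deviation (17.90) reported in the descriptive statistics table of this report; the function names are illustrative only.

```python
# Z-score outlier check using summary values reported in the report.
MEAN = 1.66      # reported arithmetic mean of DLT weekly return
STD_DEV = 17.90  # reported sample standard deviation of DLT weekly return


def z_score(value, mean=MEAN, std_dev=STD_DEV):
    """Return how many standard deviations `value` lies from the mean."""
    return (value - mean) / std_dev


def is_outlier(value, threshold=3.0):
    """Rule used in the report: a Z value below -3 or above 3 flags an outlier."""
    return abs(z_score(value)) > threshold


candidates = [65.22, 66.67, 80.95, 95.45]
z_values = [round(z_score(v), 2) for v in candidates]
all_flagged = all(is_outlier(v) for v in candidates)

print(z_values)
print(all_flagged)
```

Each of the four values lies more than three standard deviations above the mean, which is why all of them are flagged.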
(3)
Data analysis of DLT weekly return
Mean                  1.66
Standard Error        1.39
Median                0
Mode                  0
Standard Deviation    17.90
Sample Variance       320.35
Kurtosis              7.33
Skewness              1.95
Range                 131.88
Minimum               -36.43
Maximum               95.45
Sum                   277.85
Count                 167.00
First quartile        -8.07
Third quartile        7.19
Interquartile         15.26
Location (Arithmetic mean, median and mode)
Arithmetic mean (1.66)
The average DLT weekly return is 1.66%.
Median (0)
The median DLT weekly return is 0. Therefore, 50% of weekly returns are 0 or below and
50% of weekly returns are 0 or above.
Mode (0)
The value 0 appears 14 times, more than any other value. Therefore, the mode is 0: the most
common weekly return of DLT is 0.
Shape (Skewness)
Skewness (1.95)
The skewness is 1.95, which is greater than zero, and the median is less than the mean.
Therefore, the DLT weekly return data is positively (right) skewed.
Spread (Range, Interquartile Range, Sample Variation and Sample Standard Deviation)
Range (131.88)
The range, the difference between the minimum (-36.43) and the maximum (95.45) DLT
weekly return, is 131.88.
Interquartile range (15.26)
The interquartile range, the difference between the first quartile (-8.07) and the third quartile
(7.19) of DLT weekly return, is 15.26.
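The reported summary figures are linked by simple identities, so they can be cross-checked against one another. The sketch below uses only values from the DLT data analysis table; the variable names are illustrative, and it checks consistency rather than recomputing anything from the raw returns.

```python
import math

# Values copied from the "Data analysis of DLT weekly return" table.
total, count = 277.85, 167
std_dev = 17.90
minimum, maximum = -36.43, 95.45
q1, q3 = -8.07, 7.19

mean = total / count                    # arithmetic mean = sum / count
std_error = std_dev / math.sqrt(count)  # standard error of the mean
value_range = maximum - minimum         # range = maximum - minimum
iqr = q3 - q1                           # interquartile range = Q3 - Q1

summary = {
    "mean": round(mean, 2),
    "standard_error": round(std_error, 2),
    "range": round(value_range, 2),
    "iqr": round(iqr, 2),
}
print(summary)
```

Each derived figure agrees with the table to two decimal places, which suggests the table was transcribed consistently.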
(4)
\[
\text{Empirical probability of a loss} = \frac{\text{Number of losses}}{\text{Total number of weekly returns}} = \frac{75}{167} \approx 0.4491
\]
Therefore, approximately 44.9% of DLT weekly returns are less than zero, which means that
they represent a loss.
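The same calculation can be written as a small function. This is a sketch only: the five-week sample is hypothetical and is not taken from the report's data set, while the 75 and 167 are the counts reported above.

```python
def empirical_loss_probability(returns):
    """Share of observations that are strictly negative (a loss)."""
    losses = sum(1 for r in returns if r < 0)
    return losses / len(returns)


# HYPOTHETICAL five-week sample, for illustration only.
sample = [-8.1, 0.0, 7.2, -3.4, 12.5]
sample_probability = empirical_loss_probability(sample)
print(sample_probability)

# Counts reported in the report: 75 losses in 167 weekly returns.
reported_probability = 75 / 167
print(round(reported_probability, 4))
```

Weeks with a return of exactly zero are not counted as losses, which matches the report's definition of a loss as a return below zero.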
(5)
[Figure: ASX Index. Line chart of the ASX Index (vertical axis labelled Price, 0 to 7000)
against date (horizontal axis, 28 December 2014 to 28 February 2018).]
General description
The line graph displays the weekly closing price of the ASX Index between 28 December
2014 and 28 February 2018.
The ASX Index remained stable across the period of just over three years covered by the data,
and its volatility appears relatively low. The index fluctuated mostly between 5000 and 6000
from 28 December 2014 to 28 February 2018. Over the period it increased slightly, from
$5435.9 to $5933.4. The highest value was recorded at $6122.30 on 31 December 2017,
whereas the lowest was $4765.3 on 2 February 2016. The difference between the two is
$1357. Although this is a significant change in price, it appears reasonable on the graph given
the time frame: the index took approximately two years to move from its lowest to its highest
value, which represents smooth and gradual growth.
In conclusion, the ASX Index was relatively stable over the period, and it increased over the
time covered by the data.
[Figure: Histogram of ASX Index weekly return, with frequency (0 to 35) on the vertical axis
and weekly return on the horizontal axis.]
Descriptions:
Based on the histogram, the distribution is negatively skewed (left-skewed): most of the data
lies on the right-hand side of the histogram, and the left-hand side is more spread out.
Therefore, this is not a normal distribution. In addition, there is one outlier in the weekly
return of the ASX Index, which is -5.76. Its Z value is -3.17. By the outlier rule, an observation
whose Z value is less than -3 or more than 3 is considered an outlier. Therefore, -5.76 is an
outlier.
Calculation of outlier:
Range (10.27)
The range, the difference between the minimum (-5.76) and the maximum (4.51) ASX Index
weekly return, is 10.27.
Interquartile range (2.16)
The interquartile range, the difference between the first quartile (-0.99) and the third quartile
(1.17) of ASX Index weekly return, is 2.16.
\[
\text{Empirical probability of a loss} = \frac{81}{167} \approx 0.4850
\]
Therefore, approximately 48.5% of ASX Index weekly returns are less than zero, which means
that they represent a loss.
Part B
(1)
Goodness of Fit
R Square
\[
R^2 = \frac{SSR}{SST} = \frac{0.0138}{53177.4526} \approx 2.58638 \times 10^{-7} = 0.000000258638 \approx 0
\]
Interpretation:
The coefficient of determination (R² ≈ 0) indicates that approximately 0% of the variation in
the returns of the DLT closing price is explained by the variation in the ASX 200 Index returns.
This suggests a poor fit.
Adjusted R Squared
\[
R^2_{adj} = 1 - \left[(1 - R^2)\,\frac{n-1}{n-k-1}\right] = -0.006060346
\]
Interpretation:
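The adjusted R Squared is -0.00606 (about -0.6%). A negative value cannot be read as a
percentage of variation explained. It indicates that, after adjusting for the number of
explanatory variables, the regression of DLT closing price returns on ASX Index returns
explains essentially none of the variation in DLT closing price returns, so virtually all of the
variation is left unexplained. This suggests a poor fit.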
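As a worked substitution into the adjusted R Squared formula, the following sketch assumes n = 167 (the count of weekly returns in Part A) and k = 1 (one explanatory variable, the ASX Index return). Neither value is stated next to the formula in the report, so both are assumptions.

```latex
\[
R^2_{adj}
  = 1 - \left[\left(1 - 2.58638 \times 10^{-7}\right)\frac{167 - 1}{167 - 1 - 1}\right]
  = 1 - \left[0.99999974 \times \frac{166}{165}\right]
  \approx 1 - 1.0060603
  = -0.0060603
\]
```

Because R Square is practically zero, the factor 166/165 pushes the bracket slightly above 1, which is what makes the adjusted value negative.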
[Figure: Linear Regression. Scatter plot of DLT weekly returns (vertical axis, -60 to 120)
against weekly return of the ASX Index (horizontal axis, -8 to 6), with the fitted line
Y = 0.0049X + 1.6634 and R² = 2.6E-07.]
Coefficients
Prediction line
\[
\hat{Y}_i = 1.663 + 0.005 X_i
\]
Predicted return of DLT closing price = 1.663 + 0.005*(returns of ASX 200 Index)
Intercept (1.663)
If the value of the independent variable (return of the ASX Index) is zero, the estimated
average value of the dependent variable (return of the DLT closing price) is 1.663.
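Hypotheses
H0: \beta_1 = 0 (no linear relationship); H1: \beta_1 \neq 0 (linear relationship)
Decision rule
Do not reject H0 if the test statistic t lies between -1.96 and 1.96.
Test Statistic:
\[
t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.005 - 0}{0.757} = 0.007
\]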
Conclusion
Because 0.007 lies between -1.96 and 1.96, do not reject H0. There is insufficient evidence of
a linear relationship between the dependent variable (DLT's weekly returns) and the
independent variable (ASX's weekly returns).
Significance level: \alpha = 0.05
Decision rule:
Do not reject H0 if the P-value is greater than \alpha.
P-value = 0.995
Conclusion:
Because the P-value (0.995) is greater than 0.05, do not reject H0. There is insufficient
evidence of a linear relationship between the dependent variable (DLT's weekly returns) and
the independent variable (ASX's weekly returns).
At the 95% level of confidence, the confidence interval for the slope is (-1.48884, 1.49873).
We are 95% confident that for every one-unit increase in the ASX 200 Index weekly return,
the DLT weekly return changes on average by between -1.49 and 1.50. The 95% confidence
interval includes 0.
Conclusion:
Because the interval includes 0, there is no significant linear relationship between the
dependent variable (DLT's weekly returns) and the independent variable (ASX's weekly
returns).
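The three approaches above (test statistic, P-value and confidence interval) can be checked side by side. The sketch below uses the slope, standard error and interval reported in this part. As an assumption, it computes the P-value from the standard normal distribution, which is in line with the 1.96 cut-off the report uses; an exact t-distribution value would differ only slightly for a sample of this size.

```python
from statistics import NormalDist

# Values reported in the report.
b1 = 0.005                           # estimated slope
se_b1 = 0.757                        # standard error of the slope
ci_low, ci_high = -1.48884, 1.49873  # reported 95% confidence interval
alpha = 0.05

# Test statistic for H0: slope = 0.
t_stat = (b1 - 0) / se_b1

# Two-sided P-value. ASSUMPTION: normal approximation to the t distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

reject_by_t = abs(t_stat) > 1.96
reject_by_p = p_value < alpha
reject_by_ci = not (ci_low <= 0 <= ci_high)

print(round(t_stat, 3), round(p_value, 3))
print(reject_by_t, reject_by_p, reject_by_ci)
```

All three checks lead to the same decision not to reject H0, as expected, since they are different views of the same test.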
[Figure: Residual Plot. Residuals (vertical axis, -60 to 120) plotted against the independent
variable (horizontal axis, -8 to 6).]
Formula of residual:
\[
e_i = Y_i - \hat{Y}_i
\]
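To make the residual formula concrete, the sketch below applies it with the prediction line reported in this part (intercept 1.663, slope 0.005). The three observation pairs are hypothetical and are not taken from the report's data set.

```python
def predict(asx_return):
    """Prediction line from the report: Y-hat = 1.663 + 0.005 * X."""
    return 1.663 + 0.005 * asx_return


def residual(dlt_return, asx_return):
    """Residual e = observed Y minus predicted Y-hat."""
    return dlt_return - predict(asx_return)


# HYPOTHETICAL (ASX return, DLT return) pairs, for illustration only.
observations = [(1.2, 5.0), (-0.8, -10.0), (0.0, 1.663)]
residuals = [round(residual(y, x), 3) for x, y in observations]
print(residuals)
```

Because the slope is so small, the predicted value barely moves with the ASX return, so each residual is close to the observed DLT return minus 1.663.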
Residual Analysis
Residual analysis is used to evaluate the assumptions and thus determine whether the
regression model selected is an appropriate model. There are four assumptions of regression
(known as LINE): linearity, independence, normality and equal variance.
Linearity
Linearity concerns whether the relationship between the variables is linear. To assess
linearity, the residuals are plotted against the independent variable, ASX weekly returns.
There is no apparent pattern or relationship between the residuals and the independent
variable. The residuals appear to be evenly spread above and below 0 for the different values
of the independent variable. In short, a linear model is appropriate for the DLT weekly returns.
Independence
Independence of errors requires that the errors are independent of one another. This
assumption matters here because the data are collected over time, and the errors for a specific
time period are often correlated with those of the previous time period. The residual graph
above shows no apparent pattern, so the errors are treated as independent.
Normality
Normality requires that the errors are normally distributed at each value of X. The points on
the normal probability plot do not follow a straight line. Therefore, the errors are not
normally distributed.
[Figure: Normal probability plot. Vertical axis labelled Percent (-50 to 100); horizontal axis:
Sample Percentile (0 to 120).]
Equal Variance
Equal variance requires that the variance of the errors is constant for all values of X. The
variability of the residuals in the residual plot is constant. Therefore, the equal variance
assumption holds.
(2)
There are various ways to improve the regression model, mainly aimed at enhancing its
accuracy. The two factors considered here are adding another independent variable and using
the exploratory analysis method.
One factor that could improve the regression model is adding another independent variable.
If the new variable has genuine explanatory power, the adjusted R Square value will increase;
unlike R Square, adjusted R Square can fall when a variable that adds little is included. An
increased adjusted R Square is important because it measures how much of the variability of
the response data around its mean the model explains, after allowing for the number of
independent variables. Generally, the higher the adjusted R Square value, the better the model
fits the data. An example of another independent variable that the regression model could
adopt is the company's market share. Adding this independent variable leads to a multiple
regression equation that shows the change in Y (DLT's weekly return) for a unit change in
each independent (X) variable (ASX weekly index and DLT's market share). This is important
because it estimates the association between one specific independent variable and the
outcome while the other variable remains constant. The R Square value then gives the
percentage of the variation in Y that is explained by the independent variables. The
company's market share can be determined by dividing the company's total sales by the
industry's total sales over a fiscal period. The existing regression statistics can be sourced
from the summary output of the regression model shown in Part B, question one. The
influence is most likely to be positive, as it will strengthen the fit of the model, that is, the
closeness of the data to the fitted regression line (goodness of fit).
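How adjusted R Square responds to an extra independent variable can be illustrated numerically. This is a sketch under stated assumptions: n = 167 and the current R Square are taken from the report, while the two R Square values for a two-variable model are hypothetical, since no second variable has actually been fitted.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


N = 167  # number of weekly returns reported in Part A

# Current model: one explanatory variable, R^2 as reported in Part B.
current = adjusted_r_squared(2.58638e-7, N, k=1)

# HYPOTHETICAL R^2 values after adding a second explanatory variable.
weak_addition = adjusted_r_squared(0.001, N, k=2)   # adds almost nothing
strong_addition = adjusted_r_squared(0.05, N, k=2)  # adds real explanatory power

print(round(current, 5), round(weak_addition, 5), round(strong_addition, 5))
```

Under these assumed values, a weak second variable lowers adjusted R Square below its current level, while one with real explanatory power raises it, so the choice of variable matters more than the act of adding one.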
Another factor could be using the exploratory analysis method. This is a data analysis
approach that looks for information in the data provided in order to generate ideas and gain
insight into different aspects of the data. It helps to better understand the relationship between
the dependent variable (Y, DLT's weekly return) and the independent variables (X, such as
the ASX weekly index; there can be more than one). This factor is important because it
assesses the strength of their relationship and so helps determine whether the model is
suitable. The required information can also be sourced from the summary output table in
Part B, question one. The presence of outliers or extreme values in the data has a significant
impact on the regression model, so it is crucial that they are treated. Treating them helps
control the variation in the predicted values caused by the outliers. This is likely to improve
the regression model, as it will make future predictions more stable. The influence is most
likely to be positive, because the analysis examines the pattern in the data behind the
regression model and so indicates whether the model should be improved.
Part C
Ministerial brief
Purpose
The purpose of this ministerial brief is to investigate the benefits of Distributed Ledger
Technology (DLT) by applying SWOT analysis and statistical facts.
Summary of facts
Based on statistical analysis, DigitalX's weekly returns have no significant linear relationship
with ASX movement, which means that the dependent variable, DigitalX's weekly return,
cannot be explained by changes in the independent variable, ASX weekly return. In addition,
the empirical probability of a loss in DLT's weekly return is relatively high (approximately
44.9%).
Conclusion
Overall, because of the concerns about this infant blockchain technology and the results of the
data analysis, investing in DigitalX is not a sensible decision.
Conclusion
In conclusion, the analysis of this company uses the DLT closing price graph to observe the
overall trend and a histogram of weekly returns to identify the skewed distribution and the
outliers. Descriptive analysis is applied to explain the location, shape and spread of the data,
and shows a relatively high empirical probability of losses. In addition, ASX returns are used
to develop a linear regression model with DLT weekly returns, which finds that the dependent
variable (weekly returns of DigitalX) and the independent variable (weekly returns of ASX)
have no linear relationship. Two factors, adding independent variables and using exploratory
analysis, are proposed to improve the regression model. Finally, SWOT analysis shows that
there are many concerns about this infant blockchain technology. Therefore, after analysing
various aspects of this company, the recommendation is that individuals should not invest in
DigitalX.
Appendices