DECLARATION
I certify that,
a) The work contained in this report has been done by me under the guidance of my
supervisor.
b) The work has not been submitted to any other institute for a degree or diploma.
c) I have conformed to the norms and guidelines given in the Ethical Code of Conduct of the
Institute.
d) Whenever I have used materials (data, theoretical analysis, figures, and text) from
other sources, I have given due credit to them by citing them in the text of the thesis
and providing their details in the references.
Last Semester Work:
Twitter’s Influence on Bitcoin Price Fluctuation
People's perceptions of cryptocurrencies have shifted, and cryptocurrencies
continue to demonstrate their viability as an alternative currency. Investment
and speculation are two of their most appealing aspects for people looking to
increase their income. The cryptocurrency market also shares some
characteristics with the stock market, the foreign exchange market (forex), and
other asset markets such as crude oil, gold, and other valuable metals. Many
factors, including the volume of buyers and sellers as well as political and
economic news and events, can influence the prices of individual coins.
Shareholders and hedge funds therefore need instruments that predict the rise
and fall of cryptocurrency prices and advise them on which currency to invest
in. To this end, it is useful to study social media and cryptocurrency trends
and to test whether people's posts are strongly correlated with changes in coin
prices.
The research therefore investigated the following questions:
● Is there a relationship between Twitter sentiment and Bitcoin price fluctuation?
● Can a machine learning technique based on sentiment polarity be used to
predict Bitcoin price movement?
Results:
We built a sequential LSTM model to assess the predictive capability of Twitter
sentiment, using this deep learning model to predict crypto sentiment and tweet
sentiment with a lag of 7 days.
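A 7-day lag usually means that each training sample fed to the LSTM is a window of the previous seven daily values, with the target taken from the following day. The sketch below shows that arrangement under stated assumptions: a single daily sentiment series, one binary label per day, and a hypothetical helper `make_lagged_windows` that is not taken from the original work.

```python
import numpy as np

def make_lagged_windows(series, labels, lag=7):
    """Build (samples, lag, 1) LSTM-style inputs from one daily series.

    Hypothetical helper: sample k holds series[k : k + lag] and its target
    is labels[k + lag], the label on the day right after the window.
    """
    series = np.asarray(series, dtype=float)
    labels = np.asarray(labels)
    n = len(series) - lag
    if n <= 0:
        raise ValueError("series must be longer than the lag")
    X = np.stack([series[k:k + lag] for k in range(n)])
    y = labels[lag:lag + n]
    # Add a trailing feature axis: (samples, timesteps, features).
    return X[..., np.newaxis], y

sentiment = np.linspace(-1.0, 1.0, 10)              # toy daily sentiment scores
signal = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 1])   # toy daily up/down labels
X, y = make_lagged_windows(sentiment, signal, lag=7)
print(X.shape, y.tolist())
```

The (samples, timesteps, features) layout is the input shape that Keras-style LSTM layers expect; with several sentiment series, the feature axis would hold one column per series.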
● Precision: of all the targets that the model predicted to be true, 56% were
actually true.
● Recall: of all the targets that were actually true, the model predicted this
outcome correctly for only 47%.
● F1 score: the F1 score of 51% indicates that the model does a moderate job of
predicting whether the tweet sentiment and the crypto sentiment will both be
positive or both be negative.
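The three metrics above are linked: F1 is the harmonic mean of precision and recall. The short check below uses hypothetical confusion-matrix counts, chosen only to reproduce the reported rates, and confirms that a precision of 56% and a recall of 47% give an F1 score of about 51%.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives the model found
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return precision, recall, f1

# Hypothetical counts: 56 true positives, 44 false positives, 63 false negatives.
p, r, f1 = precision_recall_f1(tp=56, fp=44, fn=63)
print(round(p, 2), round(r, 2), round(f1, 2))   # 0.56 0.47 0.51
```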
The underlying hypothesis of this work is that opinions expressed in social
media can function as useful predictors of such fluctuations, especially insofar
as they incorporate features such as sentiment and opinion. This study shows
that the correlation between Bitcoin price and sentiment is low. Even though the
correlation is low, it is not completely random: it improves when a lag is
introduced. Hence, Twitter does provide a slight indication of Bitcoin price
movements.
The model achieved a good accuracy of 73.4% in predicting the Bitcoin price
signal from crypto sentiment.
Machine Learning Approach to Study
Return Dependencies Across Industries
1. Introduction
The lack of previous research on this topic could be attributed to the statistical
difficulties associated with estimating regression models with a large number of
predictors. The theoretical model in Hong et al. (2007), which introduces
information frictions into an economy with multiple linked industries, motivates
our use of lagged industry returns to forecast individual industry returns. Because
industries are economically linked, a cash flow shock arising in one industry can
affect expected revenues in related industries. In a frictionless rational
expectations equilibrium, investors identify all of the inter-industry consequences
of a cash flow shock in a specific industry. As a result, equity prices across all
relevant industries adjust immediately to fully incorporate the inter-industry
consequences of the shock, and lagged industry returns have no predictive power.
Investors with limited information-processing capacity, on the other hand,
specialise in specific market segments. When a cash flow shock occurs in a
specific industry in this environment, information-processing limitations prevent
investors specialised in related industries from quickly working out the full
consequences of the shock. Information therefore spreads gradually across
industries, and the resulting slow adjustment in asset prices gives rise to
industry return predictability based on lagged industry returns.
We use the least absolute shrinkage and selection operator (LASSO), a powerful and
widely used machine learning tool. Like ridge regression, the LASSO shrinks the
estimated coefficients by adding a convex regularization term to the objective
function used to fit the model. In contrast to the L2 (squared-coefficient) penalty
in ridge regression, the LASSO employs an L1 (absolute-value) penalty, which allows
some coefficients to be shrunk exactly to zero. As a result, it performs variable
selection, which typically produces a sparse model. Sparsity has two important
benefits. First, setting insignificant coefficients to zero helps prevent
overfitting the data. Second, it makes the estimated model easier to interpret by
selecting the most relevant predictor variables.
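The difference between the two penalties is easiest to see in the special case of an orthonormal design (X'X/n = I), where both estimators have closed forms in terms of the OLS estimates: ridge rescales every coefficient toward zero, while the LASSO soft-thresholds them and sets small ones exactly to zero. This sketch assumes the penalties λ||b||_1 (LASSO) and (λ/2)||b||_2^2 (ridge) added to (1/2n)||y - Xb||^2; the coefficient values are hypothetical.

```python
import numpy as np

def lasso_orthonormal(b_ols, lam):
    """LASSO solution when X'X/n = I: soft-threshold each OLS estimate by lam."""
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

def ridge_orthonormal(b_ols, lam):
    """Ridge solution when X'X/n = I: scale each OLS estimate by 1 / (1 + lam)."""
    return b_ols / (1.0 + lam)

# Hypothetical OLS coefficients for five lagged predictors.
b_ols = np.array([0.30, -0.12, 0.04, -0.02, 0.15])
b_lasso = lasso_orthonormal(b_ols, lam=0.05)
b_ridge = ridge_orthonormal(b_ols, lam=0.05)
print("LASSO:", b_lasso.round(3), "nonzero:", np.count_nonzero(b_lasso))
print("ridge:", b_ridge.round(3), "nonzero:", np.count_nonzero(b_ridge))
```

With λ = 0.05, the LASSO keeps only three coefficients (0.25, -0.07 and 0.10), each reduced by exactly λ in magnitude, whereas ridge keeps all five and divides each by 1.05. That uniform reduction of the retained coefficients is the overshrinkage discussed next.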
Even though the LASSO's penalty term reduces overfitting through sparsity, it
also tends to overshrink the coefficients for the selected variables. This can
result in substantial downward biases (in magnitude) in the estimated
coefficients. To reduce these biases, recent studies propose OLS post-LASSO
estimation. The idea is to first use the LASSO to reduce the model's dimension
and then to re-estimate the coefficients for the selected predictors using OLS,
which reduces the biases in the LASSO coefficient estimates. We estimate
predictive regression models for each industry using OLS post-LASSO, with the
set of candidate predictors including the lagged returns for all 30 industries
considered. OLS post-LASSO estimation helps us determine the most important
set of lagged industry returns for predicting the return of a given industry while
also producing more precise estimates of the coefficients for the relevant
lagged industry returns.
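The two-stage procedure can be sketched as follows. The sketch assumes centred data in the LASSO stage, a fixed penalty λ (how the original work chooses the penalty is not stated here), and simulated data in place of the industry returns; the function names are hypothetical. Stage one runs cyclic coordinate descent for the LASSO, and stage two refits OLS with an intercept on the selected predictors only.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=300):
    """LASSO by cyclic coordinate descent.

    Minimises (1/2n)||y - Xb||^2 + lam * ||b||_1 on centred X and y,
    so the intercept is handled by the centring.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n, p = Xc.shape
    b = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    resid = yc.copy()                        # residual for b = 0
    for _ in range(n_iter):
        for j in range(p):
            resid += Xc[:, j] * b[j]         # partial residual without predictor j
            rho = Xc[:, j] @ resid / n
            # Soft-thresholding: shrink rho by lam, or set the coefficient to 0.
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= Xc[:, j] * b[j]         # restore with the updated coefficient
    return b

def ols_post_lasso(X, y, lam):
    """Stage 1: LASSO selects predictors. Stage 2: OLS with intercept on them."""
    b_lasso = lasso_cd(X, y, lam)
    support = np.flatnonzero(b_lasso)        # indices of selected predictors
    Z = np.column_stack([np.ones(len(y)), X[:, support]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    b_post = np.zeros(X.shape[1])
    b_post[support] = coef[1:]               # coef[0] is the intercept
    return b_lasso, b_post, support

# Simulated stand-in for lagged returns: only predictors 0 and 3 matter.
rng = np.random.default_rng(1)
X = rng.standard_normal((600, 8))
y = 0.4 * X[:, 0] - 0.3 * X[:, 3] + rng.standard_normal(600)
b_lasso, b_post, support = ols_post_lasso(X, y, lam=0.1)
print(support, b_lasso[support].round(3), b_post[support].round(3))
```

Because the LASSO shrinks both true coefficients toward zero, the refitted OLS estimates move back toward the values used in the simulation (0.4 and -0.3).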
We examine the ability of lagged industry returns to predict individual industry
returns using both in-sample and out-of-sample tests.
To perform the in-sample analysis, we use monthly return data from Kenneth
French's Data Library to estimate predictive regression models via OLS
post-LASSO for 30 industry portfolios spanning 1960 to 2022. For 29 of the
individual industries, the LASSO chooses at least one lagged industry return as
a predictor, while several lagged industry returns are selected for 22 of the
individual industries. Moreover, the OLS post-LASSO estimation results show
that the LASSO-selected lagged industry returns are frequently statistically
significant predictors of industry returns.
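2. Predictive Regression Framework
$$
r_{i,t} = a_i + \sum_{j=1}^{N} b_{i,j}\, r_{j,t-1} + \varepsilon_{i,t}, \qquad i = 1, \dots, N,
$$
where $r_{i,t}$ is the ith industry portfolio return in excess of the risk-free rate at
time t; $a_i$ is the intercept; $b_{i,j}$ is the coefficient associated with the jth
lagged industry portfolio return; $\varepsilon_{i,t}$ is the error term; N = 30.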
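In the predictive regression, each month's excess return for industry i is paired with all N industry returns from the previous month. The sketch below, with a hypothetical helper name and a toy returns matrix, shows that alignment. It also makes visible that the first month of data serves only as a predictor, which is why the estimation sample begins one month after the first month of return data.

```python
import numpy as np

def predictive_design(R, i):
    """Align r_{i,t} with the lagged returns r_{1,t-1}, ..., r_{N,t-1}.

    R is a (T, N) array of excess returns, one column per industry.
    Returns y (length T - 1) and X (shape (T - 1, N)); row t of X holds month t - 1.
    """
    R = np.asarray(R, dtype=float)
    y = R[1:, i]      # r_{i,t} for t = 2, ..., T
    X = R[:-1, :]     # r_{j,t-1} for every industry j
    return y, X

R = np.arange(12, dtype=float).reshape(4, 3)   # toy: T = 4 months, N = 3 industries
y, X = predictive_design(R, i=1)
print(y, X, sep="\n")
```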
LASSO-selected predictors. Furthermore, Belloni et al. (2017) contend that
penalised regression methods introduce a shrinkage bias, which can be
corrected by applying OLS to the predictors chosen in a first-stage
variable-selection step.
3. In-Sample Analysis
Table 1: Summary statistics, industry portfolio excess returns,
1959:12-2022:11
Mines 7.35 25.84 -34.54 34.98 0.28
The table reports summary statistics for excess returns for 30 value-weighted industry
portfolios from Kenneth French's Data Library. Excess returns are computed relative to the
one-month Treasury bill return. The industry abbreviations are as follows: Food = Food
Products; Beer = Beer and Liquor; Smoke = Tobacco Products; Games = Recreation; Books =
Printing and Publishing; Hshld = Consumer Goods; Clths = Apparel; Hlth = Healthcare,
Medical Equipment, and Pharmaceutical Products; Chems = Chemicals; Txtls = Textiles;
Cnstr = Construction and Construction Materials; Steel = Steel Works, Etc.; FabPr =
Fabricated Products and Machinery; ElcEq = Electrical Equipment; Autos = Automobiles and
Trucks; Carry = Aircraft, Ships, and Railroad Equipment; Mines = Precious Metals,
Non-Metallic, and Industrial Metal Mining; Coal = Coal; Oil = Petroleum and Natural
Gas; Util = Utilities; Telcm = Communication; Servs = Personal and Business Services;
BusEq = Business Equipment; Paper = Business Supplies and Shipping Containers;
Trans = Transportation; Whlsl = Wholesale; Rtail = Retail; Meals = Restaurants, Hotels, and
Motels; Fin = Banking, Insurance, Real Estate, and Trading; Other = Everything Else.
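The column headers of Table 1 did not survive, but the remaining Mines row is consistent with a common layout: annualised mean and volatility (in percent), monthly minimum and maximum, and an annualised Sharpe ratio, since 7.35 / 25.84 ≈ 0.28. Under that assumption, the statistics could be computed as below. The helper name and the toy return series are hypothetical, the use of the sample standard deviation (ddof=1) is an assumption, and the inputs are taken to be already net of the one-month Treasury bill return.

```python
import numpy as np

def summary_stats(monthly_excess_pct):
    """Annualised mean and volatility, monthly min/max, and Sharpe ratio.

    Assumed layout: input is monthly excess returns in percent.
    """
    r = np.asarray(monthly_excess_pct, dtype=float)
    ann_mean = 12 * r.mean()                 # annualise the monthly mean
    ann_vol = np.sqrt(12) * r.std(ddof=1)    # annualise the monthly volatility
    return {
        "mean": ann_mean,
        "vol": ann_vol,
        "min": r.min(),
        "max": r.max(),
        "sharpe": ann_mean / ann_vol,
    }

stats = summary_stats([1.2, -0.8, 2.5, 0.3, -1.9, 0.6])
print({k: round(float(v), 2) for k, v in stats.items()})
print(round(7.35 / 25.84, 2))  # 0.28, the value reported for Mines
```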
Table 2 shows the estimated OLS post-LASSO coefficients for each industry.
After accounting for the lagged predictors, the available estimation sample
spans 1960:01 to 2022:12. Our goal is to estimate the true regression coefficients
for the LASSO-selected sub-model. A bold (italicised bold) entry indicates that a
coefficient estimate is significant at the 10% (5%) level according to the
conventional OLS post-LASSO t-statistic.
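The conventional t-statistic behind the bold and italicised bold entries divides each coefficient by its OLS standard error. The sketch below assumes homoskedastic standard errors and two-sided normal critical values (1.645 at 10%, 1.960 at 5%); the original study may use a different variance estimator. It also returns the regression R^2 in percent, the unit in which the R^2 rows of Table 2 appear to be reported. Function names and data are hypothetical.

```python
import numpy as np

def ols_tstats(Z, y):
    """OLS coefficients, conventional t-statistics and R^2 (in percent).

    Z must already contain a column of ones for the intercept.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = Z.shape
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    s2 = resid @ resid / (n - k)                       # homoskedastic error variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(Z.T @ Z)))
    r2 = 100 * (1 - resid @ resid / np.sum((y - y.mean()) ** 2))
    return coef, coef / se, r2

def significance_label(t):
    """Two-sided test using normal critical values."""
    if abs(t) >= 1.960:
        return "5%"
    if abs(t) >= 1.645:
        return "10%"
    return "not significant"

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.2, 0.1, 0.5, 0.4, 0.9, 0.6])
Z = np.column_stack([np.ones_like(x), x])
coef, t, r2 = ols_tstats(Z, y)
print(coef.round(3), t.round(2), round(float(r2), 1), [significance_label(v) for v in t])
```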
Table 2: OLS post-LASSO predictive regression estimation
results, 1960:01-2022:12
regressor food beer smoke games books hshld clths hlth chems txtl
food 0.12
beer
smoke
games 0.03
books 0.18 0.04 0.06 0.1
hshld
clths 0.04 0.05 0.05 0.1 0.07 0.08 0.09
hlth
chems 0.15
txtl 0.06
cnstr
steel -0.08
fabpr
elceq -0.27
autos 0.11
carry 0.17 0.05
mines -0.02 -0.06
coal -0.06 -0.06 -0.03 -0.07 -0.04 -0.05 -0.05 -0.05 -0.07
oil -0.1 -0.17 -0.15 -0.15
utils 0.09 0.27 0.13 0.16 0.11
telcm -0.11 -0.14
servs -0.15 0.05 0.12
buseq 0.06 0.13
paper -0.19
trans 0.1
whlsl
rtail 0.02 0.03 0.05 0.06 0.07
meals
fin 0.11 0.1 0.08 0.08 0.18
other
R^2 2.24 2.52 6.54 5.05 6.3 2.97 7.93 2.68 0.78 7.91
Regressor cnstr steel fabpr elceq autos carry mines coal oil util
food 0.1
beer -0.27 -0.08 -0.1
smoke -0.09 0.02
games
books 0.13
hshld -0.13 -0.08
clths 0.04 0.04
hlth -0.13 -0.08
chems
txtl
cnstr -0.18
steel
fabpr 0.12
elceq
autos -0.002
carry 0.17 0.08
mines -0.04
coal -0.06 -0.05 0.08
oil -0.14 -0.13 -0.2 -0.08
utils 0.15 0.17 0.09
telcm 0.07
servs
buseq 0.12 0.03
paper 0.19
trans 0.06 0.06 0.16
whlsl -0.14
rtail 0.001 0.18 0.1
meals
fin 0.15 0.15 0.09 0.1 0.14 0.13
other 0.1
R^2 5.13 1.29 1.56 0.8 6.13 2.27 2.84 2.52 7.88
Regressor telcm servs buseq paper trans whlsl rtail meals fin other
food -0.1
beer -0.06 -0.05
smoke -0.03 -0.09 -0.14 -0.05 -0.06
games
books 0.09 0.1 0.12 0.14 0.06
hshld -0.07
clths 0.06 0.1 0.08
hlth -0.06
chems
txtl
cnstr -0.07
steel -0.08 -0.12
fabpr
elceq
autos -0.04
carry -0.04 0.06 0.04
mines -0.01
coal -0.02 -0.04 -0.04 -0.05
oil -0.09 -0.12 -0.11 -0.15 -0.15
utils 0.16 0.12 0.18 0.25 0.12
telcm -0.12
servs 0.02 0.05 0.07
buseq 0.05 0.03 0.06
paper
trans
whlsl
rtail 0.1 0.03 0.13
meals -0.1 0.05
fin 0.16 0.16 0.1 0.12 0.11 0.05 0.13 0.1
other
R^2 5.18 2.88 2.75 3.24 1.29 7.46 1.61 7.91 1.7 2.69
the individual industries in Table 2. Based on the conventional OLS
post-LASSO t-statistics, eleven and thirteen (seven and nine) of the coefficient
estimates for lagged coal and oil returns, respectively, are significant at the 10%
(5%) level. Table 2 shows that the estimated coefficients for lagged coal and oil
returns are all negative, with the possible exception of the autoregressive
coefficient for coal. These negative relationships presumably result from
supply shocks that raise product prices and returns for sectors in earlier stages of
production but squeeze profit margins and lower returns for sectors in later
stages of production. The magnitudes of the significant coefficient estimates are
once again substantial.