
RESEARCH PROPOSAL

Research title: Vietnamese Stock Market Forecasting Based on Text Mining Technology and Deep Learning

Authors:
1. Van Tran Hoai Thuong
2. Hoang Thao Nhien
3. Nguyen Phu Quang
4. Vo Manh Hai

Supervisor: Nguyen Luong Vuong

Abstract

This study proposes a method for forecasting the Vietnamese stock market that combines text mining with deep learning. As textual data such as financial news and market commentary become more widely available, text mining has become a practical way to extract useful signals from unstructured sources. We use text mining to collect and analyze a large corpus of online financial news articles about the Vietnamese stock market. We then build forecasting models with deep learning architectures, namely Long Short-Term Memory (LSTM) networks and Recurrent Neural Networks (RNN), which are well suited to capturing complex patterns and dependencies in time-series data. By combining textual features extracted from the news with forecasts derived from time-series data, we aim to improve the accuracy and reliability of stock market forecasts. The study contributes to the growing literature on financial market forecasting and illustrates the potential of text mining and deep learning for stock market analysis.

Keywords: Financial market forecasting, Text mining, LSTM, RNN, time-series

1. Related work

Text mining has attracted considerable attention in financial market forecasting. Several studies report that trends and signals extracted from textual data improve forecasting accuracy. For instance, Xie et al. (2019) showed how text mining of financial news articles yields useful insights that improve forecast precision.

Deep learning, including LSTM and RNN architectures, has also become a prominent approach to market forecasting. Siami-Namini et al. (2019), for example, demonstrated that LSTM models capture complex patterns in financial time-series data and produce more accurate predictions of stock prices and market trends. Lei et al. (2021) further argue that combining deep learning with text mining can improve forecasting capability beyond that of deep learning alone.

Sentiment analysis has also played an important role in market forecasting by measuring market sentiment and deriving sentiment-based signals from text. Yang et al. (2024) showed how sentiment analysis techniques affect forecasting outcomes, underscoring the value of sentiment in assessing market conditions and performance.

Prior research on financial market forecasting in Vietnam is comparatively limited, but recent studies offer useful insights. Tran et al. (2024) apply an LSTM model together with conventional technical indicators such as the Simple Moving Average (SMA) and the Moving Average Convergence Divergence (MACD) to the Vietnamese market, and Duong et al. use text mining techniques to anticipate the trend of the VN30 index. Despite these advances, there is still a notable gap in combining such price-based models and technical indicators with text mining and deep learning to improve forecasting precision.

2. Research objectives

Utilizing text mining technology to gather and analyze a vast amount of online financial news
articles related to the Vietnamese stock market.

Integrating deep learning models, such as LSTM and RNN, to capture complex patterns and
relationships within time-series data.

Combining textual features extracted through text mining with predictions derived from time-
series data to enhance the accuracy and reliability of stock market forecasting.

Tailoring the forecasting methodology to the specific nuances of the Vietnamese stock market to
provide relevant and actionable insights for stakeholders.

Contributing to the advancement of knowledge in financial market forecasting by demonstrating the efficacy of text mining technology and deep learning techniques in the Vietnamese context.

3. Methods
1) Data Collection:
When selecting news sources, both the importance of each source and the field it covers are considered, so that the text data are relevant and broadly representative. News related to the stock market is divided into 8 categories: financial capital news, civil economic news, industrial economic news, company news, international economic news, emerging market news, consumer news, and economic news. This study selects sources that cover the main sectors of Vietnam's online financial news.
The VN30 index is a key indicator of the Vietnamese stock market. It comprises the 30 largest and most liquid stocks listed on the Ho Chi Minh City Stock Exchange (HoSE). These companies are typically influential market players and represent key sectors of the Vietnamese economy. Tracking the VN30 index gives investors and analysts insight into the overall performance and trends of the Vietnamese stock market, and it serves as a primary measure of market health and performance over specific periods.
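Before modeling, news-derived features and the VN30 series have to be aligned by trading day. The sketch below is a minimal illustration under stated assumptions: one sentiment score per article, one closing value per trading day, and a window length of 3; all dates, values and function names (`daily_sentiment`, `make_windows`) are hypothetical. Each sample pairs the previous `window` days of (close, mean sentiment) with the next day's close.

```python
from collections import defaultdict
from statistics import mean


def daily_sentiment(articles):
    """Average sentiment per date; `articles` is a list of (date, score) pairs."""
    by_day = defaultdict(list)
    for day, score in articles:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in by_day.items()}


def make_windows(closes, sentiment, window=3):
    """Turn date-ordered (date, close) pairs into (features, target) samples.

    Features: the previous `window` days of [close, mean sentiment]. Trading days
    without news get sentiment 0.0, and news dated on non-trading days is ignored
    (both simplifying assumptions). Target: the close on the following day.
    """
    rows = [[close, sentiment.get(day, 0.0)] for day, close in closes]
    return [(rows[t - window:t], closes[t][1]) for t in range(window, len(rows))]


# Hypothetical data: index closes per trading day and scored articles.
closes = [("2024-01-02", 1200.0), ("2024-01-03", 1205.5), ("2024-01-04", 1198.0),
          ("2024-01-05", 1210.0), ("2024-01-08", 1215.0)]
articles = [("2024-01-02", 0.5), ("2024-01-02", 0.25),
            ("2024-01-04", -0.5), ("2024-01-06", 0.1)]  # 2024-01-06 is a Saturday
samples = make_windows(closes, daily_sentiment(articles), window=3)
print(len(samples), samples[0])
```

Only days before the target appear in each feature window, so the target never leaks into its own inputs.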
2) Text Mining:

Data collection: Automated scraping tools in Python, such as BeautifulSoup and Selenium, or the APIs of news websites are used to retrieve articles (8 articles per day on 8 topics).
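A minimal sketch of the parsing step, assuming the article page has already been downloaded as an HTML string. The text names BeautifulSoup and Selenium; this sketch instead uses Python's standard-library `html.parser` so it runs without extra packages. The page layout (headline in `<h1>`, body in `<p>` tags) is a hypothetical assumption, since each news site needs its own extraction rules.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect the headline (<h1>) and body paragraphs (<p>) of one article.

    Assumes a simple, hypothetical page layout.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # tag whose text is being captured ("h1" or "p")
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if tag == "h1":
                self.title = text
            elif text:  # skip empty paragraphs
                self.paragraphs.append(text)
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)


def parse_article(html: str) -> dict:
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "body": "\n".join(parser.paragraphs)}


sample = """<html><body><h1>VN30 rises  for a third session</h1>
<p>Banking stocks led the gains.</p><p></p><p>Foreign investors were net buyers.</p>
</body></html>"""
print(parse_article(sample))
```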
Data preprocessing: The collected text usually contains considerable noise. It is cleaned by removing special characters, converting text to lowercase, and removing stop words (frequent words that carry little meaning).
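The cleaning steps can be sketched as a single function. This is an illustration, not the study's pipeline: the stop-word set is a tiny hypothetical placeholder, and a real system would use a full stop-word list for the language of the articles.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "on", "in", "for", "is"}


def preprocess(text: str, stop_words=STOP_WORDS) -> list[str]:
    """Lowercase, strip special characters, tokenize on whitespace, drop stop words."""
    text = text.lower()
    # Replace anything that is not a word character or whitespace with a space.
    # In Python 3, \w is Unicode-aware, so accented letters are kept.
    text = re.sub(r"[^\w\s]", " ", text)
    return [tok for tok in text.split() if tok not in stop_words]


print(preprocess("VN30 gains on strong bank earnings, analysts say!"))
# -> ['vn30', 'gains', 'strong', 'bank', 'earnings', 'analysts', 'say']
```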
Keyword analysis: The TF-IDF (Term Frequency-Inverse Document Frequency) method is used to identify the most important keywords in each article.
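For reference, one common TF-IDF formulation scores term t in document d as tf(t, d) * log(N / df(t)), where tf is the term's relative frequency in d, N is the number of documents, and df(t) is the number of documents containing t. Libraries differ in smoothing and normalization (scikit-learn's `TfidfVectorizer`, for example, smooths the idf by default), so this pure-Python sketch shows only the basic form. The documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return a {term: score} dict per tokenized document using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(doc_scores: dict[str, float], k: int = 3) -> list[str]:
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(doc_scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    ["bank", "profit", "rises", "bank"],
    ["steel", "exports", "fall"],
    ["bank", "credit", "growth"],
]
scores = tf_idf(docs)
print(top_keywords(scores[0], k=2))  # -> ['profit', 'rises']
```

Although "bank" occurs twice in the first article, it also appears in another article, so its idf is lower and it ranks below the rarer terms.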
Clustering: The K-means algorithm groups articles with similar content into clusters, which helps identify the main themes discussed in the news.
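A minimal pure-Python sketch of K-means (Lloyd's algorithm) on dense vectors, for example TF-IDF vectors of articles. The first k points serve as initial centroids, an assumption made so the output is deterministic; random or k-means++ initialization with several restarts is more typical in practice. The 2-D points are hypothetical.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's K-means on a list of equal-length float vectors.

    Returns (labels, centroids). Uses the first k points as initial centroids,
    an assumption made for determinism; k-means++ with restarts is more robust.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


# Hypothetical 2-D article vectors (e.g., reduced TF-IDF features).
points = [[0.0, 0.0], [10.0, 10.0], [0.5, 0.2], [9.5, 10.2], [0.1, 0.4]]
labels, centroids = kmeans(points, k=2)
print(labels)  # -> [0, 1, 0, 1, 0]
```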
Sentiment analysis: Sentiment analysis (TextBlob) measures whether the opinion expressed in each article is positive, negative, or neutral.
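TextBlob's `sentiment.polarity` is a score in [-1, 1], and its default analyzer is built for English text. The sketch below does not call TextBlob. It is a hypothetical lexicon-based stand-in that shows how a polarity score can be mapped to the three labels. Both the word lexicon and the ±0.05 neutral band are illustrative assumptions.

```python
# Hypothetical sentiment lexicon: word -> polarity weight in [-1, 1].
LEXICON = {
    "gain": 0.6, "gains": 0.6, "growth": 0.5, "profit": 0.5, "strong": 0.4,
    "loss": -0.6, "losses": -0.6, "fall": -0.5, "falls": -0.5, "weak": -0.4,
}


def polarity(tokens: list[str]) -> float:
    """Average polarity of the tokens found in the lexicon (0.0 if none)."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def label(score: float, neutral_band: float = 0.05) -> str:
    # Scores inside the (assumed) neutral band count as neutral.
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


for tokens in (["bank", "profit", "growth"], ["steel", "exports", "fall"], ["market", "opens"]):
    print(tokens, label(polarity(tokens)))
```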
3) Deep Learning:
Deep learning models such as LSTM and RNN are applied to the time-series data to predict prices and capture complex patterns in stock market trends.
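A plain-Python sketch makes the LSTM component concrete. It implements the forward pass of a standard LSTM cell and runs it over a short sequence. At each step the input gate i, forget gate f, output gate o (sigmoids) and candidate state g (tanh) are computed from the current input x and the previous hidden state h. The cell state is then updated as c_t = f * c_{t-1} + i * g, and h_t = o * tanh(c_t). The weights are hypothetical constants chosen only to make the example deterministic. In the study they would be learned, typically with a framework such as Keras or PyTorch, and followed by a dense layer that maps the final hidden state to the predicted price.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Compute W @ x + U @ h + b for list-of-lists matrices."""
    return [
        sum(w * xi for w, xi in zip(W[r], x)) + sum(u * hi for u, hi in zip(U[r], h)) + b[r]
        for r in range(len(b))
    ]


def lstm_step(params, x, h_prev, c_prev):
    """One LSTM time step; `params` maps gate name -> (W, U, b)."""
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]    # input gate
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]    # forget gate
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]    # output gate
    g = [math.tanh(z) for z in affine(*params["g"], x, h_prev)]  # candidate state
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def lstm_forward(params, sequence, hidden_size):
    """Run the cell over a sequence of input vectors; return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
    return h


# Hypothetical weights shared by all four gates (input size 2, hidden size 2).
W = [[0.5, -0.3], [0.2, 0.4]]
U = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
params = {gate: (W, U, b) for gate in ("i", "f", "o", "g")}
# Each input: [scaled close, daily sentiment] (hypothetical values).
sequence = [[0.1, 0.3], [0.2, -0.1], [0.15, 0.0]]
print(lstm_forward(params, sequence, hidden_size=2))
```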
4) Evaluation:
Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE): These metrics quantify the
difference between predicted and actual stock prices, providing insight into the accuracy of the
forecasting models.
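For n forecasts y_hat_i and actual prices y_i, MAE = (1/n) * sum(|y_i - y_hat_i|) and RMSE = sqrt((1/n) * sum((y_i - y_hat_i)^2)). Both are in price units, and RMSE penalizes large errors more heavily than MAE. A small sketch with hypothetical prices:

```python
import math


def _check(actual, predicted):
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")


def mae(actual, predicted):
    """Mean Absolute Error: average of |y - y_hat|, in price units."""
    _check(actual, predicted)
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: sqrt of the mean of (y - y_hat)^2, in price units."""
    _check(actual, predicted)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))


actual = [1210.0, 1215.0, 1208.0, 1220.0]     # hypothetical index closes
predicted = [1207.0, 1216.0, 1212.0, 1220.0]  # hypothetical forecasts
print(mae(actual, predicted))   # -> 2.0
print(rmse(actual, predicted))  # sqrt(26 / 4), about 2.55
```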
Precision and Recall: Evaluating the performance of sentiment analysis by measuring the
precision and recall of positive, negative, and neutral sentiment classifications.
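Per class, precision = TP / (TP + FP) and recall = TP / (TP + FN), computed one-vs-rest. This requires a manually labeled sample of articles as ground truth, which the sketch assumes exists. The labels below are hypothetical.

```python
def precision_recall(true_labels, pred_labels,
                     classes=("positive", "negative", "neutral")):
    """One-vs-rest precision and recall per sentiment class.

    A class with no predicted (or no true) members gets 0.0 precision (or recall).
    """
    pairs = list(zip(true_labels, pred_labels))
    report = {}
    for cls in classes:
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[cls] = (precision, recall)
    return report


# Hypothetical manual labels vs. classifier output for six articles.
manual = ["positive", "positive", "negative", "neutral", "negative", "neutral"]
predicted = ["positive", "neutral", "negative", "neutral", "positive", "neutral"]
for cls, (p, r) in precision_recall(manual, predicted).items():
    print(f"{cls}: precision={p:.2f}, recall={r:.2f}")
```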
Clustering Cohesion and Separation: Assessing the quality of article clustering by measuring the
cohesion within clusters and the separation between clusters.
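The text does not name a specific measure. One standard way to combine cohesion and separation is the silhouette coefficient, used here as an assumption. For each article, a is its mean distance to the other members of its cluster (cohesion), and b is the smallest mean distance to the members of any other cluster (separation). Then s = (b - a) / max(a, b). Averaged over all points, s lies in [-1, 1], and higher is better. A pure-Python sketch with Euclidean distance and hypothetical 1-D points:

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean distance); higher is better.

    `labels` is a list of cluster ids. Points that are alone in their cluster
    score 0, the usual convention.
    """
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, other) in enumerate(zip(points, labels))
                if other == lab and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the closest other cluster
            sum(math.dist(p, q) for q, other in zip(points, labels) if other == c)
            / labels.count(c)
            for c in clusters if c != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


pts = [[0.0], [1.0], [10.0], [11.0]]
print(silhouette(pts, [0, 0, 1, 1]))  # well-separated clustering, close to 1
print(silhouette(pts, [0, 1, 0, 1]))  # poor clustering, negative
```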
Profitability Analysis: Conducting backtesting to assess the profitability of trading strategies
based on the forecasted stock market trends.
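A minimal backtest sketch under simple, hypothetical rules. Each day, the strategy holds the index for the next day if the model forecasts a higher close than today's, and holds cash otherwise. Transaction costs, slippage and trading constraints are ignored, so real profitability would be lower. The prices and forecasts are made-up numbers.

```python
def backtest(closes, forecasts):
    """Compare a forecast-driven long/cash strategy with buy-and-hold.

    closes[t] is the actual close on day t; forecasts[t] is the forecast,
    made on day t, of closes[t + 1]. Returns (strategy_return, buy_and_hold_return)
    as simple cumulative returns.
    """
    if len(forecasts) != len(closes) - 1:
        raise ValueError("need one forecast per day except the last")
    equity = 1.0
    for t, forecast in enumerate(forecasts):
        if forecast > closes[t]:  # forecast says "up": hold the index for one day
            equity *= closes[t + 1] / closes[t]
        # otherwise stay in cash and equity is unchanged
    buy_and_hold = closes[-1] / closes[0]
    return equity - 1.0, buy_and_hold - 1.0


closes = [100.0, 102.0, 101.0, 99.0, 103.0]
forecasts = [101.5, 101.0, 100.0, 100.5]  # hypothetical one-day-ahead forecasts
strategy, hold = backtest(closes, forecasts)
print(f"strategy {strategy:.2%}, buy-and-hold {hold:.2%}")
```

In this toy series the strategy sits in cash during the two down days, so it beats buy-and-hold. With real forecasts the comparison is the point of the exercise, not a given.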
5) Limitations and Future Directions:
Data Availability: The effectiveness of the forecasting models heavily relies on the availability
and quality of textual data. Future research could explore strategies for enhancing data collection
and preprocessing techniques.
Model Complexity: Deep learning models like LSTM and RNN are computationally intensive
and may require significant resources for training and inference. Investigating methods to
streamline model architecture and improve efficiency is a potential area for future research.
Ensemble Techniques: Combining multiple forecasting models, including traditional statistical
models, text mining approaches, and deep learning models, through ensemble techniques could
further improve forecasting accuracy.
4. Expected results

We expect that integrating text mining technology with deep learning techniques will yield more accurate and reliable forecasts for the Vietnamese stock market. By leveraging textual features extracted from financial articles and capturing complex patterns in time-series data, we expect our method to outperform traditional forecasting methods. Furthermore, we anticipate that this research will advance knowledge of financial market forecasting by demonstrating the effectiveness of text mining and deep learning methods in the Vietnamese context.

5. References

https://onlinelibrary.wiley.com/doi/abs/10.1002/for.2794

https://arxiv.org/abs/1911.09512

https://arxiv.org/abs/1909.12789

https://www.sciencedirect.com/science/article/abs/pii/S0306261923014666

https://www.nature.com/articles/s41599-024-02807-x

https://dl.acm.org/doi/abs/10.1145/2857546.2857619
