
DATA ANALYTICS IN BUSINESS

Lecturer: Trần Văn Hải Triều

Analyse stock price fluctuations according to market and internal business factors of VCB
Group 6
Members
Lê Nguyễn Ánh Dương - 21520753

Nguyễn Thị Thảo Ngân - 21521176

Nguyễn Thanh Tri - 21521569

Châu Thanh Bình - 21521871

Nguyễn Thị Nhật Tâm - 21520440


Contents
1. Introduction
2. Research objectives
3. Methods
4. The influence of factors
5. Analysis of stock price fluctuation during 1/2019 - 11/2023 (Descriptive statistical measures method)
6. Forecast stock price
01 INTRODUCTION
Models used
Linear Regression
ARIMA
LSTM
Compare based on 3 metrics
MAPE
RMSE
MSLE

Introduction
Stocks are a form of investment that represents partial ownership in a company.
Stock valuation is the process of determining the true market value of a stock.
Purpose
Use the 2 best-performing models to predict the next 60 days of stock closing prices.
02 RESEARCH OBJECTIVES

Analyze market factors affecting Vietcombank stock price.

Analyze internal business factors affecting Vietcombank stock price.

Analyze Vietcombank stock price movements during 2019-Q3/2023: we use descriptive statistical measures to provide a general picture of stock price trends.

Forecast future Vietcombank stock prices.


02 RESEARCH OBJECTIVES METHODS OF GOALS

To analyze the factors affecting VCB stock price:

Statistical hypothesis testing to evaluate the effect of factors on stock price.

Linear correlation coefficient (Pearson correlation coefficient).

To forecast VCB’s price movement, we will utilize a time-series dataset and employ various models, including:

Statistical Models

Deep Learning Models


02 RESEARCH OBJECTIVES TOOL

Tool: Visual Studio Code
Library:
Numpy
Pandas
Matplotlib
Scikit-learn
Scipy
02 RESEARCH OBJECTIVES DATA COLLECTION

Dataset 1: Collected from Investing.com


Focuses on VCB's stock market price from 01/01/2019 to 11/10/2023.
Includes information:
Date: the trading day
Open: the opening price of the stock on a given trading day
High: the highest price reached by the stock during the trading day
Low: the lowest price reached by the stock during the trading day
Close: the closing price of the stock on a given trading day
Adj Close: the adjusted closing price of the stock on a given trading day
Volume: the total number of shares traded during the trading day
02 RESEARCH OBJECTIVES DATA COLLECTION
Dataset 2: collected from Q1/2017 to Q3/2023
Includes information:
Quarter: quarters of the year
GDP, ROA, ROE : collected quarterly from VietstockFinance
CPI, Close: converted from monthly to quarterly using Python
Month: the months of the year
CPI(Monthly), Close(Monthly): collected monthly from
VietstockFinance and Yahoo Finance
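The monthly-to-quarterly conversion could look like the sketch below. The numbers are made up for illustration, and averaging the three months of each quarter is an assumption: the slides do not say which aggregation rule was used.

```python
import pandas as pd

# Hypothetical monthly values, for illustration only (not the project's data).
monthly = pd.DataFrame(
    {
        "CPI": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
        "Close": [70.0, 72.0, 74.0, 76.0, 78.0, 80.0],
    },
    index=pd.period_range("2023-01", periods=6, freq="M"),
)

# Map every month to its calendar quarter, then average within each quarter.
quarterly = monthly.groupby(monthly.index.asfreq("Q")).mean()
print(quarterly)
```

Taking the last month of each quarter instead of the mean would be an equally plausible choice for Close.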
03 METHODS ARIMA

ARIMA models are widely recognized and used in time series forecasting, especially in economics and finance, due to their strength and effectiveness in forecasting financial time series, particularly in the short term.
Advantages
Require only data from the time series of interest
Can forecast a large number of time series
Disadvantages
Rely on past patterns to predict the future
Have difficulty predicting newly emerging points (for example, sudden changes in trend)
The future value of a variable in ARIMA is expressed as follows
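In standard ARIMA(p, d, q) notation, which is assumed here because the slide does not define its symbols, the series after d rounds of differencing, written y'_t, is modelled as:

```latex
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
```

Here c is a constant, \phi_i are the autoregressive coefficients, \theta_j are the moving-average coefficients, and \varepsilon_t is the error term at time t.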

In the process of building an ARIMA model, the determination of p, d, and q is very important and is often repeated several times until a reliable model is chosen.
03 METHODS LINEAR REGRESSION

Linear Regression is a method used to determine the relationship between a dependent variable and one or more independent variables, all of which have numerical values.
Simple Linear Regression is described through the formula:
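In the usual textbook notation (the symbol names are an assumption, not taken from the slide), with dependent variable y and a single independent variable x:

```latex
y = \beta_0 + \beta_1 x + \varepsilon
```

\beta_0 is the intercept, \beta_1 is the slope, and \varepsilon is the error term.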
03 METHODS T - TEST
T-test is a type of inferential statistical test used to determine if the means of two
groups are significantly different
Independent t-tests are used to compare two groups that are not related to each
other.
Dependent t-tests are used to compare two groups that are related to each other
The t-test formula is as follows:
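For two independent groups, a common form is Welch's statistic, shown below. Which variant the project used is not stated, so this choice is an assumption:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```

\bar{x}_k, s_k^2 and n_k are the sample mean, sample variance and size of group k.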

The t-value is calculated using the t-test formula. The t-value is then compared to a
critical t-value to determine if the null hypothesis is rejected.
Advantages:
T-tests are relatively simple and easy to use.
T-tests can be used to compare two groups with different sizes.
T-tests are relatively robust to mild violations of the assumption of normality.
Disadvantages:
T-tests assume that the data are approximately normally distributed.
T-tests may not be accurate when the samples are small and clearly non-normal.
Applications: Science, Business, Society
03 METHODS CORRELATION
Correlation is a measure of the strength and direction of the relationship between two
variables
Types: Linear correlation and non-linear correlation
The formula for calculating the correlation coefficient is as follows:
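For the Pearson (linear) coefficient named in the research methods, the usual form for n paired observations (x_i, y_i) is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

r ranges from -1 to 1; the sign gives the direction and the magnitude gives the strength of the linear relationship.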

Advantages:
Correlation is a relatively simple and easy-to-use statistical measure.
Correlation can be used to measure the strength and direction of the relationship
between two variables.
Correlation can be used to make predictions about one variable based on the value of
another variable.
Disadvantages:
Correlation does not prove causation
Correlation is sensitive to outliers
The linear correlation coefficient captures only linear relationships, so it can understate a strong non-linear one
Applications: Science, Business, Society
03 METHODS DESCRIPTIVE STATISTICAL MEASURE

Descriptive statistical measures are used to summarize and describe data.
Common types of descriptive statistical measures: central tendency, variability, position
Advantages:
They are relatively simple to understand and interpret.
They can be used to summarize data from a variety of sources.
They can be used to compare data from different groups or populations.
Disadvantages:
They describe only the collected data and do not, by themselves, support conclusions about the population from which the data was collected.
Some measures, such as the mean and standard deviation, can be misleading if the data is strongly skewed.
Applications of descriptive statistical measures: Science, Business, Society
03 METHODS LSTM
LSTM is an improved variant of the RNN, designed to avoid the long-term dependency problem.

The purpose of specific components:

Forget gate: Decides what information to throw away from the cell state.
Update gate: Helps the model to combine the new candidates with the cell state.
Candidate: Goes through the tanh layer to create a vector of new candidates that could be added to the cell state.
Output gate: Decides the output of each time step.
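How the four components interact in one time step can be sketched in NumPy. This is an illustrative cell only, not the project's model: `lstm_step`, the stacked weight layout and the random weights are all assumptions, and a real forecast would use a trained deep learning library model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step; W maps [h_prev, x] to four stacked gate pre-activations."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:n])              # forget gate: what to drop from the cell state
    i = sigmoid(z[n:2 * n])          # update (input) gate: how much candidate to add
    g = np.tanh(z[2 * n:3 * n])      # candidate values from the tanh layer
    o = sigmoid(z[3 * n:4 * n])      # output gate: what to expose at this step
    c = f * c_prev + i * g           # new cell state
    h = o * np.tanh(c)               # new hidden state (the step's output)
    return h, c

rng = np.random.default_rng(0)
n_hidden, n_input = 4, 1
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden + n_input))
b = np.zeros(4 * n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for price in [0.1, 0.2, 0.3]:        # made-up, already-scaled inputs
    h, c = lstm_step(np.array([price]), h, c, W, b)
```

Because the cell state is updated additively (`f * c_prev + i * g`), information can persist across many steps, which is how the design addresses the long-term dependency problem.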
04 THE INFLUENCE OF FACTORS GDP

I. Analysis of market factors that influence stock price

GDP (Gross Domestic Product)
What is GDP?
GDP is the total monetary or market value of all the finished goods and services produced within a country’s borders in a specific time period.

As a broad measure of overall domestic production, it functions as a comprehensive scorecard of a given country’s economic activity.

GDP is typically calculated on an annual basis, though it is sometimes calculated on a quarterly basis.
Test the relationship and impact
T-Test
Research Question: Does GDP have a significant impact on stock price?

Null hypothesis (H0): There is no significant difference in stock price between the high-GDP group and the low-GDP group.

Alternative hypothesis (Ha): There is a significant difference in stock price between the high-GDP group and the low-GDP group.


We test this hypothesis with alpha = 0.05:

p_value < 0.05 => we reject H0 and accept Ha => there is a significant difference in stock price between the high-GDP group and the low-GDP group.
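A test of this kind can be sketched with SciPy as below. The numbers are made up, and two choices are assumptions rather than the project's stated method: quarters are split into high and low groups at the median GDP, and Welch's variant of the independent t-test is used.

```python
import numpy as np
from scipy import stats

# Made-up quarterly values, for illustration only (not the project's data).
gdp = np.array([5.0, 5.5, 6.1, 6.8, 7.0, 7.4, 7.9, 8.3])
close = np.array([60.0, 62.0, 61.0, 65.0, 78.0, 80.0, 79.0, 83.0])

# Assumed grouping rule: quarters above the median GDP form the "high" group.
threshold = np.median(gdp)
high = close[gdp > threshold]
low = close[gdp <= threshold]

# Welch's t-test: independent groups, equal variances not assumed.
t_stat, p_value = stats.ttest_ind(high, low, equal_var=False)

alpha = 0.05
reject_h0 = p_value < alpha
print(f"t = {t_stat:.2f}, p_value = {p_value:.4f}, reject H0: {reject_h0}")
```

The same pattern applies to CPI, ROA and ROE by swapping the grouping variable.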


Linear Correlation Coefficient
Research Question: Does GDP have a positive or negative relationship with stock price? Is it strong or weak?

The correlation coefficient is 0.82, which indicates a strong positive correlation between GDP and stock price.
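The coefficient and its verbal label could be computed as in the sketch below. The data are made up, and the cut-offs of 0.3 and 0.7 in the hypothetical helper `describe_strength` are a common rule of thumb assumed here, not thresholds stated in the slides.

```python
import numpy as np
from scipy import stats

def describe_strength(r):
    """Label r using assumed rule-of-thumb cut-offs (0.3 and 0.7)."""
    size = abs(r)
    strength = "strong" if size >= 0.7 else "moderate" if size >= 0.3 else "weak"
    direction = "positive" if r >= 0 else "negative"
    return f"{strength} {direction}"

# Made-up quarterly values, for illustration only (not the project's data).
gdp = np.array([5.0, 5.5, 6.1, 6.8, 7.0, 7.4, 7.9, 8.3])
close = np.array([60.0, 62.0, 61.0, 65.0, 78.0, 80.0, 79.0, 83.0])

r, p_value = stats.pearsonr(gdp, close)
print(f"r = {r:.2f} ({describe_strength(r)} correlation)")
```

Under these assumed cut-offs, 0.82 reads as strong, while values such as 0.19 or 0.11 read as weak.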


In conclusion, GDP has an effect on the stock market.

When the economy is healthy and growing, businesses are more likely to report better earnings and growth, and vice versa.


04 THE INFLUENCE OF FACTORS CPI

What is CPI?
CPI measures the monthly change in prices consumers pay.

CPI is one of the most popular measures of inflation and deflation.


Test the relationship and impact (T-test)

Research Question: Does CPI have a significant impact on stock price?

Null hypothesis (H0): There is no significant difference in stock price between the high-CPI group and the low-CPI group.

Alternative hypothesis (Ha): There is a significant difference in stock price between the high-CPI group and the low-CPI group.


p_value > 0.05, so we fail to reject H0 => there is no significant difference in stock price between the high-CPI group and the low-CPI group.


Linear Correlation Coefficient

Research Question: Does CPI have a positive or negative relationship with stock price? Is it strong or weak?

The correlation coefficient is 0.19, which indicates a weak positive correlation between CPI and stock price.


In short, looking at the chart, we can still see the impact of CPI on stock prices. However, it is not always clearly shown. There are cases where CPI and closing price both increase or decrease.


04 THE INFLUENCE OF FACTORS ROA
II. Analysis of internal business factors that influence stock price
What is ROA?
ROA is a ratio that represents a company's profitability relative to its total assets.

It is used to determine how effectively a company uses its assets to generate profits.

A higher ROA means a company is more efficient and productive at managing its balance sheet to generate profits.
Test the relationship and impact (T-test)

Research Question: Does ROA have a significant impact on stock price?

Null hypothesis (H0): There is no significant difference in stock price between the high-ROA group and the low-ROA group.

Alternative hypothesis (Ha): There is a significant difference in stock price between the high-ROA group and the low-ROA group.


We test this hypothesis with alpha = 0.05:

p_value < 0.05, so we reject H0 and accept Ha => there is a significant difference in stock price between the high-ROA group and the low-ROA group.


Linear Correlation Coefficient
Research Question: Does ROA have a positive or negative relationship with stock price? Is it strong or weak?

The correlation coefficient is 0.57, which indicates a moderate positive correlation between ROA and stock price.


Overall, we can conclude that ROA is also an important factor affecting stock prices.
04 THE INFLUENCE OF FACTORS ROE
What is ROE?
Return on equity (ROE) is a measure of financial performance calculated by dividing net income by shareholders' equity and is also referred to as return on net assets.

ROE is considered a gauge of a corporation's profitability and how efficient it is in generating profits.

The higher the ROE, the more efficient a company's management is at generating income and growth from its equity financing.
Test the relationship and impact (T-test)

Research Question: Does ROE have a significant impact on stock price?

Null hypothesis (H0): There is no significant difference in stock price between the high-ROE group and the low-ROE group.

Alternative hypothesis (Ha): There is a significant difference in stock price between the high-ROE group and the low-ROE group.


We test this hypothesis with alpha = 0.05:

p_value > 0.05, so we fail to reject H0 => there is no significant difference in stock price between the high-ROE group and the low-ROE group.


Linear Correlation Coefficient
Research Question: Does ROE have a positive or negative relationship with stock price? Is it strong or weak?

The correlation coefficient is 0.11, which indicates a weak positive correlation between ROE and stock price.


It can be seen that the closing price tends to increase when the return on equity (ROE) increases.

ROE increases => VCB is operating effectively => profits increase in the future => creates trust for investors => stock price increases.
05 ANALYSIS STOCK PRICE FLUCTUATION 1/2019 - 11/2023

Mean: 73101.33991769547
Mode: 63738
Median: 75096.0
Standard deviation: 13447.550963949463
Variance: 180836626.92801812
Range: 65339
Quartile 1 (25%): 64045.0
Quartile 2 (50%): 75096.0
Quartile 3 (75%): 81950.0
Skewness: -0.1679746230519
Kurtosis: -0.3905616206844
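Measures like these can be produced with pandas as sketched below. The series is made up for illustration, and the estimator conventions are assumptions: pandas uses the sample (ddof=1) standard deviation and variance and reports excess kurtosis, which may or may not match the settings behind the figures above.

```python
import pandas as pd

# Made-up closing prices, for illustration only (not the VCB series).
close = pd.Series([60.0, 62.0, 62.0, 70.0, 75.0, 80.0, 81.0])

summary = {
    "mean": close.mean(),
    "mode": close.mode().iloc[0],     # first mode if there are ties
    "median": close.median(),
    "std": close.std(),               # sample standard deviation (ddof=1)
    "variance": close.var(),          # sample variance (ddof=1)
    "range": close.max() - close.min(),
    "q1": close.quantile(0.25),
    "q2": close.quantile(0.50),
    "q3": close.quantile(0.75),
    "skewness": close.skew(),
    "kurtosis": close.kurt(),         # excess kurtosis (0 for a normal distribution)
}

for name, value in summary.items():
    print(f"{name}: {value}")
```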
05 ANALYSIS STOCK PRICE FLUCTUATION 1/2019 - 11/2023
Looking at the values above:
Stock prices are concentrated at around 75096.0 (the median).
The average price of the data set is about 73101.34.
The skewness of the data distribution is about -0.168.
From the above result:
The Median value is greater than the Mean value and Skewness has a negative value, which means the data distribution is skewed to the left.
The majority of stock values in this data set tend to be greater than the average value.
Kurtosis is negative => the distribution is somewhat flatter (less sharply peaked) than a normal distribution.
In general, it can be seen that VCB's stock price has tended to
increase from 2019 to present.
06 FORECAST STOCK PRICE ARIMA
Ratio 6:2:2

The model has low MAPE, RMSE, and MSLE values, and it is able to capture the overall trend of the VCB close price.

The model is also able to capture the autocorrelation in the VCB close price data.
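The ratios presumably describe a train : validation : test split. A minimal sketch of such a split is shown below, assuming the data are kept in time order and not shuffled (the usual practice for time series); `chronological_split` is a hypothetical helper, not the project's code.

```python
def chronological_split(series, ratios=(6, 2, 2)):
    """Split a time-ordered sequence into train/validation/test without shuffling."""
    total = sum(ratios)
    n = len(series)
    train_end = n * ratios[0] // total
    val_end = n * (ratios[0] + ratios[1]) // total
    return series[:train_end], series[train_end:val_end], series[val_end:]

prices = list(range(100))  # placeholder for the ordered closing prices

train, val, test = chronological_split(prices, (6, 2, 2))
print(len(train), len(val), len(test))
```

Calling it with `(7, 1, 2)` gives the second configuration. In both cases the test set is the most recent 20% of the data.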
CONCLUSION:
All three metrics (MAPE, RMSE, and MSLE) have significantly higher values for the test set compared to the validation set, with MAPE and RMSE showing increases of around 50% and 84%, respectively.
Ratio 7:1:2

The model has low MAPE, RMSE, and MSLE values, and it is able to capture the overall trend of the VCB close price.

The model is also able to capture the autocorrelation in the VCB close price data.
CONCLUSION
06 FORECAST STOCK PRICE LINEAR REGRESSION

Ratio 6:2:2

The model has moderate MAPE, RMSE, and MSLE values.

However, it is important to note that these metrics are specific to the validation set, and the model's performance on unseen data may vary.
CONCLUSION
The metrics indicate that the model is performing better on unseen (test) data than it did on the validation set.
This is a positive sign, as it suggests that the model has generalized well and is able to make accurate forecasts on new data points.
Ratio 7:1:2

The model has moderate MAPE, RMSE, and MSLE values.

However, it is important to note that these metrics are specific to the validation set, and the model's performance on unseen data may vary.
CONCLUSION
The metrics indicate that the model is performing better on unseen (test) data than it did on the validation set.
This is a positive sign, as it suggests that the model has generalized well and is able to make accurate forecasts on new data points.
06 FORECAST STOCK PRICE LSTM

Ratio 6:2:2

These results suggest that the LSTM model is a good choice for
forecasting the VCB close price.
The model is able to make accurate predictions on the
validation data, which suggests that it is likely to generalize well
to new data.
CONCLUSION
The model performs somewhat worse on the test dataset than on the validation dataset.
This is not uncommon, as the test dataset is typically more challenging than the validation dataset.
However, the difference in performance is relatively small, which suggests that the model is still able to generalize well to new data.
Ratio 7:1:2

The LSTM model appears to be performing well on the validation dataset.
The model is able to make accurate predictions on the validation data, which suggests that it is likely to generalize well to new data.
CONCLUSION
The model's performance is significantly better on the test dataset than it is on the validation dataset.
This is a very positive result, as it suggests that the model is generalizing well to new data.
WE WANT TO SAY

THANK YOU
FOR YOUR ATTENTION
