S COLLEGE OF ENGINEERING
(An Autonomous College under VTU, Belagavi)
Bull Temple Road, Bangalore - 560 019
A Project Report-2022-23
On
MACHINE LEARNING
offered by
Submitted By
NAME                     USN
1. Ayush Kumar Sinha     1BM20EC209
2. Umang Singh           1BM20EC208
3. Subhash S             1BM20EC212
4. Anurag Soni           1BM20ET011
1 Introduction
2 Problem definition
3 Proposed solution
4 Literature survey
5 Methodology
6 Implementation
7 Result analysis
8 Conclusion
9 Code
10 References
INTRODUCTION
Percentage delivery in stocks refers to the proportion of traded shares that are
actually transferred from the seller to the buyer, that is, taken into delivery
rather than squared off within the same trading day. It is an important measure
that provides insight into the liquidity and trading activity of a particular stock.
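As a rough illustration of this definition, the snippet below computes percentage delivery as the delivered quantity divided by the traded quantity, times 100. The function name and the quantities are hypothetical and are not taken from the project dataset.

```python
def percentage_delivery(traded_qty: int, delivered_qty: int) -> float:
    """Return delivered shares as a percentage of traded shares."""
    if traded_qty <= 0:
        raise ValueError("traded_qty must be positive")
    return 100.0 * delivered_qty / traded_qty


# Hypothetical day: 1,000,000 shares traded, 600,000 of them delivered.
pct = percentage_delivery(1_000_000, 600_000)
print(f"{pct:.1f}%")
```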
Percentage delivery helps in assessing the transparency and efficiency of the stock
market. Higher percentage delivery indicates a healthier market, where there is a
greater degree of physical delivery of shares. This implies that trading is taking
place based on actual ownership of shares, rather than speculative or manipulative
activities.
There is a need to predict the percentage delivery of shares efficiently in order
to gauge the condition of the market and anticipate future trends. It will also
help in predicting the performance of a share under specific conditions.
We propose using Machine Learning to make this prediction. We have gathered the
sales reports of a few shares as our dataset and make our predictions from this
data.
PROBLEM DEFINITION
The objective of this project is to develop a machine learning model that can
accurately predict the percentage delivery of shares for a given stock based on
historical and real-time market data. The model aims to assist investors, traders,
and market analysts in understanding the liquidity, trading patterns, and potential
market manipulation of stocks.
PROPOSED SOLUTION
To tackle this problem, we propose a Machine Learning model that takes the
previous year's data of different companies as its dataset and applies different
regression methods to find the most accurate one. The best method is chosen by
comparing each model's R² score. The data is split into a training set and a test
set to check the validity of the predictions.
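The selection procedure described here can be sketched as follows. This is a minimal illustration on synthetic data, not the project's dataset; the feature count, the 80/20 split, the polynomial degree and the model settings are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (assumption): 3 features, mildly nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

# Hold out 20% of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "multiple": LinearRegression(),
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "random_forest": RandomForestRegressor(n_estimators=10, random_state=0),
}

# Fit each candidate and record its R2 score on the test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Keep the model with the highest test R2 score.
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Wrapping the polynomial expansion and the linear fit in one pipeline means the same transform is applied automatically at prediction time.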
LITERATURE SURVEY
1. In the paper "A Novel Bayesian Additive Regression Trees Ensemble Model
Based on Linear Regression and Nonlinear Regression for Torrential Rain
Forecasting", Jiansheng Wu discusses how three different linear regression
models, Partial Least Squares Regression, Quantile Regression and
M-regression, are used to extract the linear characteristics of the rainfall
system. (IEEE, 2010)
2. In the paper "Prediction of Packet Delivery Ratio Using Lasso Regression in
Comparison with Linear Regression Algorithm for Multi Input Multi Output
Network", V. Venu Gopal Reddy aims to predict the Packet Delivery Ratio
(PDR) accurately from the provided dataset using the Novel Linear machine
learning technique and compares the result with the Lasso regression
algorithm. (IEEE, 2022)
3. In the paper "Improvement of Random Forest Cascade Regression Algorithm
and Its Application in Fatigue Detection", Tao Qunzhu proposes a method
based on improved random forest cascade regression to detect facial feature
points. The facial feature points are divided into regions and shape
regression is performed on each region separately to obtain the final face
shape. (IEEE, 2019)
4. In the paper "Rank Prediction in Graphs with Locally Weighted Polynomial
Regression and EM of Polynomial Mixture Models", Michalis Rallis describes
a learning framework that enables ranking predictions for graph nodes based
solely on individual local historical data. The two learning algorithms
capitalize on the multi-feature vectors of nodes in graphs that evolve over
time. The first uses locally weighted polynomial regression (LWPR), while
the second uses the Expectation Maximization (EM) algorithm to fit a mixture
of polynomial regression models. (IEEE, 2011)
METHODOLOGY
IMPLEMENTATION
We used Jupyter Notebook to write the code for our project in Python 3.
POLYNOMIAL REGRESSION
Polynomial Regression is a regression algorithm that models the relationship
between a dependent variable (y) and an independent variable (x) as an nth-degree
polynomial. The Polynomial Regression equation is given below:
$y = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_1^3 + \dots + b_n x_1^n$
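To make the equation concrete, here is a small sketch that fits a degree-3 polynomial with scikit-learn. The data is generated from known coefficients (an assumption made purely for illustration), so the fitted intercept and coefficients should match b0 through b3.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Noise-free data from y = 1 + 2x - 0.5x^2 + 0.25x^3 (illustrative values).
x = np.linspace(-2, 2, 50).reshape(-1, 1)
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 0] ** 2 + 0.25 * x[:, 0] ** 3

# Expand x into the columns [x, x^2, x^3]; the intercept b0 is left
# to LinearRegression, hence include_bias=False.
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(x)

# Polynomial regression is ordinary linear regression on the expanded columns.
model = LinearRegression().fit(X_poly, y)
print("b0:", model.intercept_)
print("b1..b3:", model.coef_)
```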
MULTIPLE REGRESSION
In Multiple Linear Regression, the target variable (y) is a linear combination of
multiple predictor variables x1, x2, x3, ..., xn. Since it is an extension of
Simple Linear Regression, the same form applies with one term per predictor, and
the equation becomes:
$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n$
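The coefficients b0 to bn are typically found by least squares. The sketch below uses NumPy's least-squares solver rather than the scikit-learn estimator used in the project code, and the data and true coefficients are made up for the example.

```python
import numpy as np

# Illustrative data: 100 samples, 3 predictors, known coefficients b0..b3.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
b_true = np.array([0.5, 1.0, -2.0, 3.0])
y = b_true[0] + X @ b_true[1:]

# Prepend a column of ones so that b0 is estimated along with b1..b3.
A = np.column_stack([np.ones(len(X)), X])

# Solve min ||A b - y||^2 for b.
b_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(b_hat)
```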
R² SCORE
The coefficient of determination, also called the R² score, is used to evaluate
the performance of a regression model. It is the proportion of the variation in
the dependent (output) variable that is predictable from the independent (input)
variable(s).
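In the usual formulation, the score is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand on a few hypothetical values and compares the result with scikit-learn's r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.5])

# Residual sum of squares: variation the model fails to explain.
ss_res = np.sum((y_true - y_pred) ** 2)
# Total sum of squares: variation of the data around its mean.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1.0 - ss_res / ss_tot
print(r2_manual, r2_score(y_true, y_pred))
```

A score of 1 means the predictions match the data exactly, while 0 means the model does no better than always predicting the mean.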
RMSE
Root Mean Square Error (RMSE) is a standard way to measure the error of a
model in predicting quantitative data. Formally it is defined as follows:
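In the usual notation, with $y_i$ the actual values, $\hat{y}_i$ the predicted values and $n$ the number of samples, $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$. A minimal sketch on hypothetical values, computed once by hand and once through scikit-learn's mean_squared_error:

```python
import math

import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.5])

# Square the errors, average them, then take the square root.
rmse_manual = math.sqrt(np.mean((y_true - y_pred) ** 2))

# Same quantity via scikit-learn's mean squared error.
rmse_sklearn = math.sqrt(mean_squared_error(y_true, y_pred))
print(rmse_manual, rmse_sklearn)
```

Because the errors are squared before averaging, RMSE is expressed in the same units as the target and penalises large errors more heavily than small ones.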
RESULT ANALYSIS
We find that the three models perform differently on this dataset. The model
using polynomial regression gives an R² score of 0.63412, which is not very
accurate, and an RMSE of 0.10201.
CONCLUSION
CODE
IMPORTING LIBRARIES
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
IMPORTING DATASET
dataset = pd.read_csv(r'C:\Users\Medha\OneDrive\Desktop\ML AAT\SHARES_DATASET.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
dataset.describe()
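The training code that follows uses X_train, X_test, y_train and y_test, but the step that creates them does not appear in this listing. A minimal sketch of such a split, assuming scikit-learn's train_test_split with an 80/20 ratio and a fixed seed (both assumptions), and using a random stand-in array in place of the real dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 100 rows with 14 feature columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 14))
y = rng.uniform(0, 1, size=100)

# Hold out 20% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```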
TRAINING MULTIPLE REGRESSION MODEL
from sklearn.linear_model import LinearRegression
regressor_m=LinearRegression()
regressor_m.fit(X_train,y_train)
TRAINING RANDOM FOREST MODEL
from sklearn.ensemble import RandomForestRegressor
regressor_r = RandomForestRegressor(n_estimators = 10, random_state = 0)
regressor_r.fit(X_train, y_train)
TRAINING POLYNOMIAL REGRESSION MODEL
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree =2 )
X_poly = poly_reg.fit_transform(X_train)
regressor_P = LinearRegression()
regressor_P.fit(X_poly, y_train)
POLYNOMIAL REGRESSION
y_pred_P = regressor_P.predict(poly_reg.transform(X_test))
np.set_printoptions(precision=3)
print(np.concatenate((y_pred_P.reshape(len(y_pred_P),1), y_test.reshape(len(y_test),1)),1))
from sklearn.metrics import r2_score
r2_P=r2_score(y_test,y_pred_P)
print("R2 Score is",r2_P)
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_pred_P)
rmse = math.sqrt(mse)
print("The difference between actual and predicted values", rmse)
RANDOM FOREST REGRESSION
y_pred_r=regressor_r.predict(X_test)
np.set_printoptions(precision=2)
print(np.concatenate((y_pred_r.reshape(len(y_pred_r),1),y_test.reshape(len(y_pred_r),1)),1))
from sklearn.metrics import r2_score
r2_RF=r2_score(y_test,y_pred_r)
print("R2 Score is",r2_RF)
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_pred_r)
rmse = math.sqrt(mse)
print("The difference between actual and predicted values", rmse)
MULTIPLE REGRESSION
y_pred_m=regressor_m.predict(X_test)
np.set_printoptions(precision=2)
print(np.concatenate((y_pred_m.reshape(len(y_pred_m),1),y_test.reshape(len(y_pred_m),1)),1))
from sklearn.metrics import r2_score
r2_M=r2_score(y_test,y_pred_m)
print("R2 Score is",r2_M)
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_pred_m)
rmse = math.sqrt(mse)
print("The difference between actual and predicted values", rmse)
PREDICTION
# One sample row in the same column order as the training features
sample = [[0, 1, 0, 0, 0, 1541, 1557, 1564.65, 1470,
           1474.95, 1476.5, 1499.5, 982492, 577454]]
# Predict with whichever model scored the highest R2 on the test set
if r2_M > r2_RF and r2_M > r2_P:
    print("% Deliverable is", regressor_m.predict(sample))
elif r2_RF > r2_M and r2_RF > r2_P:
    print("% Deliverable is", regressor_r.predict(sample))
else:
    # The polynomial model was trained on expanded features,
    # so the sample must go through the same transform first
    print("% Deliverable is", regressor_P.predict(poly_reg.transform(sample)))
PLOTTING
plt.scatter(X[:10, 8], y[:10], color = 'pink',s = 50,edgecolor ="green",marker ="s")
plt.scatter(X[:10, 9], y[:10], color = 'red',s = 50,edgecolor ="red",marker ="^")
#plt.plot(X, regressor_r.predict(X), color = 'blue')
plt.title('chart')
plt.xlabel('X axis')
plt.ylabel('%deliverable')
plt.legend(['volume','deliverable'])
plt.show()
REFERENCES
1. J. Wu, L. Huang and X. Pan, "A Novel Bayesian Additive Regression Trees
Ensemble Model Based on Linear Regression and Nonlinear
Regression for Torrential Rain Forecasting," 2010 Third International Joint
Conference on Computational Science and Optimization, Huangshan, China,
2010, pp. 466-470, doi: 10.1109/CSO.2010.15.
6. Y. Gong and P. Zhang, "Predictive Analysis and Research Of Python Usage Rate
Based on Polynomial Regression Model," 2021 3rd International Conference on
Artificial Intelligence and Advanced Manufacture (AIAM), Manchester, United
Kingdom, 2021, pp. 266-270, doi: 10.1109/AIAM54119.2021.00061.