
CHAPTER 1:

INTRODUCTION:

1.1 OBJECTIVE:

Data science is the study of very large amounts of data. It extracts meaningful insights from raw, structured, and unstructured data using the scientific method, various technologies, and algorithms. In short, data science is about:

o Asking the right questions and analysing the data.

o Modelling the data using various complex and efficient algorithms.

o Visualizing the data to get a better perspective.

o Understanding the data to make better decisions and find the final result.

With the help of data science technology, we can convert huge amounts of raw and unstructured data into meaningful insights.

Data science is also used to automate transportation, for example in building self-driving cars.

Some years ago, data was smaller in volume and mostly available in structured form, which could easily be stored in Excel sheets and processed using BI tools. Today data is growing enormously: roughly 2.5 quintillion bytes of data are generated every day, which has led to a data explosion. Researchers estimated that by 2020, 1.7 MB of data would be created every second by each person on earth. Every company needs data to operate, grow, and improve its business. Handling such a tremendous amount of data is a difficult task for any organization. To handle, process, and analyse it, we need complex, powerful, and efficient algorithms and technology, and that technology emerged as data science. The main reasons for using data science technology are the following. With its help, we can convert huge amounts of raw and unstructured data into meaningful insights. Data science is being adopted by many companies, whether a big brand or a startup. Google, Amazon, Netflix, and others, which handle huge amounts of data, use data science algorithms to improve the customer experience.

Data science is used to automate transportation, such as building self-driving cars, which are the future of transportation. Data science can help with various predictions, such as surveys, elections, and flight ticket confirmation. Structured data is the ordinary kind found in, for example, a student database. Unstructured data is data such as your Facebook and Google data. Both structured and unstructured data need to be handled to build machine learning and artificial intelligence engines. To learn data science, one must be curious. When you are curious and ask many questions, you understand the business problem more easily. A data scientist also needs to find multiple new ways to solve a problem efficiently. Communication skills are very important for a data scientist because, after solving a business problem, you need to communicate it to the team. Machine learning is the backbone of data science. It means training a machine so that it can act like a human brain. In data science, we use various machine learning algorithms to solve problems.

Forecasting stock prices has long been a difficult task for researchers and analysts. Investors are deeply interested in research on stock price prediction. For a good and successful investment, many investors are keen to know the future state of the stock market. In such a situation, an effective prediction system for stock markets helps traders, investors, and analysts by providing supporting information such as the future value of certain stocks.

There are many complicated financial indicators, and the fluctuation of the stock market is highly volatile. However, as technology advances, the opportunity to gain a steady fortune from the stock market increases, and it also helps experts find the most informative indicators to make better predictions. For those who are not familiar with stocks: a stock is basically an equity investment that represents part ownership in a corporation or company. It entitles you to a share of that company's earnings and assets. Predicting the market value is important for maximizing the profit of a stock purchase while keeping the risk low, because you want to put your money in a stock that will increase in value over time rather than decrease.

Features of machine learning:

• Machine learning uses data to detect various patterns in a given dataset.
• It can learn from historical data and improve automatically.
• It is a data-driven technology.
• Machine learning is similar to data mining, as it also deals with large amounts of data.
1.2 Existing System:
The aim of the existing system was to achieve the highest accuracy and the lowest error metrics for stock
market prediction. Many algorithms were compared for that purpose. SVM shows high accuracy on
non-linear classification data, whereas LR is the preferred algorithm when the available model is a
regression model, as it has a high confidence value. RF shows high accuracy on the binary classification
model, and the multilayer perceptron gives the least error in prediction. It can therefore be concluded that
the choice of algorithm depends mainly on the type and volume of data on which predictions are to be
made.
1.2.1 Drawbacks of existing system:
• Although the existing system achieved good accuracy and minimal error, the time taken to generate the
output was quite large, depending on the algorithm used.
• There is no feature to show how much value is put at risk by investing in a particular
stock.

1.3 Proposed System:


The aim of the proposed system is to predict stock prices in minimal time with an optimised solution. We
also indicate the right time to invest in a stock, forecast stock prices for the next one week, and
predict the value at risk for the stock of a particular company.

• Linear regression is one of the most popular machine learning algorithms based on supervised learning. This algorithm performs regression, which is a method of modelling a target value based on independent variables. It takes the form of a linear equation that relates the set of inputs to the predicted output. The algorithm is generally used in forecasting and prediction. Since it models a linear relationship between the input and output variables, it is called linear regression.

• y = mx + c

o where y = dependent variable
o x = independent variable

o m = slope

o c = intercept.
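
As an illustration of the equation above, the following is a minimal sketch of fitting y = mx + c by ordinary least squares and extending the line seven days ahead. The report does not show its own implementation, so the function names `fit_line` and `predict` and the sample prices are hypothetical, chosen only to make the example self-contained.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares estimate of slope m and intercept c for y = m*x + c."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx            # slope
    c = y_bar - m * x_bar    # intercept
    return m, c


def predict(m, c, x):
    """Evaluate the fitted line at x."""
    return m * x + c


# Hypothetical closing prices for day indices 0..4 (illustrative data only).
days = [0, 1, 2, 3, 4]
prices = [100.0, 102.0, 104.0, 106.0, 108.0]

m, c = fit_line(days, prices)

# Extend the fitted line over the next seven day indices (5..11).
next_week = [predict(m, c, d) for d in range(5, 12)]
```

A straight line fitted to past prices only extrapolates the existing trend, so a forecast of this kind is best read as a baseline rather than a reliable prediction.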

• Django:

Django is a web application framework used to create web applications. Topics relevant to working with it
include installation, environment setup, the admin interface, cookies, form validation, models, the
template engine, migrations, and the MVT architecture.

The result of the stock forecast is presented through the Django framework.

Process behind stock prediction:

• Get the stock quotes from historical datasets.

• Move them to the data warehouse.

• Clean the stock datasets in the warehouse.

• Preprocess the stock data.

• Transform the preprocessed data.

• Obtain patterns from the preprocessed stock data, i.e. graphs showing variations.

• Evaluate the patterns.

• Finally, derive knowledge of the outcome.
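
The steps above can be sketched as a small chain of functions. This is only an illustration under stated assumptions: the record layout (`date`, `close`), the function names, and the sample quotes are hypothetical, and a real system would read the quotes from the data warehouse rather than from a literal list.

```python
def clean(quotes):
    """Data cleaning: drop records with a missing or non-positive close."""
    return [q for q in quotes if q.get("close") is not None and q["close"] > 0]


def preprocess(quotes):
    """Preprocessing: order records by ISO date and keep the closing prices."""
    return [q["close"] for q in sorted(quotes, key=lambda q: q["date"])]


def transform(closes):
    """Transformation: convert prices to simple daily returns."""
    return [(b - a) / a for a, b in zip(closes, closes[1:])]


def moving_average(values, window):
    """Pattern extraction: trailing moving average over `window` points."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical quotes (illustrative only).
raw_quotes = [
    {"date": "2021-01-04", "close": 100.0},
    {"date": "2021-01-06", "close": 121.0},
    {"date": "2021-01-05", "close": 110.0},
    {"date": "2021-01-07", "close": None},  # missing value, removed by clean()
]

closes = preprocess(clean(raw_quotes))    # cleaned, date-ordered prices
returns = transform(closes)               # day-over-day relative change
trend = moving_average(closes, window=2)  # smoothed series for plotting
```

Keeping each stage as a separate function makes it possible to inspect or plot the intermediate series before the patterns are evaluated.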

Advantages of proposed system:


• The proposed system uses a minimal algorithm, which takes little time and gives optimal
results.
• We can know how much value we put at risk by investing in a particular stock.
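
The report does not state how the value at risk is computed. One common approach is historical simulation, and the sketch below assumes that method. The function name `historical_var`, the 95% confidence level, the position size, and the sample returns are all hypothetical.

```python
def historical_var(returns, confidence=0.95, position_value=1.0):
    """Historical-simulation value at risk (assumed method, not the report's).

    Returns the loss, as a positive amount, at the (1 - confidence)
    quantile of the observed returns, scaled by the position value.
    """
    if not returns:
        raise ValueError("need at least one return")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    ordered = sorted(returns)  # worst returns first
    # Round before truncating to avoid floating-point noise such as 0.9999999.
    index = int(round((1 - confidence) * len(ordered), 9))
    index = min(index, len(ordered) - 1)
    worst = ordered[index]
    return max(0.0, -worst * position_value)


# Twenty hypothetical daily returns (illustrative only).
daily_returns = [
    -0.05, -0.03, -0.02, -0.01, -0.01, 0.0, 0.0, 0.0, 0.01, 0.01,
    0.01, 0.01, 0.02, 0.02, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05,
]

# Estimated one-day loss at 95% confidence on a position worth 10,000.
var_95 = historical_var(daily_returns, confidence=0.95, position_value=10_000.0)
```

Quantile conventions differ between libraries and textbooks, so the index rule used here is one reasonable choice rather than a standard.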
Chapter 2
Literature Survey

1. Title: Stock Price Prediction Using Data Analytics

Author: S. Tiwari, A. Bharadwaj and S. Gupta

Year: 2017

Abstract:

Accurate financial prediction is of great interest to investors. This paper proposes the use of data
analytics to assist investors in making the right financial predictions so that they can take the right
investment decisions. Two platforms are used: Python and R. Various techniques such as ARIMA,
Holt-Winters, neural networks (feed-forward and multi-layer perceptron), linear regression, and time
series are implemented in R to forecast the opening index price performance. In Python, multi-layer
perceptron and support vector regression are implemented to forecast the Nifty 50 stock price, and
sentiment analysis of the stock was done using recent tweets on Twitter. The Nifty 50 (^NSEI) stock
index is used as the data input for the implemented methods. 9 years of data are used. The accuracy was
calculated using 2-3 years of forecast results from R and 2 months of forecast results from Python,
after comparing with the actual stock prices. Mean squared error and other error parameters were
calculated for every prediction system, and the feed-forward network was found to produce only
1.81598342% error when forecasting the opening price of the stock.

Drawbacks:

• The project uses a feed-forward neural network, although various more efficient methods are
available.
• It uses the linear regression algorithm, which fails to fit complex datasets properly as the data
grows.
• The accuracy of the stock price prediction is low.
• Outliers are anomalies or extreme values that deviate from the other data points of
the distribution. Outliers in the data can damage the performance.
2. Title: Prediction of Stock Market by Principal Component Analysis

Author: P. Guo, M. Waqar, H. Dawood, M. B. Shahnawaz and M. A. Ghazanfar

Year: 2017

Abstract:

The categorization of high dimensional data presents a fascinating challenge to machine
learning models, as a large number of highly correlated dimensions or attributes can affect the
accuracy of the classification model. In this paper, the problem of high dimensionality of stock exchange
data is investigated to predict market trends by applying principal component analysis (PCA) with
linear regression. PCA can help to improve the predictive performance of machine learning methods
while reducing the redundancy among the data. Experiments are carried out on high dimensional
spectral data from three stock exchanges: the New York Stock Exchange, the London Stock Exchange and
the Karachi Stock Exchange. The accuracy of the linear regression classification model is compared
before and after applying PCA. The experiments show that PCA can improve the performance of machine
learning in general if and only if the relative correlation among input features is investigated and the
principal components are chosen carefully. Root mean square error (RMSE) is used as the evaluation
metric for the classification model.

Drawbacks:

• The project uses PCA (principal component analysis). Principal components are not as
readable and interpretable as the original features.
• RMSE (root mean square error) is sensitive to outliers, because each error is squared before
the mean is taken.
• Linear regression is prone to underfitting, a situation in which a
machine learning model fails to capture the underlying pattern of the data.
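
The sensitivity of RMSE to outliers can be seen in a small worked example. The sketch below compares RMSE with mean absolute error (MAE) on two sets of predictions that have the same total absolute error. MAE is brought in here only for comparison and is not a metric used in the surveyed paper, and all values are hypothetical.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error: errors are squared before averaging."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: errors are averaged without squaring."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 10.0, 10.0, 10.0]
spread_errors = [11.0, 9.0, 11.0, 9.0]   # every prediction is off by 1
single_outlier = [10.0, 10.0, 10.0, 14.0]  # one prediction is off by 4

rmse_spread = rmse(actual, spread_errors)
rmse_outlier = rmse(actual, single_outlier)
mae_spread = mae(actual, spread_errors)
mae_outlier = mae(actual, single_outlier)
```

Both prediction sets have the same MAE, but the RMSE of the set with the single large error is twice that of the other, because squaring gives the large error more weight.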

3. Title: Stocks Market Prediction Using Support Vector Machine

Author: Z. Hu, J. Zhu and K. Tse

Year: 2013

Abstract:
A lot of studies provide strong evidence that traditional predictive regression models face
significant challenges in out-of-sample predictability tests due to model uncertainty and parameter
instability. Recent studies introduce particular strategies that overcome these problems. Support
Vector Machine (SVM) is a relatively new learning algorithm that has the desirable characteristics of
the control of the decision function, the use of the kernel method, and the sparsity of the solution.
In this paper, we present a theoretical and empirical framework to apply the Support Vector
Machines strategy to predict the stock market. Firstly, four company-specific and six macroeconomic
factors that may influence the stock trend are selected for further stock multivariate analysis.
Secondly, Support Vector Machine is used in analyzing the relationship of these factors and
predicting the stock performance. Our results suggest that SVM is a powerful predictive tool for
stock predictions in the financial market.

Drawbacks:

1. The SVM algorithm is not suitable for large data sets.

2. SVM does not perform very well when the data set is noisy, i.e. when the target classes
overlap.
3. In cases where the number of features for each data point exceeds the number of training
data samples, the SVM will underperform.
4. Because the support vector classifier works by placing data points above and below the
classifying hyperplane, there is no probabilistic explanation for the classification.

4. Title: Survey of Stock Market Prediction Using Machine Learning Approach

Author: A. Sharma, D. Bhuriya and U. Singh

Year: 2017

Abstract:

The stock market is basically nonlinear in nature, and research on the stock market has been one of
the most important topics in recent years. People invest in the stock market based on some prediction.
To predict stock market prices, people look for methods and tools that will increase their
profits while minimizing their risks. Prediction plays a very important role in the stock market business,
which is a very complicated and challenging process. Employing traditional methods like fundamental
and technical analysis may not ensure the reliability of the prediction. Regression analysis is mostly
used to make predictions. In this paper we survey well-known, efficient regression
approaches for predicting the stock market price from stock market data. In future, the results of the
multiple regression approach could be improved by using a larger number of variables.
Drawbacks:

1. Regression is easily affected by outliers, and the solution is likely to be dense because no
regularization is applied.
2. The learning curve is pretty steep.

5. Title: Neural Networks through Stock Market Data Prediction

Author: R. Verma, P. Choure and U. Singh

Year: 2017

Abstract:

In the proposed work, we presented an Artificial Neural Network approach to predict the
stock market indices. We outlined the design of the Neural Network model with its salient features
and customizable parameters. A number of the activation functions are implemented along with the
options for the cross validation sets. We finally test our algorithm on the Nifty stock index dataset
where we predict the values on the basis of values from the past days. We achieve a best case
accuracy of 96% on the dataset.

Drawbacks:

• Requires an enormous amount of data, mainly to train the architecture.
• Long training times for deep networks.
• The architecture has to be tuned to achieve the best performance. There are design decisions
to be made, from the number of layers and the number of nodes in each layer to the activation
functions, and an architecture that works well for one problem very often does not
generalize well to another.
6. Title: Stock Market Prediction Using Hybrid Approach

Author: V. Rajput and S. Bobde

Year: 2016

Abstract:

The objective of this paper is to construct a model to predict stock value movement on the National
Stock Exchange (NSE) using opinion mining and clustering methods. We have used a
domain-specific approach to predict the stocks: from each domain we have taken some stocks with
maximum capitalization. Unlike past approaches, where general moods or sentiments are considered,
the proposed method fuses sentiments on the specific topics of the
company or sector into the stock prediction model. Topics and related opinions of
shareholders are automatically extracted from the writings on a message board using our
proposed method, while clustering algorithms separate clusters of similar stocks from the
others. The proposed methodology gives two output sets, one from sentiment
analysis and another from clustering-based prediction with respect to some technical parameters
of the stock exchange. By examining both results, an efficient prediction is produced. In this paper,
stocks with maximum capitalization within all the important sectors are taken into consideration for
empirical analysis.

Drawbacks:

• Sentiment analysis is not efficient at analysing large amounts of data without error.
• The number of clusters is often unknown in different datasets.
• No particular data point is relevant.
• The limitations of sentiment analysis depend on the constraints placed on the degree to which
the input can be modified.
