
A Project Report on

Big Mart Sales Prediction Using Machine Learning
Submitted in partial fulfilment of the requirements for the award of the degree
of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
BY
ABDULLAH (160318733088)
MOHAMMED YOUSUF ULLAH KHAN (160318733086)
MOHAMMED MANSOOR (160318733083)
Under the guidance
of
Dr. C. Atheeq
Associate Professor
Department Of CSE, DCET

Department of Computer Science and Engineering


Deccan College of Engineering and Technology
(Affiliated to Osmania University)
Hyderabad
2022

CERTIFICATE

This is to certify that the project work entitled “Big Mart Sales Prediction Using Machine Learning” is being submitted by Abdullah (160318733088), Mohammed Yousuf Ullah Khan (160318733086) and Mohammed Mansoor (160318733083) in partial fulfilment of the requirements for the award of the degree of BACHELOR OF ENGINEERING IN COMPUTER SCIENCE AND ENGINEERING by OSMANIA UNIVERSITY, Hyderabad, under our guidance and supervision.

The results embodied in this report have not been submitted to any other university or
institute for the award of any degree or diploma.

Internal Guide
Dr. C. Atheeq
Associate Professor
Department of CSE
DCET, Hyderabad

Head of the Department
Dr. Syed Razi Uddin
Professor
Department of CSE
DCET, Hyderabad

DECLARATION

This is to certify that the work reported in the present project entitled “BIG MART SALES PREDICTION USING MACHINE LEARNING” is a record of work done by us in the Department of Computer Science & Engineering, Deccan College of Engineering and Technology, Osmania University, Hyderabad, in partial fulfilment of the requirements for the award of the degree of Bachelor of Engineering in Computer Science & Engineering.

The results presented in this dissertation have been verified and are found to be satisfactory.
The results embodied in this dissertation have not been submitted to any other university for
the award of any degree or diploma.

ABDULLAH
(160318733088)

MOHAMMED YOUSUF ULLAH KHAN


(160318733086)

MOHAMMED MANSOOR
(160318733083)

ACKNOWLEDGEMENT

“Task successful” makes everyone happy. But the happiness would be gold without glitter if we did not mention the people who supported us in making it a success. Success is crowned to those who made it a reality, but the people whose constant guidance and encouragement made it possible deserve to be crowned first on the eve of success.

We are thankful to the Principal, Dr. SYEDA GAUHAR FATIMA, for providing excellent infrastructure and a pleasant atmosphere for completing this project successfully.

We are thankful to the Head of the Department, Dr. SYED RAZIUDDIN, for providing the necessary facilities during the execution of our project work.

We would like to express our sincere gratitude and indebtedness to our project supervisor, Dr. C. ATHEEQ, for his valuable suggestions and interest throughout the course of this project.

This project would not have been a success without our internal guide. So, we extend our deep sense of gratitude to our internal guide, Dr. C. ATHEEQ, for the effort he took in guiding us through all the stages of our project work. We also thank him for his valuable suggestions, advice, guidance and constructive ideas at each and every step, which were indeed greatly needed for the successful completion of the project.

We convey our heartfelt thanks to the lab staff for allowing us to use the required equipment
whenever needed.

Finally, we would like to take this opportunity to thank Almighty and our families for their
support through the work.

ABDULLAH(160318733088)

MOHAMMED YOUSUF ULLAH KHAN(160318733086)

MOHAMMED MANSOOR (160318733083)

ABSTRACT

Nowadays, shopping malls and Big Marts keep track of the sales data of each individual item in order to predict future customer demand and update inventory management accordingly. These data stores basically contain a large volume of customer data and individual item attributes in a data warehouse. Further, anomalies and frequent patterns are detected by mining the data store from the data warehouse. The resultant data can be used for predicting future sales volume with the help of different machine learning techniques for retailers like Big Mart. In this paper, we propose a predictive model using the XGBoost regressor technique for predicting the sales of a company like Big Mart, and we found that the model produces better performance compared to existing models.

The sales forecast is based on Big Mart sales for various outlets, so that the business model can be adjusted to the expected outcomes. The resulting data can then be used to predict potential sales volumes for retailers such as Big Mart through various machine learning methods. The estimates of the proposed system take into account the price tag, outlet and outlet location. A number of approaches use various machine learning algorithms, such as linear regression, decision tree algorithms and the XGBoost regressor, which offers an efficient, gradient-based prediction of Big Mart sales. Finally, hyperparameter tuning is used to choose relevant hyperparameters that make the algorithm shine and produce the highest accuracy.

TABLE OF CONTENTS

CERTIFICATE…………………………………………………………..……………...ii
DECLARATION……………………………………..…………………..….…………..iii
ACKNOWLEDGEMENT…………………………………………………..…………..iv
ABSTRACT………………………………………………………………….…………...v
TABLE OF CONTENTS……………………….……………………………….….......vi

1. INTRODUCTION………………………………………...…….……2
1.1 INTRODUCTION………………………………………………….2
1.2 PROBLEM STATEMENT ………………………………………..3
1.3 OBJECTIVE………………………………………………………..4
1.4 SCOPE……………………………………………………………...4
1.5 INFRASTRUCTURE……………………………………………...4
1.6 LITERATURE SURVEY………………………………………….5

2. LITERATURE………………….………………………………….……...7
2.1 AN INTRODUCTION TO MACHINE LEARNING………………….7
2.2 WHAT IS ML…………………………………………………………....15
2.3 INTRODUCTION TO DATA IN ML………………………………….20
2.4 ML APPLICATIONS …………………………………………………..25
2.5 DEMYSTIFYING ML…………………………………………………34
2.6 DECISION TREE………………………………………………………37
2.7 RANDOM FOREST REGRESSION…………………………………...42
2.8 LINEAR REGRESSION………………………………………………45
2.9 EXTRA TREES REGRESSOR……………………………………….52

3. PROPOSED SYSTEM…………………………………………………..55
3.1 SYSTEM OVERVIEW…………………………………………………...55
3.2 SYSTEM ARCHITECTURE…………………………………………….56
3.4 PROPOSED MODEL…………………………………………………….57

3.5 PHASES IN MODEL ……………………………………………………..58
3.6 SYSTEM REQUIREMENT SPECIFICATIONS……………………….66
3.7 HARDWARE REQUIREMENTS ………………………………66
3.8 SOFTWARE REQUIREMENTS…………………………..……67

4. IMPLEMENTATION………………………………………………......68
4.1 EXPLORATORY DATA ANALYSIS…………………………68
4.2 DATA PREPROCESSING……………………………………..74
4.3 MODEL BUILDING……………………………………………76

5. CONCLUSION ……………………………………...………………… 80
5.1 CONCLUSION…………………………………………………81
5.2 FUTURE SCOPE………………………………………………..83

REFERENCES………………………………………………….84
ONLINE REFERENCES………………………………………85

1. INTRODUCTION

1.1 INTRODUCTION

The sales forecast is based on Big Mart sales for various outlets, so that the business model can be adjusted to the expected outcomes. The resulting data can then be used to predict potential sales volumes for retailers such as Big Mart through various machine learning methods. The estimates of the proposed system take into account the price tag, outlet and outlet location. A number of approaches use various machine learning algorithms, such as linear regression, decision tree algorithms and the XGBoost regressor, which offers an efficient, gradient-based prediction of Big Mart sales. Finally, hyperparameter tuning is used to choose relevant hyperparameters that make the algorithm shine and produce the highest accuracy.

Every item is tracked across shopping centers and Big Mart outlets in order to anticipate future customer demand and to improve inventory management. Big Mart is an immense network of shops spread virtually all over the world. Trends in Big Mart data are very relevant, and data scientists evaluate those trends per product and store in order to identify potential centers. Using machine learning to forecast the sales of Big Mart helps data scientists test various patterns by store and product to achieve correct results. Many companies rely heavily on their knowledge base and need market patterns to be forecasted. Each shopping center or store tries to offer personalized, short-term offers to attract more customers depending on the day, so that the sales volume of every item can be estimated for the organization's stock administration, logistics and transportation services, and so on. To address the problem of sales prediction of items based on customers' future demands across different Big Mart outlets in various areas, diverse machine learning algorithms like Linear Regression, Random Forest, Decision Tree, Ridge Regression and XGBoost are utilized for estimating sales volume. Sales depend on the type of store, the population around the store, and the city in which the store is located, i.e., whether it is in an urban or rural zone. Population statistics around the store also affect sales, and the capacity of the store and many more factors should be considered. Because every business faces strong demand, sales forecasts play a significant part in any retail center. A stronger prediction is always helpful in developing and enhancing corporate market strategies, which also helps to increase awareness of the market.

In today's modern world, huge shopping centers such as big malls and marts record data related to the sales of items or products along with their various dependent or independent factors, as an important step towards predicting future demand and managing inventory. The dataset built from these dependent and independent variables is a composite of item attributes, data gathered from customers, and data related to inventory management in a data warehouse. The data is thereafter refined in order to get accurate predictions and to uncover new as well as interesting results that shed new light on our knowledge of the task's data. This can then be used for forecasting future sales by employing machine learning algorithms such as random forest and simple or multiple linear regression models.

Day by day, competition among different shopping malls as well as big marts is becoming more serious and aggressive due to the rapid growth of global malls and online shopping. Every mall or mart tries to provide personalized and short-term offers to attract more customers depending upon the day, such that the volume of sales for each item can be predicted for the inventory management of the organization, logistics and transport services, etc. Present machine learning algorithms are very sophisticated and provide techniques to predict or forecast the future demand of sales for an organization, which is further aided by the cheap availability of computing and storage systems. In this paper, we address the problem of big mart sales prediction, i.e., forecasting the demand for an item based on customers' future demand in different big mart stores across various locations and products, based on the previous records. Different machine learning algorithms like linear regression analysis, random forest, etc. are used for the prediction or forecasting of sales volume. As good sales are the life of every organization, the forecasting of sales plays an important role in any shopping complex.

1.2 PROBLEM STATEMENT


The data scientists at BigMart have collected 2013 sales data for 1559 products across 10
stores in different cities. Also, certain attributes of each product and store have been defined.

The aim is to build a predictive model and find out the sales of each product at a particular store. Using this model, BigMart will try to understand the properties of products and stores which play a key role in increasing sales.

1.3 OBJECTIVE
The objective is to understand Big Mart sales and find out what role certain properties of an item play and how they affect its sales. In order to help Big Mart achieve this goal, a predictive model can be built to find out, for every store, the key factors that can increase sales and what changes could be made to the product or store characteristics.

1.4 SCOPE
Supply and demand are two fundamental concepts for sellers and customers. Predicting demand accurately is critical for organizations in order to be able to formulate plans. Sales prediction is based on predicting the sales of different outlets of Big Mart companies so that they can change the business model according to the predicted performance. In this paper, we propose a new approach for demand prediction for Big Mart companies. The business model used by the Big Mart companies for which the model is implemented includes many outlets that sell the same product at the same time throughout the country, where the company operates a marketplace model. The demand prediction for such a model should consider the price tag, outlet type and outlet location. In this study, we first applied linear regression and decision tree algorithms to a specific set of outlets of one of the most popular Big Mart companies in the USA. Then we used the XGBoost regressor, a gradient-based algorithm, to predict sales. Finally, all the approaches are evaluated on a real-world dataset obtained from the Big Mart company. The experimental results show that the XGBoost regressor gives fairly accurate sales results.

1.5 INFRASTRUCTURE
• The project is implemented in the Python language, and the database is built using MySQL Server 8.0; the MySQLyog Community edition is also used.
• The project is implemented in a Jupyter Notebook, which runs on Windows and macOS.
• The Windows operating system is used, with a processor speed of at least 1 GHz or faster.

1.6 LITERATURE SURVEY

1. Title: - A Forecast for Big Mart Sales Based on Random Forests and Multiple Linear
Regression (2018)

Author: - Kadam, H., Shevade, R., Ketkar, P. and Rajguru

Description: - A Forecast for Big Mart Sales Based on Random Forests and Multiple Linear Regression used random forest and linear regression for prediction analysis, which gives less accuracy. To overcome this, we can use the XGBoost algorithm, which gives more accuracy and is more efficient.

2. Title: - Forecasting methods and applications (2008)

Author: - Makridakis, S., Wheelwright, S.C., Hyndman, R.J.

Description: - Forecasting Methods and Applications discusses the problems of lack of data and short product life cycles. Consumer-oriented markets face uncertain demands, so data such as historical sales data can be used to make more accurate predictions.

3. Title: - Comparison of Different Machine Learning Algorithms for Multiple Regression on Black Friday Sales Data (2018)

Author: - C. M. Wu, P. Patil and S. Gunaseelan

Description: - Comparison of Different Machine Learning Algorithms for Multiple Regression on Black Friday Sales Data used neural networks for the comparison of different algorithms. Complex models like neural networks are not efficient for such comparisons, so we can use a simpler algorithm for prediction.

4. Title: - Prediction of Retail Sales of Footwear Using Feedforward and Recurrent Neural Networks (2018)

Author: - Das, P., Chaudhury

Description: - Prediction of Retail Sales of Footwear Using Feedforward and Recurrent Neural Networks used neural networks for the prediction of sales. Using neural networks for predicting weekly retail sales is not efficient, so XGBoost can work more efficiently.

2. LITERATURE

2.1 An Introduction to Machine Learning


The term Machine Learning was coined in 1959 by Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, who stated that “it gives computers the ability to learn without being explicitly programmed”.
In 1997, Tom Mitchell gave a “well-posed” mathematical and relational definition: “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”
Machine Learning is the latest buzzword floating around, and it deserves to be, as it is one of the most interesting subfields of Computer Science. So what does Machine Learning really mean?
Let’s try to understand Machine Learning in layman’s terms. Consider you are trying to
toss a paper into a dustbin.
After the first attempt, you realize that you have put too much force into it. After the
second attempt, you realize you are closer to the target but you need to increase your
throw angle. What is happening here is basically after every throw we are learning
something and improving the end result. We are programmed to learn from our
experience.
This definition of the tasks with which machine learning is concerned offers a fundamentally operational definition rather than defining the field in cognitive terms. This follows Alan Turing's proposal in his paper “Computing Machinery and Intelligence”, in which the question “Can machines think?” is replaced with the question “Can machines do what we (as thinking entities) can do?”
Within the field of data analytics, machine learning is used to devise complex models
and algorithms that lend themselves to prediction; in commercial use, this is known as
predictive analytics. These analytical models allow researchers, data scientists,
engineers, and analysts to “produce reliable, repeatable decisions and results” and
uncover “hidden insights” through learning from historical relationships and trends in the
data set(input).
Suppose that you decide to check out that offer for a vacation. You browse through the
travel agency website and search for a hotel. When you look at a specific hotel, just
below the hotel description there is a section titled “You might also like these hotels”.
This is a common use case of Machine Learning called “Recommendation Engine”.
Again, many data points were used to train a model in order to predict what will be the
best hotels to show you under that section, based on a lot of information they already
know about you.
So if you want your program to predict, for example, traffic patterns at a busy
intersection (task T), you can run it through a machine learning algorithm with data
about past traffic patterns (experience E) and, if it has successfully “learned”, it will then
do better at predicting future traffic patterns (performance measure P).
The highly complex nature of many real-world problems, though, often means that
inventing specialized algorithms that will solve them perfectly every time is impractical,
if not impossible. Examples of machine learning problems include “Is this cancer?”, “Which of these people are good friends with each other?” and “Will this person like this movie?”. Such problems are excellent targets for machine learning, and in fact machine learning has been applied to them with great success.
Due to the large decrease in the prices of technology and sensors, we can now create, store and send more data than ever before in history. Up to ninety percent of the data in the world today has been created in the last two years alone. There are 2.5 quintillion bytes of data created each day at our current pace, and this pace is only expected to grow. This data feeds machine learning models and is the main driver of the boom that this science has experienced in recent years.

Machine Learning is one of the subfields of Artificial Intelligence and can be described as:

“Machine Learning is the science of getting computers to learn and act like humans do,
and improve their learning over time in autonomous fashion, by feeding them data and
information in the form of observations and real-world interactions.” — Dan Fagella

Machine learning offers an efficient way of capturing knowledge in data to gradually improve the performance of predictive models and make data-driven decisions. It has become a ubiquitous technology and we enjoy its benefits in e-mail spam filters, self-driving cars, image and voice recognition, and world-class Go players.
Generally, in machine learning, matrix and vector notation is used to refer to the data. The data is normally handled in matrix form, where:

Each separate row of the matrix is a sample, observation or data point.

Each column is a feature (or attribute) of that observation.
Usually there is one column (or feature) that we will call the target, label or response, and it is the value or class that we are trying to predict.
To train a machine learning model is to provide a machine learning algorithm with training data to learn from.

Regarding machine learning algorithms, they usually have some inner parameters; for example, in decision trees there are parameters like depth, number of nodes and number of leaves. These inner parameters are called hyperparameters.

Generalization is the ability of the model to make predictions on new data.
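As a small illustration of this notation (a minimal sketch assuming the pandas library; the column names and values below are made up, loosely modelled on retail data, and are not the Big Mart dataset itself), the data can be arranged as a feature matrix X and a target vector y:

import pandas as pd

# A tiny illustrative dataset: each row is one sample (observation),
# each column is a feature, and "Sales" is the target we want to predict.
data = pd.DataFrame({
    "Item_Weight": [9.3, 5.9, 17.5, 19.2],                    # feature
    "Item_MRP":    [249.8, 48.3, 141.6, 182.1],               # feature
    "Outlet_Size": ["Medium", "Medium", "Small", "High"],     # feature
    "Sales":       [3735.1, 443.4, 2097.3, 732.4],            # target / label
})

X = data.drop(columns=["Sales"])  # feature matrix: rows = samples, columns = features
y = data["Sales"]                 # target vector: the value we try to predict
print(X.shape, y.shape)           # (4, 3) (4,)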

Classification of Machine Learning


Machine learning implementations are classified into the following major categories, depending on the nature of the learning “signal” or “response” available to the learning system:

1. Supervised learning: When an algorithm learns from example data and associated target responses, which can consist of numeric values or string labels such as classes or tags, in order to later predict the correct response when posed with new examples, it comes under the category of supervised learning. This approach is indeed similar to human learning under the supervision of a teacher. The teacher provides good examples for the student to memorize, and the student then derives general rules from these specific examples.
Supervised learning refers to a kind of machine learning model that is trained with a set of samples where the desired output signals (or labels) are already known. The model learns from these already known results and adjusts its inner parameters to adapt itself to the input data. Once the model is properly trained, it can make accurate predictions about unseen or future data.

An overview of the general process:
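A minimal sketch of this general supervised process, assuming the scikit-learn library and made-up numbers, could look like this:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np

# Labelled examples: inputs X with their known outputs y.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1.9, 4.1, 6.2, 7.9, 10.1, 12.0])

# 1. Split the known data so the model can be checked on examples it has not seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# 2. Train: the model adjusts its inner parameters to fit the labelled examples.
model = LinearRegression().fit(X_train, y_train)

# 3. Predict on the unseen examples and compare with the known answers.
print(model.predict(X_test), y_test)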

2. Unsupervised learning: When an algorithm learns from plain examples without any associated response, leaving it to the algorithm to determine the data patterns on its own, it comes under the category of unsupervised learning. This type of algorithm tends to restructure the data into something else, such as new features that may represent a class or a new series of uncorrelated values. It is quite useful in providing humans with insights into the meaning of data and new useful inputs to supervised machine learning algorithms.
As a kind of learning, it resembles the methods humans use to figure out that
certain objects or events are from the same class, such as by observing the degree
of similarity between objects. Some recommendation systems that you find on the
web in the form of marketing automation are based on this type of learning.
In unsupervised learning we deal with unlabeled data of unknown structure and the
goal is to explore the structure of the data to extract meaningful information, without
the reference of a known outcome variable.
There are two main categories: clustering and dimensionality reduction.

1. Clustering:
Clustering is an exploratory data analysis technique used for organizing information into meaningful clusters or subgroups without any prior knowledge of its structure. Each cluster is a group of similar objects that is different from the objects of the other clusters.
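A minimal clustering sketch, assuming the scikit-learn library and made-up points, is shown below:

from sklearn.cluster import KMeans
import numpy as np

# Unlabelled points: there is no target column, only the observations themselves.
points = np.array([[1, 1], [1.5, 2], [1, 0],      # one natural group
                   [8, 8], [8.5, 9], [9, 8]])     # another natural group

# Ask k-means to organise the points into 2 clusters of similar objects.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # cluster membership per point, e.g. [0 0 0 1 1 1]
print(kmeans.cluster_centers_)  # the centre of each discovered subgroup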

2. Dimensionality Reduction:
It is common to work with data in which each observation comes with a high
number of features, in other words, that have high dimensionality. This can be a
challenge for the computational performance of Machine Learning algorithms, so
dimensionality reduction is one of the techniques used for dealing with this issue.
Dimensionality reduction methods work by finding correlations between the features, which would mean that there is redundant information, as some features could be partially explained by the others. These methods remove noise from the data (which can also decrease the model's performance) and compress the data into a smaller subspace while retaining most of the relevant information.
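A hedged sketch of dimensionality reduction using principal component analysis (PCA), assuming scikit-learn and synthetic correlated features, could look like this:

from sklearn.decomposition import PCA
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)   # strongly correlated with x1 -> redundant
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])            # 200 samples, 3 features

# Compress the 3 correlated features into 2 components that keep most of the information.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                 # (200, 2)
print(pca.explained_variance_ratio_)   # share of the original variance retained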

Deep Learning

Deep learning is a subfield of machine learning that uses a hierarchical structure of artificial neural networks, which are built in a fashion similar to the human brain, with the neuron nodes connected as a web. That architecture allows the data analysis to be tackled in a non-linear way.

The first layer of the neural network takes raw data as an input, processes it, extracts
some information and passes it to the next layer as an output. Each layer then processes
the information given by the previous one and repeats, until data reaches the final layer,
which makes a prediction.

This prediction is compared with the known result and then, by a method called
backpropagation, the model is able to learn the weights that yield accurate outputs.
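As a rough, non-authoritative illustration, a small multi-layer network (whose weights are learned internally by backpropagation) can be built with scikit-learn's MLPClassifier on a toy XOR-like problem:

from sklearn.neural_network import MLPClassifier
import numpy as np

# Toy XOR-like problem: no single straight line can separate the two classes,
# so at least one hidden layer of neurons is needed.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 50, dtype=float)
y = np.array([0, 1, 1, 0] * 50)

# A small network; the weights of its layers are adjusted to reduce the prediction error.
net = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    solver="lbfgs", max_iter=1000, random_state=0)
net.fit(X, y)
print(net.predict([[0, 1], [1, 1]]))   # ideally [1 0]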

3. Reinforcement learning: Here you present the algorithm with examples that lack labels, as in unsupervised learning; however, you can accompany each example with positive or negative feedback according to the solution the algorithm proposes. This comes under the category of reinforcement learning, which is connected to applications for which the algorithm must make decisions (so the product is prescriptive, not just descriptive, as in unsupervised learning), and the decisions bear consequences. In the human world, it is just like learning by trial and error.
Errors help you learn because they have a penalty added (cost, loss of time, regret, pain,
and so on), teaching you that a certain course of action is less likely to succeed than
others. An interesting example of reinforcement learning occurs when computers learn to
play video games by themselves.
In this case, an application presents the algorithm with examples of specific situations,
such as having the gamer stuck in a maze while avoiding an enemy. The application lets
the algorithm know the outcome of actions it takes, and learning occurs while trying to
avoid what it discovers to be dangerous and to pursue survival. You can have a look at
how the company Google DeepMind has created a reinforcement learning program that
plays old Atari’s video games. When watching the video, notice how the program is
initially clumsy and unskilled but steadily improves with training until it becomes a
champion.
Reinforcement learning is one of the most important branches of Deep Learning. The goal is to build a model in which there is an agent that takes actions, and the aim is to improve its performance. This improvement is achieved by giving a specific reward each time the agent performs an action that belongs to the set of actions that the developer wants the agent to perform.

The reward is a measurement of how well the action was in order to achieve a predefined
goal. The agent then uses this feedback to adjust its future behaviour, with the objective
of obtaining the most reward.

One common example is a chess engine, where the agent decides from a series of possible actions depending on the board's disposition (which is the environment's state), and the reward is given when winning or losing the game.
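A minimal sketch of such a reward loop, using a made-up two-action "bandit" environment in plain Python rather than a real game, could look like this:

import random

# The agent repeatedly picks an action, receives a reward from the environment,
# and updates its value estimates so as to earn more reward over time.
true_win_prob = {"action_a": 0.3, "action_b": 0.7}     # hidden from the agent
value = {a: 0.0 for a in true_win_prob}                # agent's running estimates
counts = {a: 0 for a in true_win_prob}
epsilon = 0.1                                          # exploration rate

random.seed(0)
for step in range(1000):
    # Explore occasionally, otherwise exploit the action that looks best so far.
    if random.random() < epsilon:
        action = random.choice(list(true_win_prob))
    else:
        action = max(value, key=value.get)
    reward = 1.0 if random.random() < true_win_prob[action] else 0.0
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]  # incremental average

print(value)   # the agent learns that "action_b" yields more reward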

4. Semi-supervised learning: Here an incomplete training signal is given: a training set with some (often many) of the target outputs missing. There is a special case of this principle known as transduction, where the entire set of problem instances is known at learning time, except that part of the targets are missing.

The basic difference between supervised and unsupervised learning is that supervised learning datasets have an output label associated with each tuple of training data, while unsupervised datasets do not. Semi-supervised learning is an important category that lies between supervised and unsupervised machine learning. Although semi-supervised learning is the middle ground between supervised and unsupervised learning and operates on data that contains a few labels, it mostly consists of unlabeled data. Labels are costly, but for corporate purposes a dataset may still have a few of them.

The basic disadvantage of supervised learning is that it requires hand-labeling by ML specialists or data scientists, and it is also costly to process. Further, unsupervised learning has a limited spectrum of applications. To overcome these drawbacks of supervised and unsupervised learning algorithms, the concept of semi-supervised learning was introduced. In this setting, the training data is a combination of both labeled and unlabeled data. However, the labeled data exists in a very small amount, while the unlabeled data exists in a huge amount. Initially, similar data is clustered using an unsupervised learning algorithm, which further helps to turn the unlabeled data into labeled data. This is because labeled data is a comparatively more expensive acquisition than unlabeled data.

We can imagine these algorithms with an example. Supervised learning is where a student is under the supervision of an instructor at home and college. Further, if that student analyzes the same concept by himself without any help from the instructor, it comes under unsupervised learning. Under semi-supervised learning, the student has to revise the concept himself after analyzing it under the guidance of an instructor at college.

Assumptions followed by Semi-Supervised Learning


To work with the unlabeled dataset, there must be a relationship between the objects. To
understand this, semi-supervised learning uses any of the following assumptions:

Continuity assumption - As per the continuity assumption, objects near each other tend to share the same group or label. This assumption is also used in supervised learning, where the datasets are separated by decision boundaries. In semi-supervised learning, however, the decision boundaries are combined with the smoothness assumption in low-density regions.
Cluster assumption - In this assumption, the data is divided into different discrete clusters. Further, the points in the same cluster share the same output label.
Manifold assumption - This assumption helps to use distances and densities; the data is assumed to lie on a manifold of fewer dimensions than the input space. The higher-dimensional data is created by a process that has fewer degrees of freedom and may be hard to model directly. (This assumption becomes practical when the dimensionality of the input is high.)
Working of Semi-Supervised Learning
Semi-supervised learning uses pseudo-labeling to train the model with less labeled training data than supervised learning. The process can combine various neural network models and training methods. The whole working of semi-supervised learning is explained in the points below:

Firstly, it trains the model with a small amount of labeled training data, similar to supervised learning models; the training continues until the model gives accurate results.
The algorithm then uses the unlabeled dataset with pseudo labels in the next step; at this point the results may not be accurate.
Now, the labels from the labeled training data and the pseudo-labeled data are linked together.
The input data in the labeled training data and the unlabeled training data are also linked.
In the end, the model is trained again with the new combined input, as in the first step. This reduces errors and improves the accuracy of the model.
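A minimal pseudo-labeling sketch of these steps, assuming scikit-learn and a synthetic dataset (not the Big Mart data), is shown below:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two Gaussian blobs; only 10 points are labelled, 200 are unlabelled.
X_lab = np.vstack([rng.normal(0, 1, (5, 2)), rng.normal(4, 1, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_unl = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])

# Step 1: train on the small labelled set.
clf = LogisticRegression().fit(X_lab, y_lab)

# Step 2: predict pseudo labels for the unlabelled data, keep only the confident ones.
proba = clf.predict_proba(X_unl)
confident = proba.max(axis=1) > 0.95
pseudo_y = proba.argmax(axis=1)[confident]

# Step 3: combine real labels with pseudo labels and retrain.
X_comb = np.vstack([X_lab, X_unl[confident]])
y_comb = np.concatenate([y_lab, pseudo_y])
clf = LogisticRegression().fit(X_comb, y_comb)
print(confident.sum(), "pseudo-labelled points added")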
Difference between Semi-supervised and Reinforcement Learning
Reinforcement learning is different from semi-supervised learning, as it works with rewards and feedback. Reinforcement learning aims to maximize the rewards through trial-and-error actions, whereas in semi-supervised learning we train the model with a partially labeled dataset.

Real-world applications of Semi-supervised Learning-
Semi-supervised learning models are becoming more popular in the industries. Some of
the main applications are as follows.

Speech analysis - This is the most classic example of a semi-supervised learning application. Since labeling audio data is a very laborious task that requires a lot of human resources, this problem can be naturally overcome by applying semi-supervised learning.
Web content classification - It is critical yet practically impossible to label each page on the internet, because it needs a lot of human intervention. Still, this problem can be reduced through semi-supervised learning algorithms. Further, Google also uses semi-supervised learning algorithms to rank a webpage for a given query.
Protein sequence classification - Since DNA strands are large, they require active human intervention, so the rise of semi-supervised models has been prominent in this field.
Text document classification - As we know, it would be very unfeasible to find a large amount of labeled text data, so semi-supervised learning is an ideal model to overcome this.
Categorizing on the basis of required Output
Another categorization of machine learning tasks arises when one considers the desired
output of a machine-learned system:

1. Classification: When inputs are divided into two or more classes, and the
learner must produce a model that assigns unseen inputs to one or more
(multi-label classification) of these classes. This is typically tackled in a
supervised way. Spam filtering is an example of classification, where the
inputs are email (or other) messages and the classes are “spam” and “not
spam”.
2. Regression: Also a supervised problem, this is the case when the outputs are continuous rather than discrete.
3. Clustering: When a set of inputs is to be divided into groups. Unlike in
classification, the groups are not known beforehand, making this typically an
unsupervised task.
Machine Learning comes into the picture when problems cannot be solved by means of
typical approaches.

2.2 What is Machine Learning?

Arthur Samuel, a pioneer in the field of artificial intelligence and computer gaming,
coined the term “Machine Learning”. He defined machine learning as – “Field of
study that gives computers the capability to learn without being explicitly
programmed”.
In very layman terms, Machine Learning (ML) can be explained as automating and improving the learning process of computers based on their experiences, without them being explicitly programmed, i.e., without any human assistance. The process starts with feeding good-quality data and then training our machines (computers) by building machine learning models using the data and different algorithms. The choice of algorithms depends on what type of data we have and what kind of task we are trying to automate.

Example: Training of students during exam.


While preparing for exams, students don't actually cram the subject but try to learn it with complete understanding. Before the examination, they feed their machine (brain) with a good amount of high-quality data (questions and answers from different books, teachers' notes or online video lectures). Actually, they are training their brain with input as well as output, i.e., what kind of approach or logic they have to use to solve different kinds of questions. Each time they solve practice test papers, they find their performance (accuracy/score) by comparing the answers with the answer key given. Gradually, the performance keeps on increasing and they gain more confidence with the adopted approach. That is how models are actually built: train the machine with data (both inputs and outputs are given to the model), and when the time comes, test it on data (with input only) and obtain the model's score by comparing its answers with the actual outputs, which were not fed in while training. Researchers are working with assiduous efforts to improve algorithms and techniques so that these models perform even better.

Basic Difference in ML and Traditional Programming?


• Traditional Programming: We feed in DATA (input) + PROGRAM (logic), run it on the machine and get the output.
• Machine Learning: We feed in DATA (input) + OUTPUT, run it on the machine during training, and the machine creates its own program (logic), which can be evaluated while testing.

What exactly does learning mean for a computer?
A computer is said to be learning from experience with respect to some class of tasks if its performance on a given task improves with the experience.
A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E
Example: playing checkers.
E = the experience of playing many games of checkers
T = the task of playing checkers.
P = the probability that the program will win the next game
In general, any machine learning problem can be assigned to one of two broad
classifications:
Supervised learning and Unsupervised learning.
How things work in reality:-
• Talking about online shopping, there are millions of users with an unlimited range of interests with respect to brands, colors, price ranges and many more. While shopping online, buyers tend to search for a number of products. Now, searching for a product frequently will make the buyer's Facebook, web pages, search engine or that online store start recommending or showing offers on that particular product. There is no one sitting there to code such a task for each and every user; all of this is completely automatic. Here, ML plays its role. Researchers, data scientists and machine learning engineers build models on the machine using a huge amount of good-quality data, and now their machine performs automatically and even improves with more and more experience and time.
Traditionally, advertisement was only done using newspapers, magazines and radio, but now technology has made us smart enough to do targeted advertisement (online ad systems), which is a far more efficient method to target the most receptive audience.

• Even in health care, ML is doing a fabulous job. Researchers and scientists have prepared models to train machines to detect cancer just by looking at slide images of cells. For humans to perform this task, it would have taken a lot of time. But now, with no more delay, machines predict the chances of having or not having cancer with some accuracy, and doctors just have to give an assurance call, that's it. The answer to how this is possible is very simple: all that is required is a high-computation machine, a large amount of good-quality image data, and an ML model with good algorithms to achieve state-of-the-art results.
Doctors are using ML even to diagnose patients based on different parameters under consideration.

• You might have used IMDb ratings, Google Photos where it recognizes faces, Google Lens where the ML image-text recognition model can extract text from the images you feed in, and Gmail which categorizes e-mails as social, promotions, updates or forum using text classification, which is a part of ML.

How ML works?
• Gathering past data in any form suitable for processing. The better the quality of the data, the more suitable it will be for modeling.
• Data processing – Sometimes the data collected is in raw form and needs to be pre-processed.
Example: Some tuples may have missing values for certain attributes, and, in this case, they have to be filled with suitable values in order to perform machine learning or any form of data mining.
Missing values for numerical attributes such as the price of a house may be replaced with the mean value of the attribute, whereas missing values for categorical attributes may be replaced with the mode (the most frequent value) of the attribute. This invariably depends on the types of filters we use. If the data is in the form of text or images, then converting it to numerical form will be required, be it a list, array or matrix. Simply put, the data has to be made relevant and consistent and converted into a format understandable by the machine (see the sketch after this list).
• Divide the input data into training, cross-validation and test sets. The ratio between the respective sets must be 6:2:2.
• Building models with suitable algorithms and techniques on the training set.
• Testing our conceptualized model with data which was not fed to the model at the time of training, and evaluating its performance using metrics such as F1 score, precision and recall.
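A small sketch of the data-processing and splitting steps above, assuming pandas and scikit-learn and using made-up values, could look like this:

import pandas as pd
from sklearn.model_selection import train_test_split

# Raw data with a missing numerical value and a missing categorical value.
df = pd.DataFrame({
    "price": [120.0, None, 80.0, 95.0, 110.0, 60.0, 75.0, 130.0, 90.0, 100.0],
    "city":  ["A", "B", None, "A", "B", "A", "B", "A", "A", "B"],
    "sales": [10, 12, 7, 9, 11, 5, 8, 14, 9, 10],
})

# Fill missing numerical values with the mean, missing categorical values with the mode.
df["price"] = df["price"].fillna(df["price"].mean())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Split into training, cross-validation and test sets in roughly a 6:2:2 ratio.
train, rest = train_test_split(df, test_size=0.4, random_state=0)
cv, test = train_test_split(rest, test_size=0.5, random_state=0)
print(len(train), len(cv), len(test))   # 6 2 2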

General Methodology for Building Machine Learning Models

Preprocessing:

It is one of the most crucial steps in any machine learning application. Usually the data comes in a format that is not optimal (or even inadequate) for the model to process, and in such cases preprocessing is a mandatory task.

Many algorithms require the features to be on the same scale (for example, in the [0, 1] range) for optimal performance, and this is often done by applying normalization or standardization techniques to the data.

We can also find in some cases that the selected features are correlated and therefore redundant for extracting meaningful information. In that case we must use dimensionality reduction techniques to compress the features into smaller dimensional subspaces.

Finally, we randomly split our original dataset into training and testing subsets.
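As a hedged illustration of the scaling step, assuming scikit-learn and made-up numbers:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Features measured on very different scales (e.g. item weight vs. item price).
X = np.array([[9.3, 249.8], [5.9, 48.3], [17.5, 141.6], [19.2, 182.1]])

# Normalization: rescale each feature into the [0, 1] range.
print(MinMaxScaler().fit_transform(X))

# Standardization: rescale each feature to zero mean and unit variance.
print(StandardScaler().fit_transform(X))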

Training and Selecting a Model


It is essential to compare a set of different algorithms in order to train and select the best-performing one. To do so, it is necessary to select a metric for measuring the model's performance. One commonly used in classification problems is classification accuracy, which is the proportion of correctly classified instances. In regression problems, one of the most popular is the Mean Squared Error (MSE), which measures the average squared difference between the estimated values and the real values.

Finally, we will use a technique called cross-validation to make sure that our model will perform well on real-world data before using the testing subset for the final evaluation of the model.

This technique divides the training dataset into smaller training and validation subsets and then estimates the generalization ability of the model, in other words, how well it can predict outcomes when provided with new data. It then repeats the process K times and computes the average performance of the model by dividing the sum of the metrics obtained by the number of iterations K.
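A minimal cross-validation sketch, assuming scikit-learn and one of its bundled example datasets (not the Big Mart data), is shown below:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# K-fold cross-validation: split the training data into K parts, train on K-1 of them
# and validate on the remaining one, then average the K scores.
scores = cross_val_score(LinearRegression(), X, y,
                         cv=5, scoring="neg_mean_squared_error")
print(-scores.mean())   # average MSE across the 5 folds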

In general, the default parameters of the machine learning algorithms provided by the libraries are not the best ones to use with our data, so we will use hyperparameter optimization techniques to help us fine-tune the model's performance.
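As a small illustration of hyperparameter tuning (a grid search sketch assuming scikit-learn; the parameter grid below is chosen arbitrarily for the example):

from sklearn.datasets import load_diabetes
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

# Try several hyperparameter combinations and keep the one with the best
# cross-validated score, instead of trusting the library defaults.
grid = GridSearchCV(DecisionTreeRegressor(random_state=0),
                    param_grid={"max_depth": [2, 4, 6, 8],
                                "min_samples_leaf": [1, 5, 10]},
                    cv=5, scoring="neg_mean_squared_error")
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)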

Evaluating Models and Predicting with New Data


Once we have selected and fitted a model to our training dataset, we can use the testing dataset to estimate its performance on this unseen data, so we can make an estimate of the generalization error of the model, or evaluate it using some other metric.

If we are satisfied with the value of the metric obtained, we can use then the
model to make predictions on future data.

Wrap Up
In this section we learned what Machine Learning is, painting a big picture of its nature, motivation and applications.

We also learned some basic notations and terminology and the different kinds
of machine learning algorithms:

Supervised learning, with classification and regression problems.


Unsupervised learning, with clustering and dimensionality reduction.
Reinforcement learning, where the agent learns from its environment.
Deep learning and its artificial neural networks.
Finally, we made an introduction to the typical methodology for building
Machine Learning models and explained its main tasks:

Preprocessing.
Training and testing.
Selecting a model.
Evaluating.

2.3 Introduction to Data in Machine Learning

DATA: It can be any unprocessed fact, value, text, sound, or picture that is not being
interpreted and analyzed. Data is the most important part of all Data Analytics, Machine Learning, and Artificial Intelligence. Without data, we can't train any model, and all modern research and automation will go in vain. Big enterprises are spending lots of money just to gather as much reliable data as possible.
Example: Why did Facebook acquire WhatsApp by paying a huge price of $19 billion?
The answer is very simple and logical – it is to have access to the users’ information that
Facebook may not have but WhatsApp will have. This information of their users is of
paramount importance to Facebook as it will facilitate the task of improvement in their
services.
INFORMATION: Data that has been interpreted and manipulated and has now some
meaningful inference for the users.
KNOWLEDGE: Combination of inferred information, experiences, learning, and
insights. Results in awareness or concept building for an individual or organization.

Machine learning is a form of artificial intelligence (AI) that teaches computers to think
in a similar way to humans: learning and improving upon past experiences. Almost any
task that can be completed with a data-defined pattern or set of rules can be automated
with machine learning.

So, why is machine learning important? It allows companies to transform processes that
were previously only possible for humans to perform—think responding to customer
service calls, bookkeeping, and reviewing resumes for everyday businesses. Machine
learning can also scale to handle larger problems and technical questions—think image
detection for self-driving cars, predicting natural disaster locations and timelines, and
understanding the potential interaction of drugs with medical conditions before clinical
trials. That’s why machine learning is important.

Why is data important for machine learning?


We’ve covered the question ‘why is machine learning important,’ now we need to
understand the role data plays. Machine learning data analysis uses algorithms to
continuously improve itself over time, but quality data is necessary for these models to
operate efficiently.

To truly understand how machine learning works, you must also understand the data by
which it operates. Today, we will be discussing what machine learning datasets are, the

types of data needed for machine learning to be effective, and where engineers can find
datasets to use in their own machine learning models.

What is a dataset in machine learning?


To understand what a dataset is, we must first discuss the components of a dataset. A
single row of data is called an instance. Datasets are a collection of instances that all
share a common attribute. Machine learning models will generally contain a few
different datasets, each used to fulfill various roles in the system.

For machine learning models to understand how to perform various actions, training
datasets must first be fed into the machine learning algorithm, followed by validation
datasets (or testing datasets) to ensure that the model is interpreting this data accurately.

Once you feed these training and validation sets into the system, subsequent datasets can
then be used to sculpt your machine learning model going forward. The more data you
provide to the ML system, the faster that model can learn and improve.

What type of data does machine learning need?


Data can come in many forms, but machine learning models rely on four primary data
types. These include numerical data, categorical data, time series data, and text data.

Types of data
Numerical data
Numerical data, or quantitative data, is any form of measurable data, such as your height, weight, or the cost of your phone bill. You can determine if a set of data is numerical by attempting to average out the numbers or sort them in ascending or descending order. Exact or whole numbers (e.g. 26 students in a class) are considered discrete numbers, while those which fall into a given range (e.g. a 3.6 percent interest rate) are considered continuous numbers. While learning this type of data, keep in mind that numerical data is not tied to any specific point in time; it is simply raw numbers.

Categorical data
Categorical data is sorted by defining characteristics. This can include gender, social
class, ethnicity, hometown, the industry you work in, or a variety of other labels. While
learning this data type, keep in mind that it is non-numerical, meaning you are unable to
add them together, average them out, or sort them in any chronological order.
Categorical data is great for grouping individuals or ideas that share similar attributes,
helping your machine learning model streamline its data analysis.

Time series data

Time series data consists of data points that are indexed at specific points in time. More
often than not, this data is collected at consistent intervals. Learning and utilizing time
series data makes it easy to compare data from week to week, month to month, year to
year, or according to any other time-based metric you desire. The distinct difference
between time series data and numerical data is that time series data has established
starting and ending points, while numerical data is simply a collection of numbers that
aren’t rooted in particular time periods.

Text data
Text data is simply words, sentences, or paragraphs that can provide some level of
insight to your machine learning models. Since these words can be difficult for models to
interpret on their own, they are most often grouped together or analyzed using various
methods such as word frequency, text classification, or sentiment analysis.
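A small sketch showing how the data types discussed above might appear together in a pandas DataFrame (the column names and values are made up for illustration):

import pandas as pd

# One toy record per row; each column illustrates one data type.
df = pd.DataFrame({
    "monthly_bill": [49.99, 60.50, 38.75],                     # numerical (continuous)
    "num_items":    [3, 5, 2],                                 # numerical (discrete)
    "store_type":   ["Grocery", "Supermarket", "Grocery"],     # categorical
    "week":         pd.to_datetime(["2013-01-07", "2013-01-14", "2013-01-21"]),  # time series (dates)
    "review":       ["great store", "too crowded", "friendly staff"],            # text
})
df["store_type"] = df["store_type"].astype("category")
print(df.dtypes)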

Where do engineers get datasets for machine learning?


There is an abundance of places you can find machine learning data, but we have
compiled five of the most popular ML dataset resources to help get you started:

Five most popular ML dataset resources


Google’s Dataset Search
Google released its Google Dataset Search engine in September 2018. Use this tool to view datasets across a wide array of topics such as global temperatures, housing market information, or anything else that piques your interest. Once you enter your search, several applicable datasets will appear on the left side of your screen. Information will be included about each dataset's date of publication, a description of the data, and a link to the data source. This is a popular ML dataset resource that can help you find unique machine learning data.

Microsoft Research Open Data


Microsoft is another technological leader who has created a database of free, curated
datasets in the form of Microsoft Research Open Data. These datasets are available to the
public and are used to “advance state-of-the-art research in areas such as natural
language processing, computer vision, and domain specific sciences.” Download datasets
from published research studies or copy them directly to a cloud-based Data Science
Virtual Machine to enjoy reputable machine learning data.

Amazon datasets
Amazon Web Services (AWS) has grown to be one of the largest on-demand cloud
computing platforms in the world. With so much data being stored on Amazon’s servers,
a plethora of datasets have been made available to the public through AWS resources.
These datasets are compiled into Amazon’s Registry of Open Data on AWS. Looking up

datasets is straightforward, with a search function, dataset descriptions, and usage
examples provided. This is one of the most popular ways to extract machine learning
data.

UCI Machine Learning Repository


The University of California, School of Information and Computer Science, provides a
large amount of information to the public through its UCI Machine Learning Repository
database. This database is prime for machine learning data as it includes nearly 500
datasets, domain theories, and data generators which are used for “the empirical analysis
of machine learning algorithms.” Not only does this make searching easy, but UCI also
classifies each dataset by the type of machine learning problem, simplifying the process
even further.

Government datasets
The United States Government has released several datasets for public use. As another
great avenue for machine learning data, these datasets can be used for conducting
research, creating data visualizations, developing web/mobile applications, and more.
The US Government database can be found at Data.gov and contains information
pertaining to industries such as education, ecosystems, agriculture, and public safety,
among others. Many countries offer similar databases and most are fairly easy to find.

Why is machine learning popular?


Machine learning is a booming technology because it benefits every type of business
across every industry. The applications are limitless. From healthcare to financial
services, transportation to cyber security, and marketing to government, machine
learning can help every type of business adapt and move forward in an agile manner.

You might be good at sifting through a massive organized spreadsheet and identifying a
pattern, but thanks to machine learning and artificial intelligence, algorithms can
examine much larger datasets and understand connective patterns even faster than any
human, or any human-created spreadsheet function, ever could. Machine learning allows
businesses to collect insights quickly and efficiently, speeding the time to business value.
That’s why machine learning is important for every organization.

Machine learning also takes the guesswork out of decisions. While you may be able to
make assumptions based on data averages from spreadsheets or databases, machine
learning algorithms can analyze massive volumes of data to provide exhaustive insights
from a comprehensive picture. Put shortly: machine learning allows for higher accuracy
outputs across an ever growing amount of inputs.
How we split data in Machine Learning?
• Training Data: The part of the data we use to train our model. This is the data that your model actually sees (both input and output) and learns from.

• Validation Data: The part of the data that is used for frequent evaluation of the model as it is fit on the training dataset, and for improving the involved hyperparameters (parameters set before the model begins learning). This data plays its part while the model is actually training.
• Testing Data: Once our model is completely trained, the testing data provides an unbiased evaluation. When we feed in the inputs of the testing data, our model will predict some values (without seeing the actual output). After prediction, we evaluate our model by comparing the predictions with the actual output present in the testing data. This is how we evaluate and see how much our model has learned from the experiences fed in as training data at the time of training.

Consider an example:
There is a shopping mart owner who conducted a survey, for which he has a long list of questions and answers that he asked the customers; this list of questions and answers is DATA. Now, every time he wants to infer anything, he can't just go through each and every question from thousands of customers to find something relevant, as it would be time-consuming and not helpful. In order to reduce this overhead and time wastage and to make the work easier, the data is manipulated through software, calculations, graphs, etc. as per his own convenience; this inference from the manipulated data is Information. So, data is a must for information. Now, knowledge has its role in differentiating between two individuals having the same information. Knowledge is actually not technical content but is linked to the human thought process.
Properties of Data –
1. Volume: Scale of Data. With the growing world population and technology at
exposure, huge data is being generated each and every millisecond.
2. Variety: Different forms of data – healthcare, images, videos, audio
clippings.
3. Velocity: Rate of data streaming and generation.
4. Value: Meaningfulness of data in terms of information that researchers can
infer from it.
5. Veracity: Certainty and correctness in data we are working on.
Some facts about Data:
• Compared to 2005, 300 times as much data, i.e. 40 zettabytes (1 ZB = 10^21 bytes), will be generated by 2020.
• By 2011, the healthcare sector had data of 161 billion gigabytes.
• 400 million tweets are sent per day by about 200 million active users.
• Each month, more than 4 billion hours of video streaming is done by users.
• 30 billion different types of content are shared every month by users.
• It is reported that about 27% of data is inaccurate, and so 1 in 3 business idealists or leaders don't trust the information on which they are making decisions.
The above-mentioned facts are just a glimpse of the huge data statistics that actually exist. When we talk in terms of real-world scenarios, the size of the data currently present and being generated each and every moment is beyond our mental horizons to imagine.

2.4 Machine Learning – Applications


• Web Search Engines: One of the reasons why search engines like Google and
Bing work so well is that the system has learnt how to rank pages through a
complex learning algorithm.
• Photo Tagging Applications: Be it Facebook or any other photo tagging
application, the ability to tag friends is made possible by a face
recognition algorithm that runs behind the application.
• Spam Detectors: Mail services like Gmail or Hotmail do a lot of hard work
for us in classifying mails and moving spam to the spam folder. This is
achieved by a spam classifier running in the back end of the mail application.
Today, companies are using machine learning to improve business decisions, increase
productivity, detect disease, forecast weather, and do many more things. With the
exponential growth of technology, we not only need better tools to understand the data we
currently have, but we also need to prepare ourselves for the data we will have. To achieve
this goal we need to build intelligent machines. We can write a program to do simple
things, but for most tasks hard-wiring intelligence into it is difficult. The best approach
is to have some way for machines to learn things themselves: if a machine can learn from
input, then it does the hard work for us. This is where machine learning comes into
action. Some examples of machine learning are:
• Database mining for the growth of automation: typical applications
include web-click data for better UX (user experience), medical records for
better automation in healthcare, biological data, and many more.
• Applications that cannot be programmed directly: there are some tasks that
cannot be programmed explicitly because the computers we use are not modelled
that way. Examples include autonomous driving, recognition tasks on unordered
data (face recognition, handwriting recognition), natural language processing
and computer vision.
• Understanding human learning: this is the closest we have come to understanding
and mimicking the human brain. It is the start of a new revolution, the real
AI. Now, after this brief insight, let us come to a more formal definition of
machine learning.
• Arthur Samuel (1959): "Machine Learning is a field of study that
gives computers the ability to learn without being explicitly
programmed." Samuel wrote a checkers-playing program which could learn
over time. At first it could easily be beaten, but over time it learnt the board
positions that would eventually lead to victory or loss, and thus became a
better checkers player than Samuel himself. This was one of the earliest attempts
at defining machine learning and is somewhat informal.
• Tom Mitchell (1997): "A computer program is said to learn from experience
E with respect to some class of tasks T and performance measure P, if
its performance at tasks in T, as measured by P, improves with experience
E." This is a more formal and mathematical definition. For the previous
checkers program:
• E is the number of games played.
• T is the task of playing checkers against the computer.
• P is the win/loss outcome.

Machine learning is the technology of identifying the possibilities hidden in the data and
turning them into fully-fledged opportunities. Such opportunities are exactly what fuel
business operations and help a company stand out among competitors.

It is crucial to understand how machine learning algorithms are applied in various fields
to get the kind of results that would lead to legitimate business advantages.

Machine Learning in Data Analytics - CRM, Marketing Analytics, Audience Research


Understanding the big picture is a requirement for any company that wants to succeed in
a chosen field. Data analytics is one of the preeminent tools that makes it possible. In
essence, data analytics is a three-fold process. It involves:

• gathering data from different sources;
• extracting the valuable insights out of it;
• presenting it in a comprehensive manner (i.e., visualizing it).
Machine learning algorithms are applied at various stages to secure the efficiency and
the accuracy of the process.

• Clustering algorithms are used to explore the data;
• Classification algorithms are used to group data, sift through it and get the gist of it;
• Dimensionality reduction algorithms are used to visualize data, i.e. show it in a
coherent form.
Essentially, these data analytics algorithms construct a robust framework for quality
decision-making.

As such, data analytics is used in practically every aspect of business operation.


Let's run down the most common applications.

Sales and operations planning tools - unified dashboards for monitoring activity in
general and in detail; in other words, systems that use data analytics at full scale.
Product Analytics - a centre for the information regarding how the product is used.
Customer Modelling and Audience Segmentation - data analytics is used to identify
relevant audience segments, and to define and describe subcategories of customers.

Predictive Analytics - also capable of calculating possible courses of action for
different kinds of users in specific scenarios.
Market Research / Content Research - a set of tools to describe the environment around
you: what the current market situation is and what kind of action should be taken to
make the most of it.

Machine learning for Predictive Analytics - Stock Market Forecasting, Market Research,
Fraud Prevention
When it comes to gaining a competitive advantage with machine learning techniques -
Data Analytics is one side of the coin. The other side of the coin is predictive analytics.
That’s where machine learning comes in full swing.

You see, it is one thing to get the data from different sources into one place, extract
the insights and present the gist of it; that is process automation with some clever
tricks. It is an entirely different thing to look at what the future holds and plan your
moves accordingly.

That is what predictive analytics is for.


Here’s how it works.


The prerequisite feature of data is patterns.
There are sequences of patterns that can be explored if you have enough information
about the pattern behavior. This information is often called “the historical data.”
Based on extracted insights, the algorithm can build an assumption of what may come
next and determine the probability of a certain turn of events.

For example, let’s take the stock price. The price of a particular stock is known to be
volatile around a certain mark due to a variety of factors. The influence of these factors
over time is taken into consideration upon calculating the further volatility of the price
and planning further actions.

Predictive analytics is widely applied in the following fields:

Supply Chain Management - used to control the flow of products. Commonly used in retail
and eCommerce to keep the product inventory's demand and supply routine intact.
Stock Forecasting - one of the purest uses of predictive analytics. You have numbers,
and you need to compute the volatility of a particular figure in correlation with a
variety of incoming factors.
Recommender engines and content aggregators apply predictive analytics to make
assumptions of the relevant content based on the preferences, intent, and behavior of the
user. The content suggestion is one of the most basic types of service personalization on
a consumer level.
Fraud Prevention - machine learning and predictive analytics give the power of foresight
that can assist in exposing the fraudulent activity. In addition to that, it can provide a
complete picture of the perpetration. The thing is - the majority of fraudulent online
activities are made with the assistance of automated algorithms. These algorithms work
in patterns, which can be extracted and predicted from the data. Predictive machine
learning algorithms are used to expose spam messages, account hijacking, fraud
payments, bot traffic, and other types of digital fraudulence.

Service Personalization - Recommender Engines, User Modeling


Every user loves when the service delivers what the user wants and then some. That’s a
foundational element of user engagement and a step towards building a strong
relationship between the product and its user.

Things can get even better when the said service is tailor-made for the needs and the
preferences of the end-users. That’s personalization in action, and there is a lot of
machine learning involved.

Personalization makes the most of available user data, calculates the possibilities, and
turns them into a valuable asset of the business operation.

Personalization features are widely used in different services to:

• increase user engagement with the service;
• make the whole user experience more efficient and fulfilling.

To make that happen, regression machine learning algorithms are applied.

The methodology is similar to predictive algorithms. But instead of building the
assumptions of certain courses of action, personalization mechanisms rearrange the
presentation of the service according to a particular user's “state of things.”
Here’s how it works.
While using the service, users leave a detailed history of preferences, actions,
intents, comments, and so on. This information characterizes a user through the content
the user consumes.
The content itself has certain features that describe its value - topic, category, type,
color, weight, time of publication - the list goes on.
The values of different pieces of content create a grid of relations between different
types of content.
Via machine learning, user information and content information are compared and
matched together.
The result of this operation is an assumption of what else the user might like, given that
he likes or dislikes a particular piece of content. It is another grid of relations that
overlaps with the content.
As a result, the user gets a service that is arranged according to his preferences.
Personalization is used in the following fields:
Content personalization is arranging the news feed according to user preferences.
Product suggestions appear based on the user preferences combined with items from
similar groups (for example, products that go well combined - like socks and sneakers).
Ads personalization uses a dual inventory of the content. You have the website’s content
inventory, and you have ad inventory. The dual inventory of the content enables the
presentation of relevant ads throughout the session.
Natural Language Processing - Text Generation, Text Analysis, Text Translation,
Chatbots
Natural language processing (NLP) machine learning algorithms get into the nitty-gritty
of words and extract what is of value out of them. And since text is a raw state of data,
NLP is applied in one form or another practically everywhere.

NLP applies a broad scope of machine learning algorithms to enable its operation.

• Clustering algorithms are used to explore texts.
• Classification algorithms are used to analyze their features. Classification and
clustering involve parsing, segmentation, and tagging to construct a model upon which
further processing is handled.
• Regression algorithms are used to determine the proper output sequence for text
generation.
As a result, the algorithm is capable of extracting insights out of the text and producing
a first output.
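As a small, hedged illustration of the clustering step (assuming scikit-learn; the sentences below are invented toy documents, not data from this project):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "sales of soft drinks rose this quarter",
    "the new soft drink flavour sells well",
    "the goalkeeper saved a late penalty",
    "the striker scored twice in the match",
]

# Turn the raw text into TF-IDF feature vectors and group them into 2 clusters
X = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # documents about the same topic should share a cluster id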
Business-wise, Natural Language Processing Machine Learning algorithms are used in
the following fields:
General Text analysis. NLP is applied for a wide array of content categorization, topic
discovery, and modeling operations. The use cases include parsing the text for key terms,
studying semantics, and determining the context. (This technique is used by search
engines like Google or DuckDuckGo and also by content marketing tools like
Buzzsumo);
Marketing content copywriting / plagiarism detection - similar to text analysis, except
that it looks explicitly for anomalies. The text is broken down into critical parts that
are checked for matches elsewhere. A machine learning algorithm performs a comparative
analysis of the text; it can be a direct comparison with a different document or a
web-crawled, multi-source comparative analysis. Copyscape is one of the tools that
perform such operations.
Text summarization is used to create news digests, user profiles, banking information,
and research summaries. In this case, NLP clustering and classification machine learning
algorithms are used to explore the text’s semantics and context and determine the critical
points of the text. Then the dimensionality reduction ML algorithm reiterates the text
into a condensed form.
Text generation applies to conversational interfaces, automated reports, and content
generation. At its core, the NLP model feeds on the knowledge base (i.e., its practical
inventory) and uses it as a foundation for the creation of custom texts. The knowledge
base is mapped out by clustering and classification ML algorithms.
In conversational UIs, the process involves input analysis and subsequent output
generation (or performing the requested action).
In the case of report generation, the model feeds off the analytics platform and visualizes
in the text form via dimensionality reduction ML algorithm (such an approach can be
seen in Salesforce).
Automated content generation works similarly, except the form of presentation is
adapted to the specific medium. A good example of this is automated emails and Twitter
repost-updates;
Legal / medical text translation applies general text analysis via classification and
clustering algorithms, and then comparative analysis in a different language to build a
correlation map between languages. The entire thing is then analyzed through
corresponding reference bases in the respective languages. As a result, the text's context
and semantics are transposed onto another language while retaining its essence: what is
translated is the meaning of the words and not the words themselves. The most prominent
example is Google Translate.
General-purpose and field-specific text correction is an extension of text analysis. Just
like a plagiarism checker, grammar correction applies an anomaly analysis of the text
while referencing a knowledge base, i.e. the grammar. The anomalies are then flagged or
corrected. In the case of field-specific text correction, there is also an additional
vocabulary of relevant terms involved. The most prominent example of this is Grammarly.

Sentiment Analysis - Audience Research, Customer Service, Prescription, Recommendations
Sentiment analysis is the next step in the evolution of data analytics platforms. It deals
more directly with the way customers interact with the product and express opinions
about it.

Sentiment analysis can be used to explore the variety of reactions from the interactions
with different kinds of platforms. To do that, the system uses unsupervised machine
learning on top of a basic recognition procedure.


Here’s how it works:


The words, in general, have a specific designation. To put it in broad terms, the word
"good" is a positive term, while the word "bad" is negative.
Then there are more specific words that describe particular qualities.
It is one of the primary machine learning tools for the product- or service-based company
that heavily relies on the power of the brand and its public perception.

A sentiment analysis algorithm is designed to get behind the words - into the mood, the
opinion and, most importantly, the intent. This makes it a viable tool in the following
fields:

Brand management - the most basic way of using sentiment analysis. Involves a web
scraper framework and an exploratory algorithm that assesses the found mentions and
the context in which the mention was made.
Extended Product Analytics - sentiment analysis is used to explore and analyze the
emotions around the product, including customer feedback, reviews, and also general
brand mentions.
Aspect-based sentiment analysis is used for Audience Research. Sentiment analysis digs
up extra detail about the user’s attitude towards a certain product or theme in general. It
can be used to expand the definition of the audience segments and develop more precise
approaches to them. Such features are now tried out by Salesforce’s Einstein platform
services.
Customer Support is another huge field for sentiment analysis. In this case, the sentiment
analysis and machine learning algorithm helps to navigate between the product’s
knowledge base and customer’s issues. In addition to that, customer support with
sentiment analysis can provide extended analytics regarding user’s recurring issues, their
satisfaction with the service, and general attitude towards the product;
Sentiment Analysis is part of the personalization framework in Recommender engines.
In this case, sentiment analysis is used to bring nuances and elaborate on the suggestions
of the content. The sentiment analysis also boosts the efficiency and variety of opinions.
The best examples of this practice are Netflix's "you might also like" section and
Amazon's "people also buy" subcategory.


Computer Vision - Image Recognition, Visual Search, Face Recognition


Computer Vision is one of the most exciting fields of machine learning use. If text is a
more or less raw state of data - images require a different approach.

A computer vision algorithm describes image content by matching the features of the
images with the features of available samples. The image is broken down into key
credentials that are used as reference points.
The process looks like this: a photo of a bicycle is recognized as such because the
credentials of the sample photo on which the algorithm is trained and the credentials of
the input photo correlate.
Computer Vision and image recognition, in particular, are widely used throughout
different industries. Let’s round up the most relevant applications:

Visual Search features are widely applied in Search Engines like Google and
eCommerce marketplaces like Amazon and Ali Express. In essence, the visual search
algorithm works similarly to the textual search. The image is broken down by ML
algorithms to key credentials that are compared with the credentials of the sample base.
In addition to that, the NLP algorithm processes image metadata and other textual input
(such as the context in which the image is placed).
Face Detection is one of the cornerstones of social networks like Facebook and
Instagram. The methodology behind face recognition is similar to visual search, except
the process consists of two aspects:
First, general image recognition detects the shape of the face.
Then a classification algorithm matches the credentials against the available user base
and finds the person whose appearance matches the one in the particular photograph.
Optical character recognition, aka OCR, is another big use case. The principle mechanics
of OCR are the same as in general image recognition. The difference is that the
algorithm is trained on textual content and its correlation with the visual presentation of
the text via fonts, sizes, formatting, and color.
Handwriting and Fingerprint recognition - this type of computer vision combines the
Face Recognition framework with OCR mechanics. Just like every face, handwriting
(especially signatures) and fingerprints are unique to each person. The algorithm can
recognize specific features and later use it for verification purposes. This technique is
widely used in banking, where signatures are important as a verification of the
operations. In the case of fingerprints - this type of recognition is widely used in security
as an extreme.
The key algorithms behind computer vision are a combination of unsupervised and
supervised machine learning algorithms.

First, a clustering algorithm explores the features of the sample dataset, which is
subsequently classified.
The different samples of the object provide variations of each feature and increase the
accuracy of the recognition.
Then the unsupervised clustering algorithm is used to explore an input image.
After that, the supervised classification algorithm kicks in, matches the features, and
thus performs recognition of the image.

Machine Learning Speech Recognition - AI Assistants, Speech-To-Text, Automatic Subtitling
Speech recognition is something of a frontier these days. In a way, the technology is
similar to computer vision; it just took more time to figure out how to analyze sound
productively. With the emergence of conversational interfaces and mass adoption of
virtual assistants - speech recognition turned into a viable business opportunity.

The major fields where speech recognition is applied include:

AI assistants / personal assistant apps use natural language processing to recognize an
input query, act on it, and/or compose an output message. In addition, the assistant has a
database of sound samples with which to voice the response. Excellent examples of this are
Google Assistant, Alexa and Siri.
Sound-based Diagnosis is very similar to image recognition. The sound is broken down
to credentials, which are then matched. There is a comparative database of sounds to
detect anomalies and suggest a possible cause. This technology is used in healthcare and
automobile industries to examine the patient or product and determine the root of the
problem more efficiently.
Text-to-speech / speech-to-text and automatic captioning - this is the basic speech
recognition application. The technology is commonly used by specialized speech-to-text
services (like Transcribe) and also by virtual assistants (Alexa, Siri, Cortana, et al.).
In addition, this type of recognition can be used to augment audio-visual content with
captions (for example, YouTube or Facebook automatic subtitling features).

2.5 Demystifying Machine Learning

"Machine Learning". Now that's a word that packs a punch! Machine learning is hot stuff
these days, and why wouldn't it be? Almost every "enticing" new development in the field
of Computer Science and software development in general has something related to
machine learning behind the veil. Microsoft's Cortana - machine learning. Object and
face recognition - machine learning and computer vision. Advanced UX improvement
programs - machine learning (yes! The Amazon product recommendation you just got
was the number-crunching effort of some machine learning algorithm).
And not even just that. Machine Learning and Data Science in general is EVERYWHERE.
It is as omnipotent as God himself, had he been into Computers! Why? Because Data is
everywhere!
So it is natural, that anyone who has above average brains and can differentiate between
Programming Paradigms by taking a sneak-peek at Code, is intrigued by Machine
Learning.
But what is machine learning? And how big is it? Let's demystify machine
learning, once and for all. And to do that, rather than presenting technical
specifications, we'll follow an "understand by example" approach.
Machine Learning : What is it really?
Well, Machine Learning is a subfield of Artificial Intelligence which evolved from Pattern
Recognition and Computational Learning theory. Arthur Lee Samuel defines Machine
Learning as: Field of study that gives computers the ability to learn without being
explicitly programmed.
So, basically, it is the field of Computer Science and Artificial Intelligence that
"learns" from data without explicit human intervention.
But this view has a flaw. As a result of this perception, whenever the words machine
learning are thrown around, people usually think of "A.I.", "neural networks that can
mimic human brains (as of now, that is not possible)", self-driving cars and what not.
But Machine Learning is far beyond that. Below we uncover some expected and some
generally not expected facets of Modern Computing where Machine Learning is in action.
Machine Learning: The Expected
We’ll start with some places where you might expect Machine Learning to play a part.
1. Speech Recognition (Natural Language Processing in more technical
terms) : You talk to Cortana on Windows Devices. But how does it
understand what you say? Along comes the field of Natural Language
Processing, or N.L.P. It deals with the study of interactions between Machines
and Humans, via Linguistics. Guess what is at the heart of NLP: Machine
Learning Algorithms and Systems ( Hidden Markov Models being one).
2. Computer Vision : Computer Vision is a subfield of AI which deals with a
machine's (probable) interpretation of the real world. In other words, all
facial recognition, pattern recognition and character recognition techniques
belong to Computer Vision. And machine learning once again, with its wide
range of algorithms, is at the heart of Computer Vision.

3. Google's Self-Driving Car : Well, you can imagine what drives it: more
machine learning goodness.
But these were expected applications. Even a naysayer would have a good insight about
these feats of technology being brought to life by some “mystical (and extremely hard)
mind crunching Computer wizardry”.
Machine Learning : The Unexpected
Let’s visit some places normal folks would not really associate easily with Machine
Learning:
Amazon's Product Recommendations: Ever wondered how Amazon always has a
recommendation that tempts you to lighten your wallet? Well, that's a class of machine
learning algorithms called "recommender systems" working in the backdrop. It
learns every user's personal preferences and makes recommendations accordingly.
Youtube/Netflix : They work just as above!
Data Mining / Big Data : This might not be so much of a shock to many. But Data
Mining and Big Data are just manifestations of studying and learning from data at a
larger scale. And wherever there’s the objective of extracting information from data,
you’ll find Machine Learning lurking nearby.
Stock Market / Housing Finance / Real Estate : All of these fields incorporate a lot of
machine learning systems in order to better assess the market, namely "regression
techniques", for things as mundane as predicting the price of a house, to predicting and
analyzing stock market trends.
You hear people talking about machine learning, but are you sure what is true and
what is a myth? People are curious to know about machine learning and artificial
intelligence but face a lot of confusion while getting started.

Machine learning was adopted early by multinational organizations such as Facebook,
Google, and Amazon. Google used it for advertisement placement while Facebook used
machine learning for ranking post feeds. However, there are a lot of myths about
machine learning and its impact. Let's get started with a few of them.

1. Machine Learning Can Be Used Anywhere

This is one common myth: that machine learning can be used anywhere. Nobody will
spend Rs. 1,000 on work worth Rs. 200. Machine learning pays off only when you have
big data sets; it is not worthwhile to use it for small-data problems that a human can
solve effortlessly.

2. There’s No Difference Between Artificial Intelligence And Machine Learning

Most often we use machine learning and artificial intelligence terms interchangeably.
However, both are not the same and they are not synonymous with each other. Robotics,
computer vision, and natural language processing are areas under the artificial
intelligence umbrella. Machine learning is about learning patterns from data, using
statistics and data-driven predictions.

3. Deep Learning Is The Same As Machine Learning

Today, deep learning is a widely used term in the market. People assume that it is the
ultimate solution for every data science and machine learning problem. Deep learning is
one of the more complex and challenging areas of machine learning: it is a subset of
machine learning based on multi-layer neural network computation.

4. Machine Learning Platform Is Easy To Build, And Anyone Can Do It

Many people think that you can just Google machine learning and easily build any
platform. However, machine learning is a specialized technique that demands an expert
skill set. To learn machine learning, you should know how to prepare data for training
and testing, how to partition data, how to build a suitable algorithm and, very
importantly, how to take the resulting system to production. To get expertise in machine
learning, one should have hands-on experience with machine learning patterns and algorithms.

5. Machine Learning Can Work Independently Without Human Intervention

People believe that a machine learns without any real programming. However, the
algorithms for machine learning solutions are developed by humans, so human intervention
is a mandatory part of machine learning; it cannot be ruled out entirely.

6. Machine Learning Is The Same As Data Mining

Data mining is nothing but identifying the unknown patterns and properties of data. On
the other side, machine learning is to use existing patterns and properties for developing
a solution. There is a fine line between data mining and machine learning.

7. Machine Learning Is The Future


In the future, no doubt, machine learning will be used widely, but it is not the only
future. There are more advanced technologies in the market that can go one step beyond
machine learning. Self-driving cars and robots were just imagination a few years back;
today they are a reality.

8. Machine Learning Will Take Over Human Work


It is one of the major fears that machine learning will replace humans. Though machine
learning will automate the system and perform social activities to some extent, it will
also create new job roles or develop a need for a new skill set. Machine learning will
create more significant opportunities for new skills and creative thinking.

9. Machine Learning Is Prone To Failure


Machine learning works on the algorithms; so the success of a solution mainly depends
on the algorithm. A successful machine learning solution requires the right algorithm.
Every problem and situation demands different solutions. Therefore, the wrong
algorithm leads to failure of the entire solution. You should have a clear plan for the
algorithm.

10. Machine Learning Is Not Useful In Relationships


People believe that machine learning is useful only for identifying correlations, not
relationships. In fact, machine learning algorithms can be used for discovering
relationships as well as extracting knowledge: they can read through the entire data and
derive links based on past data.

2.6 Decision Tree

A decision tree is one of the most powerful and popular tools for classification
and prediction. A decision tree is a flowchart-like tree structure, where each internal
node denotes a test on an attribute, each branch represents an outcome of the test, and
each leaf node (terminal node) holds a class label.

(Figure: a decision tree for the concept PlayTennis.)


Construction of a Decision Tree :
A tree can be "learned" by splitting the source set into subsets based on an attribute
value test. This process is repeated on each derived subset in a recursive manner
called recursive partitioning. The recursion is completed when every record in the subset
at a node has the same value of the target variable, or when splitting no longer adds
value to the predictions. The construction of a decision tree classifier does not require
any domain knowledge or parameter setting, and is therefore appropriate for exploratory
knowledge discovery. Decision trees can handle high-dimensional data, and in general
decision tree classifiers have good accuracy. Decision tree induction is a typical
inductive approach to learning classification knowledge.

Decision Tree Representation:


Decision trees classify instances by sorting them down the tree from the root to some
leaf node, which provides the classification of the instance. An instance is classified by
starting at the root node of the tree, testing the attribute specified by this node, and
then moving down the tree branch corresponding to the value of the attribute, as shown in
the above figure. This process is then repeated for the subtree rooted at the new node.
The decision tree in the above figure classifies a particular morning according to whether
it is suitable for playing tennis, returning the classification associated with the
particular leaf (in this case Yes or No).
For example, the instance

(Outlook = Rain, Temperature = Hot, Humidity = High, Wind = Strong )

would be sorted down the leftmost branch of this decision tree and would therefore be
classified as a negative instance.
In other words, we can say that a decision tree represents a disjunction of conjunctions
of constraints on the attribute values of instances. For the PlayTennis tree above:

(Outlook = Sunny ^ Humidity = Normal) v (Outlook = Overcast) v (Outlook = Rain ^ Wind = Weak)

Strengths and Weaknesses of the Decision Tree Approach


The strengths of decision tree methods are:

• Decision trees are able to generate understandable rules.


• Decision trees perform classification without requiring much computation.
• Decision trees are able to handle both continuous and categorical variables.
• Decision trees provide a clear indication of which fields are most important
for prediction or classification.
The weaknesses of decision tree methods are:

• Decision trees are less appropriate for estimation tasks where the goal is to
predict the value of a continuous attribute.
• Decision trees are prone to errors in classification problems with many classes
and a relatively small number of training examples.
• Decision trees can be computationally expensive to train. The process of
growing a decision tree is computationally expensive. At each node, each
candidate splitting field must be sorted before its best split can be found. In
some algorithms, combinations of fields are used and a search must be made
for optimal combining weights. Pruning algorithms can also be expensive
since many candidate sub-trees must be formed and compared.
A tree has many analogies in real life, and it turns out that it has influenced a wide
area of machine learning, covering both classification and regression. In decision
analysis, a decision tree can be used to visually and explicitly represent decisions and
decision making. As the name suggests, it uses a tree-like model of decisions. Though it
is a commonly used tool in data mining for deriving a strategy to reach a particular goal,
it is also widely used in machine learning, which is the focus here.

How can an algorithm be represented as a tree?
For this, let's consider a very basic example that uses the Titanic dataset to predict
whether a passenger will survive or not. The model below uses three
features/attributes/columns from the dataset, namely sex, age and sibsp (the number of
siblings or spouses aboard).

A decision tree is drawn upside down, with its root at the top. In such a diagram, the
bold text in black represents a condition/internal node, based on which the tree splits
into branches/edges. The end of a branch that doesn't split any more is the decision/leaf
- in this case, whether the passenger died or survived, represented as red and green text
respectively.

Although a real dataset will have many more features and this would just be a branch in a
much bigger tree, you can't ignore the simplicity of this algorithm. The feature
importance is clear and relations can be viewed easily. This methodology is more
commonly known as learning a decision tree from data, and the above tree is called a
classification tree, as the target is to classify a passenger as survived or died.
Regression trees are represented in the same manner; they just predict continuous values,
like the price of a house. In general, decision tree algorithms are referred to as CART,
or Classification and Regression Trees.
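As a hedged sketch of such a classification tree (assuming scikit-learn and pandas; the eight passenger records below are invented for illustration and are not the real Titanic data):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "sex":      [0, 0, 1, 1, 0, 1, 1, 0],      # 0 = male, 1 = female (already encoded)
    "age":      [22, 35, 26, 54, 9, 8, 40, 63],
    "sibsp":    [1, 0, 0, 1, 3, 2, 0, 0],
    "survived": [0, 0, 1, 1, 1, 1, 1, 0],
})

# Learn a small classification tree from the toy records
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data[["sex", "age", "sibsp"]], data["survived"])

# Print the learned splits, root at the top and leaves at the bottom
print(export_text(tree, feature_names=["sex", "age", "sibsp"]))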

So, what is actually going on in the background? Growing a tree involves deciding which
features to choose and what conditions to use for splitting, along with knowing when to
stop. As a tree generally grows arbitrarily, you will need to trim it down for it to
remain useful. Let's start with a common technique used for splitting.

Recursive Binary Splitting

In this procedure all the features are considered and different split points are tried and
tested using a cost function. The split with the best cost (or lowest cost) is selected.

Consider the earlier example of the tree learned from the Titanic dataset. At the first
split, the root, all attributes/features are considered and the training data is divided
into groups based on this split. We have 3 features, so we will have 3 candidate splits.
Now we calculate how much accuracy each split will cost us, using a cost function. The
split that costs least is chosen, which in our example is the sex of the passenger. This
algorithm is recursive in nature, as the groups formed can be subdivided using the same
strategy. Because of this procedure, the algorithm is also known as a greedy algorithm:
at every step we only aim at lowering the immediate cost. This makes the root node the
best predictor/classifier.

Cost of a split
Let's take a closer look at the cost functions used for classification and regression. In
both cases the cost functions try to find the most homogeneous branches, i.e. branches
having groups with similar responses. This makes sense, because then we can be more
certain that a test data input will follow a certain path.

Regression : cost = sum((y - prediction)²)

Let's say we are predicting the price of houses. The decision tree starts splitting by
considering each feature in the training data. The mean of the responses of the training
inputs in a particular group is taken as the prediction for that group. The above function
is applied to all data points, and the cost is calculated for all candidate splits. Again,
the split with the lowest cost is chosen. Another cost function involves the reduction of
standard deviation.
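A hand-rolled sketch of this regression split cost (using NumPy and invented house prices) could look like this:

import numpy as np

def regression_split_cost(groups):
    """groups: one array of target values per branch produced by a candidate split."""
    cost = 0.0
    for y in groups:
        prediction = y.mean()                  # the group mean is the group's prediction
        cost += np.sum((y - prediction) ** 2)  # sum((y - prediction)^2)
    return cost

left = np.array([200_000, 210_000, 190_000])   # e.g. cheaper houses
right = np.array([400_000, 420_000])           # e.g. more expensive houses
print(regression_split_cost([left, right]))    # lower cost = more homogeneous split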

Classification : G = sum(pk * (1 - pk))

A Gini score gives an idea of how good a split is by how mixed the response classes are
in the groups created by the split. Here, pk is the proportion of inputs of class k
present in a particular group. Perfect class purity occurs when a group contains only
inputs from the same class, in which case each pk is either 1 or 0 and G = 0, whereas a
node having a 50-50 split of classes in a group has the worst purity; for binary
classification it will have pk = 0.5 and G = 0.5.
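A tiny sketch of the Gini score, written directly from the formula above (plain Python, toy class labels):

def gini_of_group(labels):
    # G = sum over classes of pk * (1 - pk), where pk is the class proportion
    n = len(labels)
    return sum((labels.count(c) / n) * (1 - labels.count(c) / n) for c in set(labels))

print(gini_of_group([1, 1, 1, 1]))   # 0.0   -> perfectly pure group
print(gini_of_group([1, 1, 0, 0]))   # 0.5   -> worst purity for two classes
print(gini_of_group([1, 1, 1, 0]))   # 0.375 -> somewhere in between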

When to stop splitting?

You might ask when to stop growing a tree. As a problem usually has a large set of
features, it results in a large number of splits, which in turn gives a huge tree. Such
trees are complex and can lead to overfitting, so we need to know when to stop. One way
of doing this is to set a minimum number of training inputs to use on each leaf. For
example, we can require a minimum of 10 passengers to reach a decision (died or survived)
and ignore any leaf that takes fewer than 10 passengers. Another way is to set the
maximum depth of the model; maximum depth refers to the length of the longest path from
the root to a leaf.

Pruning
The performance of a tree can be further increased by pruning. It involves removing the
branches that make use of features having low importance. This way, we reduce the
complexity of the tree and thus increase its predictive power by reducing overfitting.

Pruning can start at either the root or the leaves. The simplest method starts at the
leaves and replaces each node with the most popular class in that leaf; the change is
kept if it does not reduce accuracy. This is also called reduced-error pruning. More
sophisticated methods can be used, such as cost-complexity pruning, where a learning
parameter (alpha) is used to weigh whether nodes can be removed based on the size of the
sub-tree. This is also known as weakest-link pruning.
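In scikit-learn, cost-complexity pruning is exposed through the ccp_alpha parameter; the sketch below (random, invented data) simply shows that a larger alpha yields a smaller tree:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = (X[:, 0] + 0.1 * rng.randn(200) > 0.5).astype(int)

full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)

print("leaves before pruning:", full_tree.get_n_leaves())
print("leaves after pruning :", pruned_tree.get_n_leaves())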

Advantages of CART
Simple to understand, interpret, visualize.
Decision trees implicitly perform variable screening or feature selection.
Can handle both numerical and categorical data. Can also handle multi-output problems.
Decision trees require relatively little effort from users for data preparation.
Nonlinear relationships between parameters do not affect tree performance.
Disadvantages of CART
Decision-tree learners can create over-complex trees that do not generalize the data well.
This is called overfitting.
Decision trees can be unstable because small variations in the data might result in a
completely different tree being generated. This is called variance, which needs to be
lowered by methods like bagging and boosting.
Greedy algorithms cannot guarantee to return the globally optimal decision tree. This can
be mitigated by training multiple trees, where the features and samples are randomly
sampled with replacement.
Decision tree learners create biased trees if some classes dominate. It is therefore
recommended to balance the data set prior to fitting with the decision tree.

2.7 Random Forest Regression

Regression is a machine learning technique that is used to predict values across a certain
range. Let us understand this concept with an example: consider the salaries of
employees and their experience in years.

A regression model on this data can help in predicting the salary of an employee even for
a number of years of experience that does not have a corresponding salary in the dataset.

Random forest regression is an ensemble learning technique. But what is ensemble learning?

In ensemble learning, you take multiple algorithms, or the same algorithm multiple times,
and put them together into a model that is more powerful than the original.

A prediction based on many trees is more accurate because it takes many individual
predictions into account (through the average value used). These algorithms are also more
stable, because changes in the dataset can strongly affect one tree but rarely the whole
forest of trees.

Steps to perform the random forest regression

This is a four-step process; the steps are as follows (a short scikit-learn sketch follows the list):

1. Pick K random data points from the training set.
2. Build the decision tree associated with these K data points.
3. Choose the number Ntree of trees you want to build and repeat steps 1 and 2.
4. For a new data point, make each one of your Ntree trees predict the value of Y for
the data point in question, and assign the new data point the average across all
of the predicted Y values.
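A minimal scikit-learn sketch of these four steps (the salary figures are invented for illustration; n_estimators plays the role of Ntree, and the bootstrapping of the K points and the averaging of predictions happen inside the library):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

experience = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])   # years
salary = np.array([30, 35, 42, 50, 60, 72, 80, 95, 110, 130])               # in thousands

# 100 trees, each grown on a bootstrap sample; predictions are averaged
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(experience, salary)

print(model.predict([[6.5]]))   # estimate for an experience value not in the data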

Random Forest is a popular machine learning algorithm that belongs to the supervised
learning technique. It can be used for both Classification and Regression problems in
ML. It is based on the concept of ensemble learning, which is a process of combining
multiple classifiers to solve a complex problem and to improve the performance of the
model.

As the name suggests, "Random Forest is a classifier that contains a number of


decision trees on various subsets of the given dataset and takes the average to improve
the predictive accuracy of that dataset." Instead of relying on one decision tree, the
random forest takes the prediction from each tree and, based on the majority vote of
those predictions, produces the final output.

A greater number of trees in the forest generally leads to higher accuracy and helps
prevent the problem of overfitting.

The below diagram explains the working of the Random Forest algorithm:

(Figure: working of the Random Forest algorithm)


Note: To better understand the Random Forest Algorithm, you should have knowledge
of the Decision Tree Algorithm.
Assumptions for Random Forest
Since the random forest combines multiple trees to predict the class of the dataset, it is
possible that some decision trees may predict the correct output, while others may not.
But together, all the trees predict the correct output. Therefore, below are two
assumptions for a better Random forest classifier:

There should be some actual values in the feature variable of the dataset so that the
classifier can predict accurate results rather than a guessed result.
The predictions from each tree must have very low correlations.
Why use Random Forest?
Below are some points that explain why we should use the Random Forest algorithm:

It takes less training time as compared to many other algorithms.
It predicts output with high accuracy and runs efficiently even on large datasets.
It can also maintain accuracy when a large proportion of the data is missing.
How does Random Forest algorithm work?

Random Forest works in two phases: the first is to create the random forest by combining
N decision trees, and the second is to make predictions using the trees created in the
first phase.

The Working process can be explained in the below steps and diagram:

Step-1: Select K random data points from the training set.

Step-2: Build the decision trees associated with the selected data points (Subsets).

Step-3: Choose the number N for decision trees that you want to build.

Step-4: Repeat Step 1 & 2.

Step-5: For new data points, find the predictions of each decision tree, and assign the
new data points to the category that wins the majority votes.

The working of the algorithm can be better understood by the below example:

Example: Suppose there is a dataset that contains multiple fruit images. So, this dataset
is given to the Random forest classifier. The dataset is divided into subsets and given
to each decision tree. During the training phase, each decision tree produces a
prediction result, and when a new data point occurs, then based on the majority of
results, the Random Forest classifier predicts the final decision. Consider the below
image:

(Figure: Random Forest Algorithm)
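The majority-vote idea can be made visible with a short scikit-learn sketch (toy numeric data instead of fruit images; note that scikit-learn's Random Forest actually averages the trees' class probabilities, which is a soft form of the voting described above):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

sample = X[:1]
# Ask every individual tree for its prediction, then compare with the forest's answer
votes = [int(tree.predict(sample)[0]) for tree in forest.estimators_]
print("individual tree votes:", votes)
print("forest prediction    :", forest.predict(sample))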
Applications of Random Forest
There are mainly four sectors where Random Forest is mostly used:

Banking: Banking sector mostly uses this algorithm for the identification of loan risk.
Medicine: With the help of this algorithm, disease trends and risks of the disease can
be identified.
Land Use: We can identify the areas of similar land use by this algorithm.
Marketing: Marketing trends can be identified using this algorithm.
Advantages of Random Forest
Random Forest is capable of performing both Classification and Regression tasks.
It is capable of handling large datasets with high dimensionality.
It enhances the accuracy of the model and prevents the overfitting issue.
Disadvantages of Random Forest
Although random forest can be used for both classification and regression tasks, it is
generally less well suited to regression tasks than to classification.

2.8 Linear Regression

Linear Regression is a machine learning algorithm based on supervised learning. It


performs a regression task. Regression models a target prediction value based on
independent variables. It is mostly used for finding out the relationship between
variables and forecasting. Different regression models differ based on – the kind of
relationship between dependent and independent variables they are considering, and the
number of independent variables getting used.

Linear regression performs the task of predicting a dependent variable value (y) based on
a given independent variable (x). So, this regression technique finds a linear
relationship between x (input) and y (output); hence the name Linear Regression.
For example, X (input) could be the work experience and Y (output) the salary of a
person. The regression line is then the best-fit line for our model.
Hypothesis function for Linear Regression :

While training the model we are given :


x: input training data (univariate - one input variable/parameter)
y: labels for the data (supervised learning)
When training the model, it fits the best line to predict the value of y for a given value
of x. The model finds the best regression-fit line by finding the best θ1 and θ2 values:
θ1: intercept
θ2: coefficient of x
Once we find the best θ1 and θ2 values, we get the best-fit line. So when we finally use
our model for prediction, it will predict the value of y for an input value of x.
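As a minimal sketch (assuming scikit-learn; the experience/salary numbers are invented), fitting a simple linear regression and reading off the learned intercept and coefficient looks like this:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1], [2], [3], [4], [5]])   # work experience in years
y = np.array([35, 45, 55, 65, 75])        # salary in thousands

model = LinearRegression().fit(x, y)
print("intercept (theta1)  :", model.intercept_)        # ~25
print("coefficient (theta2):", model.coef_[0])          # ~10
print("prediction for x = 6:", model.predict([[6]])[0]) # ~85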

Types of Linear Regression

Linear regression can be further divided into two types:

Simple Linear Regression:

If a single independent variable is used to predict the value of a numerical dependent


variable, then such a Linear Regression algorithm is called Simple Linear Regression.

Multiple Linear regression:

If more than one independent variable is used to predict the value of a numerical dependent
variable, then such a Linear Regression algorithm is called Multiple Linear Regression.

Linear Regression Line

A straight line showing the relationship between the dependent and independent variables
is called a regression line. A regression line can show two types of relationship:

Positive Linear Relationship:

If the dependent variable increases on the Y-axis and independent variable increases on X-
axis, then such a relationship is termed as a Positive linear relationship.

Negative Linear Relationship:

If the dependent variable decreases on the Y-axis and independent variable increases on the
X-axis, then such a relationship is called a negative linear relationship.


Finding the best fit line:

When working with linear regression, our main goal is to find the best-fit line, which
means that the error between the predicted values and the actual values should be
minimized. The best-fit line will have the least error.

Different values of the weights or coefficients of the line (a0, a1) give different
regression lines, so we need to calculate the best values of a0 and a1 to find the
best-fit line; to do this we use a cost function.

Cost function:

The cost function is used to estimate the values of the coefficients (a0, a1) for the
best-fit line.

The cost function optimizes the regression coefficients or weights; it measures how well
a linear regression model is performing.

We can use the cost function to find the accuracy of the mapping function, which maps the
input variable to the output variable. This mapping function is also known as the
hypothesis function.

For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the
average of the squared errors between the predicted values and the actual values. It can
be written as:

For the above linear equation, MSE can be calculated as:

MSE = (1/N) * Σ (Yi - (a1xi + a0))²

Where,

N=Total number of observation

Yi = Actual value

(a1xi+a0)= Predicted value.

Residuals: The distance between an actual value and the corresponding predicted value is
called a residual. If the observed points are far from the regression line, the residuals
will be high and so the cost function will be high. If the scatter points are close to
the regression line, the residuals will be small and hence the cost function will be low.

Gradient Descent:

Gradient descent is used to minimize the MSE by calculating the gradient of the cost
function.

A regression model uses gradient descent to update the coefficients of the line by reducing
the cost function.

It starts from a random selection of coefficient values and then iteratively updates them
to reach the minimum of the cost function.
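A hedged, hand-rolled sketch of gradient descent for the line y = a1*x + a0 (libraries normally do this internally; the data is the same toy experience/salary data used earlier):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([35.0, 45.0, 55.0, 65.0, 75.0])

a0, a1 = 0.0, 0.0   # starting coefficients
lr = 0.01           # learning rate
for _ in range(10_000):
    error = (a1 * x + a0) - y
    # Gradients of MSE = mean((a1*x + a0 - y)^2) with respect to a0 and a1
    a0 -= lr * 2 * error.mean()
    a1 -= lr * 2 * (error * x).mean()

print(a0, a1)   # approaches ~25 and ~10 for this toy data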

Model Performance:

The goodness of fit determines how well the regression line fits the set of observations.
The process of finding the best model out of various models is called optimization. It
can be assessed by the method below:

1. R-squared method:

R-squared is a statistical method that determines the goodness of fit.

It measures the strength of the relationship between the dependent and independent variables
on a scale of 0-100%.

A high value of R-squared indicates a small difference between the predicted and actual
values and hence represents a good model.

It is also called the coefficient of determination (or the coefficient of multiple
determination for multiple regression).

It can be calculated from the below formula:

R-squared = Explained variation / Total variation
          = 1 - (sum of squared residuals / total sum of squares)
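In practice the R-squared value can be obtained directly, for example with scikit-learn's r2_score (toy numbers below):

from sklearn.metrics import r2_score

actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.1, 7.2, 8.9]

print(r2_score(actual, predicted))   # close to 1.0 indicates a good fit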

Assumptions of Linear Regression

Below are some important assumptions of Linear Regression. These are formal checks to
make while building a Linear Regression model, and they help ensure the best possible
result from the given dataset.

Linear relationship between the features and target:

Linear regression assumes the linear relationship between the dependent and independent
variables.

Small or no multicollinearity between the features:

Multicollinearity means high correlation between the independent variables. Due to
multicollinearity, it may be difficult to find the true relationship between the
predictors and the target variable; in other words, it is difficult to determine which
predictor variable is affecting the target variable and which is not. So, the model
assumes either little or no multicollinearity between the features or independent
variables.

Homoscedasticity Assumption:

Homoscedasticity is a situation when the error term is the same for all the values of
independent variables. With homoscedasticity, there should be no clear pattern distribution
of data in the scatter plot.

Normal distribution of error terms:

Linear regression assumes that the error term should follow the normal distribution pattern.
If error terms are not normally distributed, then confidence intervals will become either too
wide or too narrow, which may cause difficulties in finding coefficients.

This can be checked using a Q-Q plot: if the plot shows a straight line without any
deviation, the error is normally distributed.

No autocorrelations:

The linear regression model assumes no autocorrelation in error terms. If there will be any
correlation in the error term, then it will drastically reduce the accuracy of the model.
Autocorrelation usually occurs if there is a dependency between residual errors.

2.9 Extra-trees regressor

This class implements a meta estimator that fits a number of randomized decision trees
(a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the
predictive accuracy and control over-fitting.

Extra-trees differ from classic decision trees in the way they are built. When looking for the
best split to separate the samples of a node into two groups, random splits are drawn for each
of the max_features randomly selected features and the best split among those is chosen.
When max_features is set to 1, this amounts to building a totally random decision tree.

Extra Trees vs Random Forest

Therefore, tree ensemble methods are better than simple decision trees, but is one
ensemble better than the other? Which one should we use? This section tries to answer
these questions by studying the differences between Extra Trees and Random Forest and
comparing them in terms of results.

The two ensembles have a lot in common. Both of them are composed of a large number of
decision trees, where the final decision is obtained by taking into account the
prediction of every tree: by majority vote in classification problems, and by the
arithmetic mean in regression problems. Furthermore, both algorithms have the same
tree-growing procedure (with one exception explained below). Moreover, when selecting the
partition of each node, both of them randomly choose a subset of features.

So, the main two differences are the following:

Random Forest uses bootstrap replicas, that is to say, it subsamples the input data with
replacement, whereas Extra Trees uses the whole original sample. In the Extra Trees
sklearn implementation there is an optional parameter that allows users to use bootstrap
replicas but, by default, it uses the entire input sample. This bootstrapping may
increase variance, because it makes the trees more diversified.

Another difference is the selection of cut points in order to split nodes: Random Forest
chooses the optimum split, while Extra Trees chooses it randomly. However, once the split
points are selected, the two algorithms choose the best one among the subset of features.
Therefore, Extra Trees adds randomization but still has optimization.

These differences motivate the reduction of both bias and variance. On one hand, using
the whole original sample instead of a bootstrap replica will reduce bias. On the other
hand, choosing the split point of each node randomly will reduce variance.

In terms of computational cost, and therefore execution time, the Extra Trees algorithm is
faster. This algorithm saves time because the whole procedure is the same, but it randomly
chooses the split point and does not calculate the optimal one.

From this extra randomization comes the name Extra Trees (Extremely Randomized Trees).

Comparison: In this section, we compare the performance of the two algorithms by analyzing the results
obtained on different datasets.

To generate these datasets, the make_classification function of the sklearn Python library has been used.
Synthetic datasets were chosen instead of real ones because, in this way, the properties of the dataset
can be tuned. Furthermore, this function allows us to select the number of features and their
characteristics.

This gives us the opportunity to analyze the performance of the two algorithms on a classification
problem, using datasets with different properties. In total, five datasets have been generated, each with
1000 samples, 20 features, and 2 classes (binary classification). Moreover, the labels of 30% of the
samples are assigned randomly; introducing this noise makes the classification harder. The chosen
differences between the features of the datasets are summarized below.

Table: Characteristics of the features of each dataset (informative, repeated, redundant or random
features).

In this way, we can analyze whether the two algorithms behave differently when the data contain
redundant, noisy, or repeated features.

The sklearn implementations of the two algorithms have been used. As for the parameters, the same number
of estimators (100) has been chosen for both, leaving the rest of the parameters at their defaults.
Furthermore, cross-validation has been applied with K = 3.
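
A minimal sketch of this comparison is given below. It follows the setup described above (synthetic data from make_classification, 100 estimators, 3-fold cross-validation); the exact dataset parameters such as the number of informative, redundant and repeated features are illustrative assumptions.

# Minimal sketch: comparing Extra Trees and Random Forest (illustrative setup)
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data; flip_y=0.3 randomly reassigns about 30% of the labels
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_repeated=2, n_classes=2,
                           flip_y=0.3, random_state=42)

for name, model in [("Random Forest", RandomForestClassifier(n_estimators=100, random_state=42)),
                    ("Extra Trees", ExtraTreesClassifier(n_estimators=100, random_state=42))]:
    start = time.time()
    scores = cross_val_score(model, X, y, cv=3, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}, time = {time.time() - start:.2f}s")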

Results

The following table summarizes the obtained results, comparing the F-score and the execution time of
Extra Trees and Random Forest on the different datasets. From these results, we can see that the scores
barely vary between the two algorithms. Random Forest seems to obtain slightly better results, but the
difference is very small. As mentioned above, the variance of Extra Trees is lower, but again the
difference is practically insignificant. It is notable that the results vary more for datasets 4 and 5,
where most of the features are redundant or repeated. Moreover, as expected, Extra Trees is much faster:
instead of looking for the optimal split at each node, it chooses one randomly.

54
3. PROPOSED SYSTEM

3.1 SYSTEM OVERVIEW

Fig. 1: High-level overview of the presented intelligent system.

In this model, a five-step procedure is used to solve the problem of predicting the sales revenue of
different products at different outlet locations for Big Mart. First, the data is acquired and divided
into training and test sets; this data undergoes a preliminary analysis which includes univariate and
bivariate analysis. In the second stage, data pre-processing is performed, which takes care of missing
and erroneous values in the dataset. In the third stage, features are selected and modified to get the
best results. In the fourth stage, feature transformation is used to convert categorical features into
numerical features. In the fifth stage, models are built using various algorithms and the results are
evaluated. These results are communicated to the firm and finally, after approval, they are applied by
the firm to generate a business model for the next year. Using this method helps produce more accurate
results.

Feature Extraction

Correlation analysis and data mining are used here for feature selection. Feature extraction is the
technique of extracting features of the data such as its mean, variance, standard deviation, entropy,
etc.

A. Motivation: After collecting and integrating the acquired dataset, the nature of the data needs to be
understood. To do so, correlations, general trends and outliers need to be identified by calculating
mathematical statistics (mean, median, range, standard deviation, etc.) for each feature. In this way the
features that affect the sales of a product can be shortlisted. Visualization techniques are also used to
support this analysis. After the preliminary analysis, missing, duplicate and inconsistent values are
checked for and corrected. Features that can be grouped together, or that are not needed, are filtered
out. Dimension analysis further enhances the feature selection approach. If this step is not performed,
many unneeded features would be analysed in the subsequent steps and might produce a major difference in
the result obtained. Performing it makes working with the dataset easier and more fault tolerant.

B. Data Cleaning: After understanding the nature of the data and finding the correlation between the
different features and the target variable (sales), the erroneous values in the dataset need to be
replaced with values that make sense, and the missing values need to be replaced with appropriate
numerical or categorical values depending on the type of feature. Redundant information in the dataset is
removed. This fills the gaps within the dataset and makes it complete, which enables better results.

C. Feature Transformation: Data cleaning gives us a complete, error-free dataset to work with. Feature
transformation is the family of techniques used to create new features from existing features; here a
linear combination of two or more features is used to make a new feature, and this new feature gives
better results with respect to the target variable, i.e. sales. This system also transforms categorical
features into numerical features. The redundant features are dropped from the dataset in favour of the
new ones.

3.2 SYSTEM ARCHITECTURE

A system architecture is the conceptual model that defines the structure, behavior, and other views of a
system. An architecture description is a formal description and representation of a system, organized in
a way that supports reasoning about the structures and behaviors of the system.

56
3.3 PROPOSED MODEL

A model is a pictorial or graphic representation of key concepts. It shows, with the help of arrows and
other diagrams, the relationships between various types of variables, e.g. independent, dependent,
moderating and mediating variables.

57
3.4 PHASES IN MODEL

1. DATA EXPLORATION: In this phase, useful information about the data has been extracted from the
dataset, that is, the available data is examined against the initial hypotheses. This shows that the
attributes Outlet Size and Item Weight have missing values, and that the minimum value of Item Visibility
is zero, which is not practically possible. The establishment year of the outlets varies from 1985 to
2009; these values may not be appropriate in this form, so we need to convert them into how old a
particular outlet is. There are 1559 unique products, as well as 10 unique outlets, present in the
dataset. The attribute Item Type contains 16 unique values, whereas Item Fat Content has only two types,
but some of them are recorded inconsistently ('regular' instead of 'Regular', and 'low fat' or 'LF'
instead of 'Low Fat'). The response variable, Item Outlet Sales, was positively skewed, so a log
operation was performed on Item Outlet Sales to remove this skewness.
• Today it’s easier than ever for businesses to collect and store data about every part
of their operation. However, the challenge facing business leaders is understanding
the implications and opportunities hidden within that data quickly.

• Data exploration is a vital process in data science. Analysts investigate a dataset to
illuminate specific patterns or characteristics to help companies or organizations
understand insights and implement new policies.

58

• While data exploration doesn’t necessarily reveal every minute detail, it helps form
a broader picture of specific trends or areas to study. Using manual methods and
automated tools, users explore data to determine which model or algorithm is best
for subsequent steps in data analysis.

• Manual data exploration techniques can help users identify specific areas of interest,
which is workable yet falls short of deeper investigation. This is where machine
learning can take your data analysis to the next level.

• Machine learning algorithms or automated exploration software can easily identify
relationships between various data variables and dataset structures to determine
whether outliers exist, and create data values that can highlight patterns or points of
interest.

• Both data exploration and machine learning can identify notable patterns and help
draw conclusions from datasets. But machine learning allows users to extract
information in large databases quickly and with little room for error.

• With more data available than ever before, many companies are faced with an
abundance of data but not enough resources to analyze and process it. This is where
machine learning comes in.

• What are the advantages of data exploration in machine learning?
• Using machine learning for exploratory data analysis helps data scientists monitor
their data sources and explore data for large analyses. While manual data exploration
can be useful for homing in on specific datasets of interest, machine learning offers
a much wider lens, offering actionable insights that can transform your company’s
understanding of patterns and trends.

• Machine learning software can also make your data far easier to digest. By taking
data points and exporting them to data visualization displays such as bar charts or
scatter plots, companies can extract meaningful information at a glance without
spending time interpreting and questioning results.

• When you begin to explore your data with automated data exploration tools, you can
come away with in-depth insights that lead to better decisions. Today’s machine
learning solutions include open-source tools with regression capabilities and
visualization methods using programming languages such as Python for data
preparation.

• Data exploration through machine learning
• Data exploration has two primary goals: To highlight traits of single variables, and
reveal patterns and relationships between variables.

59
• When using machine learning for data exploration, data scientists start by identifying metrics or
variables, running univariate and bivariate analyses, and conducting a missing-values treatment.
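
Before moving on to data cleaning, a minimal sketch of the exploration checks described at the start of this phase is shown below. The column names and file path assume the standard public Big Mart training file and are assumptions rather than an exact transcript of the code used.

# Minimal sketch of the data exploration checks (column names are assumptions)
import numpy as np
import pandas as pd

train = pd.read_csv("Train.csv")                     # path is an assumption

print(train.isnull().sum())                          # Item_Weight and Outlet_Size have missing values
print(train["Item_Identifier"].nunique())            # about 1559 unique products
print(train["Outlet_Identifier"].nunique())          # 10 unique outlets
print(train["Item_Fat_Content"].value_counts())      # inconsistent labels such as 'LF', 'low fat', 'reg'
print((train["Item_Visibility"] == 0).sum())         # zero visibility is not practically possible

# The target is positively skewed, so a log transform is commonly applied
print(train["Item_Outlet_Sales"].skew())
train["Item_Outlet_Sales_log"] = np.log1p(train["Item_Outlet_Sales"])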

2. DATA CLEANING: It was observed in the previous section that the attributes Outlet Size and Item Weight
have missing values. In our work, missing values of Outlet Size are replaced by the mode of that
attribute, and missing values of Item Weight are replaced by the mean of that attribute. Replacement by
the mean or mode tends to diminish the correlation involving the imputed attributes; for our model we
assume that there is no relationship between the measured attributes and the imputed attribute.
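
A minimal sketch of this imputation step, again assuming the usual Big Mart column names, could be:

# Sketch: impute Item_Weight with the mean and Outlet_Size with the mode
import pandas as pd

train = pd.read_csv("Train.csv")  # path is an assumption

train["Item_Weight"] = train["Item_Weight"].fillna(train["Item_Weight"].mean())
train["Outlet_Size"] = train["Outlet_Size"].fillna(train["Outlet_Size"].mode()[0])

print(train[["Item_Weight", "Outlet_Size"]].isnull().sum())  # should now be zero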

Another key step includes identifying outliers, and finally, variable transformation and
variable creation. Let’s review these steps in more detail:

Identifying variables

60
To get started, data scientists will identify the factors that change or could potentially
change. Then, scientists will identify the data type and category of the variables.

Univariate and bivariate analysis

Each variable is then explored individually with box plots or histograms to determine
whether it is categorical or continuous, a process known as the univariate analysis. This
process can also highlight missing data and outlier values. Next, a bivariate analysis will
help determine the relationship between variables.

Missing values

It’s not uncommon for datasets to have missing values or missing data. Identifying gaps in
information improves the overall accuracy of your data analysis.

Identifying outliers

Another common element in datasets is the presence of outliers. Outliers in data refer to
observations that are divergent from a generalized pattern in a data sample. Outliers can
skew data considerably, and should be highlighted and addressed before extracting insights.

3. FEATURE ENGINEERING: Some nuances were observed in the dataset during the data exploration phase, so
this phase is used to resolve all the nuances found in the dataset and to make it ready for building an
appropriate model. During this phase it was noticed that the Item Visibility attribute had zero values,
which makes no practical sense, so the mean item visibility of that product is used in place of the zero
values; this makes all products likely to sell. All discrepancies in the categorical attributes are
resolved by modifying them into appropriate categories. Finally, to determine how old a particular outlet
is, we add an additional attribute, Year, to the dataset.

4. MODEL BUILDING: After completing the previous phases, the dataset is now ready for building the
proposed model. Once the model is built, it is used as a predictive model to forecast the sales of Big
Mart. The modified data is read back in as follows:

# Reading modified data
train2 = pd.read_csv("train_modified.csv")
test2 = pd.read_csv("test_modified.csv")
train2.head()

In our work, we propose a model using the XGBoost regressor algorithm.
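
A minimal sketch of this model-building step is given below. The file name train_modified.csv comes from the report; the target column name, the train/validation split and the hyperparameter values are illustrative assumptions, not the exact configuration used.

# Sketch: training an XGBoost regressor on the pre-processed Big Mart data
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

train2 = pd.read_csv("train_modified.csv")
X = train2.drop(columns=["Item_Outlet_Sales"])   # target column name is an assumption
y = train2["Item_Outlet_Sales"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBRegressor(n_estimators=200, learning_rate=0.1, max_depth=5, random_state=42)
model.fit(X_train, y_train)

preds = model.predict(X_val)
print("Validation RMSE:", mean_squared_error(y_val, preds) ** 0.5)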

61
Feature engineering is a machine learning technique that leverages data to create new
variables that aren’t in the training set. It can produce new features for both
supervised and unsupervised learning, with the goal of simplifying and speeding up
data transformations while also enhancing model accuracy. Feature engineering is
required when working with machine learning models. Regardless of the data or
architecture, a terrible feature will have a direct impact on your model.

To understand this in an easier way, let us take a simple example. Below are the prices of properties in
a city X; the data shows the area of each house and its total price.

Sample Data

Now, this data might have some errors or might be incorrect; not all sources on the internet are
reliable. To begin, we will add a new column to display the cost per square foot.

Sample Data

This new feature helps us understand a lot about our data: we now have a column that shows the cost per
square foot. There are three main ways to find an error. First, you can use domain knowledge: contact a
property advisor or real estate agent, show them the per-square-foot rate, and if they state that the
price per square foot cannot be less than 3400, you may have a problem. Second, the data can be
visualised.
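
As a small illustration of creating such a derived column, the following sketch uses made-up numbers chosen only for this example:

# Sketch: creating a cost-per-square-foot feature from hypothetical property data
import pandas as pd

houses = pd.DataFrame({
    "area_sqft": [1000, 1500, 2000, 1200, 1800],
    "total_price": [4_000_000, 6_200_000, 8_100_000, 1_500_000, 7_400_000],
})

houses["price_per_sqft"] = houses["total_price"] / houses["area_sqft"]
print(houses.sort_values("price_per_sqft"))  # the suspiciously cheap house stands out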

When you plot the data, you will notice that one price is significantly different from the rest; with the
visualisation method you can readily spot the problem. The third way is to use statistics to analyze your
data and find any problems. Feature engineering consists of various processes:

Feature Creation: Creating features involves creating new variables that will be most helpful for our
model. This can mean adding or removing some features. As we saw above, the cost-per-square-foot column
was an example of feature creation.
Transformations: Feature transformation is simply a function that transforms features from one
representation to another. The goal here is to plot and visualise the data; if something is not adding up
with the new features, we can reduce the number of features used, speed up training, or increase the
accuracy of a certain model.
Feature Extraction: Feature extraction is the process of extracting features from a data set to identify
useful information. Without distorting the original relationships or significant information, this
compresses the amount of data into manageable quantities for algorithms to process.
Exploratory Data Analysis: Exploratory data analysis (EDA) is a powerful and simple tool that can be used
to improve your understanding of your data by exploring its properties. The technique is often applied
when the goal is to create new hypotheses or find patterns in the data. It is often used on large amounts
of qualitative or quantitative data that have not been analyzed before.
Benchmark: A benchmark model is the most user-friendly, dependable, transparent, and interpretable model
against which you can measure your own. It is a good idea to run test datasets to see whether your new
machine learning model outperforms a recognised benchmark. Such benchmarks are often used to compare the
performance of different machine learning models, such as neural networks and support vector machines,
linear and non-linear classifiers, or different approaches like bagging and boosting. To learn more about
the feature engineering steps and process, see the online references at the end of this report. Now, let
us look at why we need feature engineering in machine learning.
Importance Of Feature Engineering
Feature Engineering is a very important step in machine learning. Feature
engineering refers to the process of designing artificial features into an algorithm.
These artificial features are then used by that algorithm in order to improve its
performance, or in other words reap better results. Data scientists spend most of their
time with data, and it becomes important to make models accurate.

When feature engineering activities are done correctly, the resulting dataset is
optimal and contains all of the important factors that affect the business problem. As
a result of these datasets, the most accurate predictive models and the most useful
insights are produced.

Feature Engineering Techniques for Machine Learning


Let us look at a few of the best feature engineering techniques that you can use. Some of the techniques
listed may work better with certain algorithms or datasets, while others may be useful in all situations.

1. Imputation

When it comes to preparing your data for machine learning, missing values are one of the most typical
issues. Human errors, data flow interruptions, privacy concerns, and other factors can all contribute to
missing values. Whatever the cause, missing values affect the performance of machine learning models. The
main goal of imputation is to handle these missing values. There are two types of imputation:

Numerical Imputation: To figure out what numbers should be assigned to people currently in the
population, we usually use data from completed surveys or censuses. These data sets can include
information about how many people eat different types of food, whether they live in a city or a country
with a cold climate, and how much they earn every year. That is why numerical imputation is used to fill
gaps in surveys or censuses when certain pieces of information are missing.

# Filling all missing values with 0
data = data.fillna(0)

Categorical Imputation: When dealing with categorical columns, replacing missing values with the most
frequent value in the column is a smart solution. However, if you believe the values in the column are
evenly distributed and there is no dominating value, imputing a category like "Other" would be a better
choice, as your imputation is more likely to converge to a random selection in this scenario.

# Mode fill for a categorical column
data['column_name'].fillna(data['column_name'].value_counts().idxmax(), inplace=True)

2. Handling Outliers

Outlier handling is a technique for removing outliers from a dataset. The method can be applied on a
variety of scales to produce a more accurate representation of the data, and it has an impact on the
model's performance. Depending on the model, the effect can be large or minimal; linear regression, for
example, is particularly susceptible to outliers. This procedure should be completed prior to model
training. The various methods of handling outliers include:

Removal: Outlier-containing entries are deleted from the distribution. However, if there are outliers
across numerous variables, this strategy may result in a large part of the dataset being discarded.
Replacing values: Alternatively, the outliers can be treated as missing values and replaced with a
suitable imputation.
Capping: Using an arbitrary value, or a value from the variable's distribution, to replace the maximum
and minimum values (a short sketch follows this list).
Discretization: Discretization is the process of converting continuous variables, models, and functions
into discrete ones. This is accomplished by constructing a series of contiguous intervals (or bins) that
span the range of the variable, model or function.
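
A minimal sketch of the capping approach, using percentile-based caps on a hypothetical numeric column (the column name and percentile choices are illustrative assumptions):

# Sketch: capping outliers at the 1st and 99th percentiles of a numeric column
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame({"value": rng.normal(100, 15, size=1000)})
data.loc[::200, "value"] = 500  # inject a few artificial outliers

lower, upper = data["value"].quantile([0.01, 0.99])
data["value_capped"] = data["value"].clip(lower=lower, upper=upper)

print(data[["value", "value_capped"]].describe())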
3. Log Transform

The log transform is one of the most used techniques among data scientists. It is mostly used to turn a
skewed distribution into a normal or less-skewed distribution: we take the log of the values in a column
and use those values as the column in this transform. It also helps to handle data whose magnitudes vary
widely, bringing the distribution closer to normal.

# Log transform example
df['log_price'] = np.log(df['Price'])

4. One-hot Encoding

One-hot encoding is a type of encoding in which each element of a finite set is represented by its own
position (index), where only the position corresponding to that element is set to "1" and all other
positions are set to "0". In contrast to binary encoding schemes, where each bit can represent two values
(i.e. 0 and 1) in combination, this scheme assigns a unique position to each possible case.
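
A minimal sketch of one-hot encoding a categorical column with pandas; the Outlet_Size column name matches the dataset described earlier, and the use of get_dummies is an illustrative choice rather than the report's exact code:

# Sketch: one-hot encoding a categorical column
import pandas as pd

df = pd.DataFrame({"Outlet_Size": ["Small", "Medium", "High", "Medium"]})
encoded = pd.get_dummies(df, columns=["Outlet_Size"], prefix="Outlet_Size")
print(encoded)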

5. Scaling

Feature scaling is one of the most pervasive and difficult problems in machine learning, yet it is one of
the most important things to get right. In order to train a predictive model, we need data with a known
set of features that needs to be scaled up or down as appropriate. This section explains how feature
scaling works and why it is important, as well as some tips for getting started with feature scaling.

After a scaling operation, the continuous features become similar in terms of range. Although this step
is not required for many algorithms, it is still a good idea to perform it. Distance-based algorithms
like k-NN and k-Means, on the other hand, require scaled continuous features as model input. There are
two common ways of scaling:

Normalization: All values are scaled into a specified range, typically between 0 and 1, via normalisation
(or min-max normalisation). This modification does not change the shape of the feature's distribution,
but it does exacerbate the effect of outliers, which compress the remaining values into a narrower range.
It is therefore advised that outliers be dealt with prior to normalisation.

Standardization: Standardization (also known as z-score normalisation) is the process of scaling values
while accounting for the standard deviation. If the standard deviations of features differ, their ranges
will likewise differ, and the effect of outliers in the features is reduced as a result. To arrive at a
distribution with mean 0 and variance 1, the mean is subtracted from every data point and the result is
divided by the standard deviation of the distribution.
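
A minimal sketch of the two scaling approaches with scikit-learn, on hypothetical data:

# Sketch: min-max normalization vs z-score standardization
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[10.0], [20.0], [30.0], [100.0]])

print(MinMaxScaler().fit_transform(X))    # values rescaled to the [0, 1] range
print(StandardScaler().fit_transform(X))  # zero mean, unit variance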

65
3.5 SYSTEM REQUIREMENT SPECIFICATION

Software requirements specification establishes the basis for an agreement between


customers and contractors or suppliers on how the software product should function (in a
market-driven project, these roles may be played by the marketing and development
divisions). Software requirements specification is a rigorous assessment of requirements
before the more specific system design stages, and its goal is to reduce later redesign. It
should also provide a realistic basis for estimating product costs, risks, and schedules. Used
appropriately, software requirements specifications can help prevent software project
failure.

3.6.1 HARDWARE REQUIREMENTS

The hardware requirements may serve as the basis for a contract for the implementation of
the system and should therefore be a complete and consistent specification of the whole
system. To ensure smooth execution of the Expert System, a computer with the following
hardware is recommended.

• Processor: 1 gigahertz (GHz) or faster processor


• RAM: 4 gigabytes (GB)
• Hard disk space: 250 GB
• Display: 1366x768 pixels Resolution
• Peripherals: Standard Keyboard and Mouse.

66
3.6.2 SOFTWARE REQUIREMENTS

The software requirements are the specification of the system. They should include both a definition and
a specification of requirements. They describe what the system should do rather than how it should do it.
They are useful in estimating cost, planning team activities, performing tasks, and tracking the team's
progress throughout the development activity.

• Operating System: WINDOWS 7 or Newer.


• IDE: Jupyter Notebook

67
4.IMPLEMENTATION AND CODING

4.1 Exploratory Data Analysis

What is exploratory data analysis in machine learning?

Exploratory Data Analysis (EDA) is an approach to analyzing data using visual techniques. It is used to
discover trends and patterns, or to check assumptions, with the help of statistical summaries and
graphical representations.
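
The figures below show density and count plots for the main attributes. A minimal sketch of how such plots can be generated (assuming the usual Big Mart column names and the seaborn/matplotlib libraries; the exact plotting code used in the project may differ) is:

# Sketch: density and count plots used in the exploratory analysis
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

train = pd.read_csv("Train.csv")  # path is an assumption

for col in ["Item_Weight", "Item_Visibility", "Item_MRP"]:
    sns.kdeplot(train[col].dropna())
    plt.title(f"Density of {col}")
    plt.show()

for col in ["Item_Fat_Content", "Outlet_Establishment_Year", "Outlet_Size",
            "Outlet_Location_Type", "Outlet_Type"]:
    sns.countplot(x=col, data=train)
    plt.title(f"Count of {col}")
    plt.xticks(rotation=45)
    plt.show()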

Fig: Density based on item weight

68
Fig: Density based on item visibility

Fig: Density based on item MRP

Fig : Count of item fat content

69
Fig : Count of outlet establishment year

Fig : Count of outlet size

70
Fig: Count of outlet location type

Fig : Count of outlet type

71
Correlation Matrix:

Correlation indicates how changes in one variable are associated with changes in another. In the previous
sections, we discussed Pearson's correlation coefficient and the importance of correlation. We can plot a
correlation matrix to show which variables have a high or low correlation with respect to another
variable.
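
A minimal sketch of how the heatmap shown below can be produced, assuming the pre-processed DataFrame from the earlier steps:

# Sketch: correlation heatmap of the numeric attributes
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

train = pd.read_csv("train_modified.csv")  # pre-processed file from the earlier step

corr = train.select_dtypes(include="number").corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation between attributes")
plt.show()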

Fig: Heatmap of the correlation between attributes

The correlation among the various dependent and independent variables is explored in order to decide on
the further steps to be taken. The variables used are obtained after data pre-processing, and the
following are some of the important observations about them:

• Item_Visibility has nearly zero correlation with the dependent variable Item_Outlet_Sales and with the
Grocery Store outlet type. This means that sales are not affected by the visibility of an item, which
contradicts the general assumption of "more visibility, thus more sales".

72
• Item_MRP (maximum retail price) is positively correlated with sales at an outlet, which indicates that
the price quoted by an outlet plays an important role in its sales.

• Outlets situated in Tier 2 locations and of medium size also have high sales, which means that a
one-stop shopping centre situated in a populated town or city can achieve high sales.

• The variation in MRP quoted by various outlets depends on their individual sales.

Methodology: The steps followed in this work, from dataset preparation to obtaining the results, are
represented in Fig.1.

4.2 DATA PREPROCESSING

Data preprocessing in Machine Learning is a crucial step that helps enhance the quality of
data to promote the extraction of meaningful insights from the data. Data preprocessing in
Machine Learning refers to the technique of preparing (cleaning and organizing) the raw
data to make it suitable for building and training Machine Learning models. In simple
words, data preprocessing in Machine Learning is a data mining technique that transforms
raw data into an understandable and readable format.

When it comes to creating a Machine Learning model, data preprocessing is the first step
marking the initiation of the process. Typically, real-world data is incomplete, inconsistent,
inaccurate (contains errors or outliers), and often lacks specific attribute values/trends. This
is where data preprocessing enters the scenario – it helps to clean, format, and organize the

73
raw data, thereby making it ready-to-go for Machine Learning models. Let’s explore various
steps of data preprocessing in machine learning.

Dataset and its Preprocessing: Big Mart's data scientists collected 2013 sales data for their 10 stores
situated at different locations, with each store carrying 1559 different products. From all these
observations it is inferred what role certain properties of an item play and how they affect its sales.
The dataset, viewed with the head() function on the dataset variable, looks as shown in Fig.2.

The data set consists of various data types, from integer to float to object, as shown in Fig.3.

74
In the raw data, there can be various types of underlying patterns which give in-depth knowledge about
the subject of interest and provide insights about the problem. However, caution should be observed, as
the data may contain null values, redundant values, or various types of ambiguity, which also demands
pre-processing of the data. The dataset should therefore be explored as much as possible. Various
statistically important quantities, such as the mean, standard deviation, median, count of values and
maximum value, are shown in Fig.4 for the numerical variables of our dataset.
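
A minimal sketch of the calls used to produce these views of the raw data (the file path is an assumption):

# Sketch: first look at the raw Big Mart data
import pandas as pd

train = pd.read_csv("Train.csv")  # raw training file; path is an assumption

print(train.head())      # first rows of the dataset (Fig.2)
print(train.dtypes)      # integer, float and object columns
print(train.describe())  # mean, std, median, counts and max for numeric columns (Fig.4)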

75
4.3 MODEL TRAINING

A machine learning model is a file that has been trained to recognize certain types of
patterns. You train a model over a set of data, providing it an algorithm that it can use to
reason over and learn from those data.

Once you have trained the model, you can use it to reason over data that it hasn't seen
before, and make predictions about those data. For example, let's say you want to build
an application that can recognize a user's emotions based on their facial expressions.
You can train a model by providing it with images of faces that are each tagged with a
certain emotion, and then you can use that model in an application that can recognize
any user's emotion.

Algorithms Employed: Scikit-learn can be used to manage the machine-learning workflow as a whole. The
algorithms employed for predicting sales on this dataset are discussed below.

• Random Forest Algorithm: Random forest is a very accurate algorithm to use for predicting sales. It is
easy to use and understand for the purpose of predicting the results of machine learning tasks. In sales
prediction, a random forest is used because it has decision-tree-like hyperparameters, and its tree model
is the same as that of a decision tree. Fig.5 shows the relation between decision trees and a random
forest. To solve regression prediction tasks with a random forest, the random forest regressor class of
the sklearn.ensemble library is used. A key role is played by the parameter n_estimators, which belongs
to the random forest regressor. A random forest can be referred to as a meta-estimator that fits numerous
decision trees on different sub-samples of the dataset. min_samples_split is the minimum number of
samples required to split an internal node, when an integer number of minimum samples is specified. The
quality of a split is measured using mse (mean squared error), which as a feature selection criterion is
equivalent to variance reduction; mae (mean absolute error) is another criterion for feature selection.
The maximum tree depth is specified as an integer; if it is not set, nodes are expanded until all leaves
are pure or until all leaves contain fewer than min_samples_split samples, which amounts to pruning for
better model fitting.
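
A minimal sketch of this step with scikit-learn; the hyperparameter values and the target column name are illustrative assumptions:

# Sketch: random forest regression on the pre-processed data
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

train2 = pd.read_csv("train_modified.csv")
X = train2.drop(columns=["Item_Outlet_Sales"])   # target column name is an assumption
y = train2["Item_Outlet_Sales"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestRegressor(n_estimators=100, min_samples_split=10, random_state=42)
rf.fit(X_train, y_train)
print("Validation RMSE:", mean_squared_error(y_val, rf.predict(X_val)) ** 0.5)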

77
Linear Regression Algorithm: Regression is a parametric technique used to predict a continuous
(dependent) variable on the basis of a provided set of independent variables. The technique is said to be
parametric because different assumptions are made about the data set. Simple linear regression uses the
equation

Y = β0 + β1X + ε      (1)

where the parameters are:
Y - the variable to be predicted;
X - the variable(s) used for making the prediction;
β0 - the predicted value when X = 0, also referred to as the intercept term;
β1 - the change in Y for a one-unit change in X, also referred to as the slope term;
ε - the difference between the predicted and actual values, i.e. the residual.

However efficiently the model is trained, tested and validated, there is always a difference between the
actual and predicted values; this irreducible error means we cannot rely completely on the results
predicted by the learning algorithm. Alternative methods given by Dietterich can be used for comparing
learning algorithms.
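
A minimal sketch of fitting a simple linear regression with scikit-learn, for the single-feature case of eq.1 with hypothetical data:

# Sketch: simple linear regression Y = b0 + b1*X + error
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))             # single independent variable
y = 2.5 + 1.3 * X[:, 0] + rng.normal(0, 1, 100)   # hypothetical true relationship plus noise

lr = LinearRegression().fit(X, y)
print("Intercept (b0):", lr.intercept_)
print("Slope (b1):", lr.coef_[0])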

78
Random Forest Regression

Random forest is a supervised learning algorithm that uses an ensemble learning method for classification
and regression. Random forest is a bagging technique and not a boosting technique. The trees in a random
forest run in parallel, meaning there is no interaction between these trees while they are being built.

79
Extra trees regressor

An extra-trees regressor. This class implements a meta estimator that fits a number of
randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses
averaging to improve the predictive accuracy and control over-fitting.

80
5. CONCLUSION

5.1 CONCLUSION
With traditional methods not being of much help to business organizations in revenue growth, the use of
Machine Learning approaches proves to be an important aspect of shaping business strategies while keeping
in consideration the purchase patterns of consumers. Prediction of sales with respect to various factors,
including the sales of previous years, helps businesses adopt suitable strategies for increasing sales
and set their foot undaunted in the competitive world.

Machine learning and the associated data processing and modeling algorithms have been described, followed
by their application to the task of sales prediction in Big Mart shopping centres at different locations.
On implementation, the prediction results show the correlation among the different attributes considered
and how a particular location of medium size recorded the highest sales, suggesting that other shopping
locations should follow similar patterns for improved sales.
are increased. Also, a look into how the sub-models work can lead to increase in productivity
of system. The project can be further collaborated in a web-based application or in any
device supported with an in-built intelligence by virtue of Internet of Things (IoT), to be
more feasible for use. Various stakeholders concerned with sales information can also
provide more inputs to help in hypothesis generation and more instances can be taken into
consideration such that more precise results that are closer to real world situations are
generated. When combined with effective data mining methods and properties, the
traditional means could be seen to make a higher and positive effect on the overall
development of corporation’s tasks on the whole. One of the main highlights is more
expressive regression outputs, which are more understandable bounded with some of
accuracy. Moreover, the flexibility of the proposed approach can be increased with variants
at a very appropriate stage of regression model-building. There is a further need of
experiments for proper measurements of both accuracy and resource efficiency to assess and
optimize correctly.

81
Experts have also shown that a smart sales forecasting system is required to manage vast volumes of data
for business organizations. Business assessments are based on the speed and precision of the methods used
to analyze the results. The machine learning methods presented in this report should provide an effective
method for data shaping and decision-making. New approaches that can better identify consumer needs and
formulate marketing plans can then be implemented. The outcome of the machine learning algorithms will
help to select the most suitable demand prediction algorithm, with the aid of which Big Mart can prepare
its marketing campaigns.

We predict sales with the XGBoost regressor and evaluate its accuracy. Our predictions help big marts to
refine their methodologies and strategies, which in turn helps them to increase their profit. The
predicted results will be very useful for the executives of the company to understand their sales and
profits, and will also give them ideas for new locations or centres of Big Mart.

82
5.2 FUTURE SCOPE

Multiple instance parameters and various other factors can be used to make this sales prediction more
innovative and successful. Accuracy, which plays a key role in prediction-based systems, can be
significantly increased as the number of parameters used is increased. Also, a look into how the
sub-models work can lead to an increase in the productivity of the system. The project can be further
developed into a web-based application, or embedded in any device with in-built intelligence by virtue of
the Internet of Things (IoT), to make it more feasible for use. Various stakeholders concerned with sales
information can also provide more inputs to help in hypothesis generation, and more instances can be
taken into consideration so that more precise results, closer to real-world situations, are generated.
When combined with effective data mining methods and properties, the traditional means could be seen to
have a greater and more positive effect on the overall development of the corporation's tasks as a whole.
One of the main highlights is more expressive regression outputs, which are more understandable when
bounded with a stated accuracy. Moreover, the flexibility of the proposed approach can be increased with
variants at an appropriate stage of regression model-building. There is a further need for experiments,
with proper measurements of both accuracy and resource efficiency, to assess and optimize the approach
correctly.

83
REFERENCES

[1] Singh Manpreet, Bhawick Ghutla, Reuben Lilo Jnr, Aesaan FS Mohammed, and
Mahmood A. Rashid. "Walmart's Sales Data Analysis-A Big Data Analytics Perspective."
In 2017 4th Asia-Pacific World Congress on Computer Science and Engineering (APWC
on CSE), pp. 114-119. IEEE, 2017.

[2] Sekban, Judi. "Applying machine learning algorithms in sales prediction." (2019).

[3] Panjwani, Mansi, Rahul Ramrakhiani, Hitesh Jumnani, Krishna Zanwar, and Rupali
Hande. Sales Prediction System Using Machine Learning. No. 3243. EasyChair, 2020.

[4] Cheriyan, Sunitha, Shaniba Ibrahim, Saju Mohanan, and Susan Treesa. "Intelligent Sales
Prediction Using Machine Learning Techniques." In 2018 International Conference on
Computing, Electronics & Communications Engineering (iCCECE), pp. 53-58. IEEE, 2018.

[5] Giering, Michael. "Retail sales prediction and item recommendations using customer
demographics at store level." ACM SIGKDD Explorations Newsletter 10, no. 2 (2008): 84-
89.

[6] Baba, Norio, and Hidetsugu Suto. "Utilization of artificial neural networks and GAs for
constructing an intelligent sales prediction system." In Proceedings of the IEEE-INNS-
ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural
Computing: New Challenges and Perspectives for the New Millennium, vol. 6, pp. 565-570.
IEEE, 2000.

[7] Ragg, Thomas, Wolfram Menzel, Walter Baum, and Michael Wigbers. "Bayesian learning for sales rate
prediction for thousands of retailers." Neurocomputing 43, no. 1-4 (2002): 127-144.

[8] Fawcett, Tom, and Foster J. Provost. "Combining Data Mining and Machine Learning for Effective User
Profiling."

84
ONLINE REFERENCES

• https://www.codecademy.com/
• https://en.wikipedia.org/
• https://www.geeksforgeeks.org/
• https://stackoverflow.com/
• https://www.javatpoint.com/
• https://www.tutorialspoint.com/
• http://www.Quora.com/
• https://www.w3schools.com/

85
