Machine Learning

Introduction to Machine Learning

What is machine learning?
• Machine learning is the idea that there are generic algorithms that can tell you
something interesting about a set of data without you having to write any custom
code specific to the problem.
• Instead of writing code, you feed data to the generic algorithm and it builds its
own logic based on the data.
• Machine Learning (ML) encompasses a lot of things. The field is vast and is
expanding rapidly. It is a branch of Artificial Intelligence (AI). Loosely speaking, ML
is the field of study that gives computer algorithms the ability to learn without
being explicitly programmed. The outcome we want from our computer
algorithm is PREDICTION. This is different from our previous problems, where we
wanted the algorithm to solve a specific problem such as finding the best web
page for our search, sorting a list of items, or generating a secure means of
computing a shared secret in cryptography. What are we going to use to predict?
An example application
• An emergency room in a hospital measures 17 variables (e.g., blood
pressure, age, etc.) of newly admitted patients.
• A decision is needed: whether to put a new patient in an intensive-care
unit.
• Due to the high cost of ICU, those patients who may survive less than a
month are given higher priority.
• Problem: to predict high-risk patients and discriminate them from low-risk
patients.
Another application
• A credit card company receives thousands of applications for new cards.
Each application contains information about an applicant,
• age
• Marital status
• annual salary
• outstanding debts
• credit rating
• etc.
• Problem: to decide whether an application should be approved, or to
classify applications into two categories, approved and not approved.
CS583, Bing Liu, UIC 12

Machine learning
• Like human learning from past experiences.
• A computer does not have “experiences”.
• A computer system learns from data, which represent some “past
experiences” of an application domain.
• Our focus: learn a target function that can be used to predict the values of a
discrete class attribute, e.g., approved or not-approved, and high-risk or
low-risk.
• The task is commonly called: Supervised learning, classification, or inductive
learning.

The data and the goal
• Data: A set of data records (also called examples, instances or cases)
described by
• k attributes: A1, A2, … Ak.
• a class: Each example is labelled with a pre-defined class.
• Goal: To learn a classification model from the data that can be used to
predict the classes of new (future, or test) cases/instances.
Machine Learning Types
• Supervised Learning
• Uses labeled data
• Results compared with the correct answer.
• Requires large amounts of data to refine the model and produce more
accurate results.
• Common techniques: Classification, Regression
• Use cases: Fraud Detection, Image Recognition
• Unsupervised Learning
• Works with unlabeled data.
• A learning algorithm is used to detect patterns.
• The most common unsupervised learning technique is clustering, which takes
unlabeled data and uses algorithms to put similar items into groups.
• Use cases: Customer segmentation, sentiment analysis
• Reinforcement Learning
• Learns through a trial-and-error process.
• Learning improves based on positive and negative reinforcement.
• Use cases: Games, Robotics
Algorithms, use case examples, and outcomes
• Supervised Learning: used when we know the classification of data and what to predict. Outcome: Descriptive (What happened?)
o Linear Regression: estimate product price elasticity
o Logistic Regression: classify customers on likelihood to repay a loan
o Linear / Quadratic Discriminant Analysis: classify customers on likelihood to repay a loan
o Decision Tree: find attributes in a product that make it likely for purchase
o Naïve Bayes: analyze sentiments to assess product perception
o Support Vector Machine: analyze sentiments to assess product perception
o Random Forest: predict power usage in a distribution grid
o AdaBoost: detect fraudulent activity in a credit card
• Unsupervised Learning: used when we don’t know the classification of data and want the algorithm to classify the data. Outcome: Predictive (What will happen?)
o K-Means Clustering: segment customers into groups by characteristics
o Gaussian Mixture Model: segment customers based on less distinctive characteristics
o Hierarchical Clustering: inform product usage by grouping customers
o Recommender System: recommend news articles to readers based on what they are currently reading
• Reinforcement Learning: used when we don’t have training data and the only way to learn about the environment is to learn with it. Outcome: Prescriptive (What to do?)
o Balance the load on electricity grids in varying demand cycles
o Optimize the driving behavior of self-driving cars
o Find real-time pricing during a product auction
Machine Learning today is extensively used and has well defined
Algorithms, Tools and Technology while other AI technologies are confined
to vendor provided solutions…
• Natural Language Processing (vendor: product)
o Google: Google Cloud Natural Language
o Apple: Apple Natural Language Framework
o HP: HPE IDOL
o IBM: Watson
o Microsoft Corp.: Linguistic Analysis API & Text Analytics API
o 3M: 360 Encompass System
• Computer Vision (vendor: product)
o Cognex: Vision Sensors, 3D Laser Profilers, VisionPro
o Omron: Vision System, Smart Camera, Lighting System
o Keyence: Vision Sensors
o Basler: Cameras, Vision Kit
o National Instruments: Platform Modules, Computer Based Devices
o Sony: CameraLink
• Robotic Process Automation (vendor: product)
o UiPath: Studio, Front Office Robot, Orchestrator
o Blue Prism: Enterprise Platform
o Thoughtonomy: Virtual Workforce
o Automation Anywhere: IQ Bot, Bot Insight
o NICE: NICE Robotics Automation
o Kofax: Kofax Kapow
Machine Learning: technology, algorithms, and tools
• Technology: Python, Hadoop, Java, R, MATLAB, ELM, Scala
• Algorithm: Regression, Decision Tree, Naïve Bayes, Support Vector Machine, Random Forest, AdaBoost, Gradient-boosting trees
• Tools: Scikit learn, Shogun, Apache Mahout, H2O, Cloudera Oryx, GoLearn, Weka
Landscape of ML Solutions
[Figure: landscape of ML solutions, arranged from "Buy" to "DIY" and across audiences (business application users, data analysts, data scientists, ML engineers). Categories shown: embedded machine learning, machine-learning APIs, data analysis software, augmented analytics, data science and machine-learning platforms, deep-learning frameworks, deep-learning cloud platforms, and deep-learning hardware. Names shown: Salesforce Einstein, SAP Clea, Google's TensorFlow, Caffe, Theano, Microsoft Cognitive Toolkit, Skymind's DL4J, H2O.ai's Deep Water, Baidu's Pebble, Intel BigDL, Apache MXNet, Amazon Web Services (AWS), Intel Nervana, Microsoft Azure, Rescale, Google Cloud Platform, Nvidia, AMD, IBM, Intel; languages: R, Python, Scala, Matlab.]
Source: "Magic Quadrant for Data Science and Machine-Learning Platforms," 22 February 2018. (G00326456)
14 © 2018 Gartner, Inc. and/or its affiliates. All rights reserved.
Application Example:
Natural Language Processing
• Describe two end-to-end examples for applications involving natural
language text
o Support ticket classification
o Recruiting – CV matching
Support ticket classification
Example: Classify support tickets into categories so that they can be routed
to corresponding agents
1. Do you need machine learning?
• High volume of support tickets
• Human language is complex and ambiguous
2. Can you formulate your problem clearly?
• Given a customer support ticket, predict its service category
• Input: customer support ticket; output: service category
3. Do you have sufficient examples?
• Large volume of customer support tickets with respective service category from
ticket support systems
Support ticket classification (Cont’d)
4. Does your problem have a regular pattern?
• Common customer issues will have many tickets
• Issues will correlate with common keywords, e.g., bill or payment will appear
more often in support tickets with category payments
5. Can you find meaningful representations of your data?
• Represent customer support tickets as vector of word frequencies
• Label is the service category of the customer support ticket
6. How do you define success?
• Measure percentage of correctly predicted service categories
Recruiting – CV matching
Example: Shortlist candidates during recruiting
• Hundreds of applications per job opening
• Manual effort to read CVs and screen candidates
• Given a candidate’s CV and a job description, predict suitability
• Input: CV and job description; output: yes/no
• Large volume of previous job applications, job descriptions, and whether
candidate was invited for interview
Recruiting – CV matching (Cont’d)
• Required skills in job description should match experience in CV
• Good CVs have no typos, are neither too long nor too short, etc.
• Represent CVs and job descriptions as vector of features that measure
similarity and match
• Label is whether the candidate was invited for interview
• Measure precision and recall of correct predictions
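As a rough illustration of the success measure above, precision and recall for the invited / not-invited label could be computed as in the following sketch. The function name `precision_recall` and the sample labels are hypothetical; they are not data from the recruiting example.

```python
def precision_recall(predicted, actual, positive="yes"):
    """Return (precision, recall) for the class named by `positive`."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(p == positive and a == positive for p, a in pairs)
    false_pos = sum(p == positive and a != positive for p, a in pairs)
    false_neg = sum(p != positive and a == positive for p, a in pairs)
    # Guard against division by zero when a class never occurs.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical labels: was the candidate invited for an interview?
actual = ["yes", "no", "yes", "no", "no", "yes"]
predicted = ["yes", "yes", "no", "no", "no", "yes"]

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of predicted invitations that were real invitations; recall is the share of real invitations that the model found.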
Application Example:
Computer Vision
• Describe end-to-end examples for applications involving computer
vision
• Retail shelf analytics
• Fashion apparel color analysis
Understanding Machine Learning
What do we mean by learning?
• Given
• a data set D,
• a task T, and
• a performance measure M,
a computer system is said to learn from D to perform the task T if after
learning the system’s performance on T improves as measured by M.
• In other words, the learned model helps the system to perform T
better as compared to no learning.
An example
• Data: Loan application data
• Task: Predict whether a loan should be approved or not.
• Performance measure: accuracy.
• No learning: classify all future applications (test data) to the majority
class (i.e., Yes): Accuracy = 9/15 = 60%.
• We can do better than 60% with learning.
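The "no learning" baseline can be sketched in a few lines. The label list below is a hypothetical stand-in with 9 "Yes" and 6 "No" entries, chosen only to match the 9/15 split quoted above; the baseline is scored on those same 15 labels to reproduce that figure.

```python
from collections import Counter


def majority_class(labels):
    """Return the most frequent class label."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of predictions that equal the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical class column: 9 approved, 6 not approved.
labels = ["Yes"] * 9 + ["No"] * 6

majority = majority_class(labels)
predictions = [majority] * len(labels)  # always predict the majority class
baseline = accuracy(predictions, labels)
print(majority, f"{baseline:.0%}")
```

Any learned model is worth keeping only if it beats this baseline on the chosen performance measure.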
Machine learning capabilities
• Machine learning is used in many applications
o Computer vision: face recognition, object recognition
o Natural language processing: machine translation, sentiment analysis
o Recommender systems
• Recent breakthroughs using deep learning
o Automatically generate image captions
o AlphaGo: AI beats the world’s top Go player
Typical machine learning tasks
What Is Machine Learning?
[Figure: machine learning workflow. Recoverable labels: Regression; Training Data, Feature Extraction, Feature Vectors, Training Models, Evaluation with Validation Data, Models.]
From Business Problem to Machine Learning
Problem: A Recipe
Step-by-step “recipe” for qualifying a business problem as a machine
learning problem
When to use machine learning
Problem formulation
• What do you want to predict given which input?
• Pattern: “given X, predict Y”
o What is the input?
o What is the output?
Example: sentiment analysis
• Given a customer review, predict its sentiment
• Input: customer review text
• Output: positive, negative, neutral
Collecting data
• Machine learning always requires data!
• Generally, the more data, the better
• Each example must contain two parts (supervised learning)
o Features: attributes of the example
o Label: the answer you want to predict
• Thousands of customer reviews and ratings from the Web
Regularities in the data
• Machine learning learns regularities and patterns
• Hard to learn patterns that are rare or irregular

• Positive words like good, awesome, or love it appear more often in
highly-rated reviews
• Negative words like bad, lousy, or disappointed appear more often in
poorly-rated reviews
Representations and features
• Machine learning algorithms ultimately operate on numbers
• Generally, examples are represented as feature vectors
• Good features often determine the success of machine
learning

• Represent customer review as vector of word frequencies
• Label is positive (4-5 stars), negative (1-2 stars), neutral (3
stars)
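A minimal sketch of this representation follows, assuming a very simple lowercase tokenizer. The helper names (`tokenize`, `build_vocabulary`, `to_feature_vector`, `rating_to_label`) and the three sample reviews are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(texts):
    """Sorted list of all distinct words seen in the texts."""
    return sorted({word for text in texts for word in tokenize(text)})


def to_feature_vector(text, vocabulary):
    """Word-frequency vector: one count per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def rating_to_label(stars):
    """Map a star rating to the label scheme described above."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "neutral"


# Hypothetical (review text, star rating) pairs.
reviews = [
    ("Good phone, love it", 5),
    ("Bad battery, bad screen", 1),
    ("It is okay", 3),
]

vocabulary = build_vocabulary(text for text, _ in reviews)
features = [to_feature_vector(text, vocabulary) for text, _ in reviews]
labels = [rating_to_label(stars) for _, stars in reviews]
```

Each review becomes a list of numbers of the same length as the vocabulary, which is the form a learning algorithm can operate on.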
Evaluating success
• Machine learning optimizes a training criterion
• The evaluation function has to support the business goals

• Accuracy: percentage of correctly predicted labels
The “cheat sheet”
Machine Learning in Enterprise Computing
Creating machine learning models
• Stages: Data Cleaning, Feature Processing, Model Selection, Parameter Optimization
1. Split data (inputs & labels) into training & testing subsets
2. Train a model on the training set
3. Make predictions on the testing set
4. Compare predicted and true labels
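The four steps can be sketched end to end with the standard library alone. Everything here is an illustrative assumption: the data is synthetic, and the "model" is a deliberately simple one-feature threshold, not a method prescribed by the slides.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Step 1: shuffle and split records into training and testing subsets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train_threshold_model(train):
    """Step 2: learn one cut-off, the midpoint between the two class means."""
    yes = [x for x, label in train if label == "yes"]
    no = [x for x, label in train if label == "no"]
    return (sum(yes) / len(yes) + sum(no) / len(no)) / 2


def predict(threshold, x):
    """Step 3: predict a label for one input."""
    return "yes" if x >= threshold else "no"


# Synthetic data: 20 (input, label) records.
data = [(x, "yes" if x >= 50 else "no") for x in range(0, 100, 5)]

train, test = train_test_split(data)
threshold = train_threshold_model(train)
predictions = [predict(threshold, x) for x, _ in test]

# Step 4: compare predicted and true labels on the testing set.
test_accuracy = sum(p == label for p, (_, label) in zip(predictions, test)) / len(test)
print(f"threshold={threshold:.1f} test accuracy={test_accuracy:.0%}")
```

The testing subset is never used during training, so the accuracy measured in step 4 estimates how the model behaves on unseen data.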

The Challenge of Machine Learning: Under and Overfitting
[Figure: three panels plotting avg. spend/visit against store distance, including an underfitting and an overfitting case; and a plot of error against model complexity (low to high) for the training set and the test set.]
• Underfitting: predictor is too "simplistic"; cannot capture the pattern
• Overfitting: predictor is too "powerful"; rote learning; easy to be good on the training data

An example: data (loan application)
Approved or not
An example: the learning task
• Learn a classification model from the data
• Use the model to classify future loan applications into
• Yes (approved) and
• No (not approved)
• What is the class for the following case/instance?
Machine Learning: Supervised and Unsupervised Learning
Supervised learning vs. unsupervised learning
• Supervised learning: discover patterns in the data that relate data
attributes with a target (class) attribute.
• These patterns are then utilized to predict the values of the target attribute
in future data instances.
• Unsupervised learning: The data have no target attribute.

• We want to explore the data to find some intrinsic structures in them.
Differences between supervised and unsupervised learning
• Supervised learning: classification is seen as supervised learning from
examples.
• Supervision: The data (observations, measurements, etc.) are labeled with
pre-defined classes, as if a “teacher” gives the classes (supervision).
• Test data are classified into these classes too.
• Unsupervised learning (clustering)
• Class labels of the data are unknown
• Given a set of data, the task is to establish the existence of classes or clusters
in the data
Difference between Classification and Clustering
Classification
• Classification is a supervised learning technique in which predefined labels are assigned to instances based on their properties.
• Classification is the process of learning a model that distinguishes between predetermined classes of data. It is a two-step process: in the learning step, a classification model is constructed; in the classification step, the constructed model is used to predict the class labels of given data.
• Example: In a banking application, a customer who applies for a loan may be classified as safe or risky according to his/her age and salary. The produced model could be in the form of a decision tree or a set of rules.
• Classification techniques: decision trees, KNN, Naïve Bayes, neural networks, logistic regression, etc.
Clustering
• Clustering is used in unsupervised learning, where similar instances are grouped based on their features or properties.
• Clustering is a technique for organising data into clusters such that objects inside a cluster have high similarity and objects in different clusters are dissimilar to each other.
• The main target of clustering is to divide the whole data into multiple clusters. Unlike in classification, the class labels of objects are not known beforehand.
• In clustering, the similarity between two objects is measured by a similarity function based on the distance between them: the shorter the distance, the higher the similarity; the longer the distance, the higher the dissimilarity.
• Clustering techniques: K-Means
• Example: Customer Segmentation
Supervised learning process: two steps
• Learning (training): Learn a model using the training data
• Testing: Test the model using unseen test data to assess the
model accuracy
Accuracy = (Number of correct classifications) / (Total number of test cases)
Machine Learning in Enterprise Computing
• Train machine-learning model on historical data
• Deploy the model to make predictions on new data
• Regularly retrain the model with new data
• Focus on making predictions about future data
[Figure: historical data feeds a training process that produces a model; the model turns new data into a result and is updated by retraining.]
Traditional rule-based approach vs. machine learning
Fundamental assumption of learning
Assumption: The distribution of training examples is identical to the
distribution of test examples (including future unseen examples).
• In practice, this assumption is often violated to a certain degree.
• Strong violations will clearly result in poor classification accuracy.
• To achieve good accuracy on the test data, training examples must
be sufficiently representative of the test data.
SUPERVISED LEARNING TECHNIQUES
• Regression:
• Linear Regression
• Ensemble Modelling
• Decision Trees
• Classification
• Naïve Bayes Classifier
• K Nearest Neighbour
• Neural Networks
Regression (SUPERVISED LEARNING)
• Regression describes the relationship between variables.
• Regression models a target prediction value based on independent
variables.
• It is mostly used for finding out the relationship between variables and
for forecasting.
• Regression models differ in the kind of relationship they assume between
the dependent and independent variables, and in the number of independent
variables used.
• Applications: financial forecasting, trend analysis, marketing, time series
prediction, drug response modeling, fraud detection, credit scoring and
clinical trials
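For the simplest case, one independent variable and a straight-line relationship, the model can be fitted with ordinary least squares. This is a minimal sketch; the function name `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
forecast = slope * 6 + intercept  # predict the target for a new input
print(slope, intercept, forecast)
```

Once fitted, the same two numbers are reused to forecast the target for inputs that were not in the training data.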
Decision Trees (SUPERVISED LEARNING)
• A decision tree can be used to visually and explicitly represent decisions and decision
making.
• This approach generally works better with nonnumerical data.
• Decision Trees are excellent tools for helping choose between several courses of action.
• Decision making under uncertainty
• They provide a highly effective structure for laying out options and investigating the
possible outcomes of choosing those options.
• Also help form a balanced picture of the risks and rewards associated with each possible
course of action.
• Growing a tree involves deciding on which features to choose and what conditions to
use for splitting, along with knowing when to stop.
• Used for creating Rules
• Applications: Customer Churn Analysis, Energy Consumption Patterns, Market Basket
Analysis, Fraudulent Practice, Sentiment Analysis, Investment Solutions
Example: Decision Tree
Age (root node)
    >= 35: Buy, 100% (leaf node)
    < 35: Income (decision node)
        <= $5000: Won’t Buy, 100% (leaf node)
        > $5000: Credit Rating (decision node)
            Excellent: Won’t Buy, 65% (leaf node)
            Fair: Buy, 35% (leaf node)
Model process:
• A record in the query starts at the root node
• A test (in the model) determines which node the record should go to next
• All records end up in a leaf node
Interpreting the Results
• Read the tree from top to bottom
• Rule: If Age is less than 35 and Income is greater than $5000 and Credit standing is Fair, then the customer has a 35% chance of buying the product
• Age, then Income and Credit Rating, are the most influential attributes determining buying behavior.
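Read as rules, a tree like this is just nested tests. The sketch below encodes one reading of the example: only the rule for Age < 35, Income > $5000 and Fair credit (Buy, 35%) is stated in the text; how the other leaves pair with their branches is an assumption taken from the figure layout. The function name `predict_purchase` is hypothetical.

```python
def predict_purchase(age, income, credit_rating):
    """Walk the example tree from the root and return (decision, leaf share)."""
    if age >= 35:                      # root node test on Age
        return ("Buy", 1.00)           # assumed leaf pairing
    if income <= 5000:                 # decision node on Income
        return ("Won't Buy", 1.00)     # assumed leaf pairing
    if credit_rating == "Fair":        # decision node on Credit Rating
        return ("Buy", 0.35)           # rule stated in the text
    return ("Won't Buy", 0.65)         # assumed leaf pairing (Excellent)


print(predict_purchase(age=30, income=6000, credit_rating="Fair"))
print(predict_purchase(age=40, income=3000, credit_rating="Excellent"))
```

Every record follows exactly one path from the root to a leaf, which is why a tree can be converted directly into a set of if-then rules.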
Ensemble Modelling (SUPERVISED LEARNING)
• Ensemble modeling is the process of running two or more related but
different analytical models and then synthesizing the results into a
single score or spread in order to improve the accuracy of predictive
analytics and data mining applications.
• A single model can have biases, high variability or inaccuracies
• Combining different models or analyzing multiple samples can reduce
the effects of those limitations and provide better information to
business decision makers.
• Even though this increases the complexity, this approach has been
shown to generate strong results.
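One simple way to synthesize the results of several models is a majority vote, sketched below. The three rule-based "models" and the applicant record are hypothetical stand-ins for independently trained models; voting is only one of several possible combination schemes.

```python
from collections import Counter


def majority_vote(models, record):
    """Ask every model for a prediction and return the most common answer."""
    votes = [model(record) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical models, each looking at a different attribute.
def by_age(applicant):
    return "approve" if applicant["age"] >= 25 else "reject"


def by_income(applicant):
    return "approve" if applicant["income"] >= 40000 else "reject"


def by_debt(applicant):
    return "approve" if applicant["debt"] <= 10000 else "reject"


models = [by_age, by_income, by_debt]
applicant = {"age": 22, "income": 50000, "debt": 5000}

decision = majority_vote(models, applicant)
print(decision)
```

Here one model votes to reject, but it is outvoted by the other two, which illustrates how an ensemble can dampen the bias of any single model.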
Naïve Bayes Classification (SUPERVISED LEARNING)
• Naïve Bayes classifiers are a collection of classification algorithms based
on Bayes’ Theorem.
• It is “naïve” because it assumes that each feature is independent of the
others and makes an equal contribution to the outcome.
• This may seem like a drawback, but the Naïve Bayes classifier has proven to be
quite effective and fast to develop.
• The approach is useful for classifying data based on key
features and patterns.
• It requires only a small amount of training data to estimate the necessary
parameters.
• Applications: text analysis (e.g., email spam detection, sentiment analysis),
customer segmentation, medical diagnosis
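A minimal sketch of a Naïve Bayes text classifier follows. It uses word counts with add-one (Laplace) smoothing, which is one common variant and an assumption here, not something the slides specify; the function names and the four training messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count how often each class and each (class, word) pair occurs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in examples:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def classify(model, words):
    """Pick the class with the highest log P(class) + sum of log P(word | class)."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total)
        n_words = sum(word_counts[label].values())
        for word in words:
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages.
examples = [
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting schedule today".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]

model = train_naive_bayes(examples)
print(classify(model, "free money".split()))
print(classify(model, "project meeting".split()))
```

The independence assumption shows up in the inner loop: each word contributes its own term, regardless of which other words appear.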
K Nearest Neighbour (SUPERVISED LEARNING)
• k-NN is a method for classifying data samples (k represents the number of neighbors).
• The theory is that samples that are close together are likely to be good predictors of
each other's class.
• It calculates the distance between a test sample and the training samples.
• The k-NN algorithm finds the k samples in the training set that are nearest to the
test sample.
• In this method, three components play a key role: data samples, distance metric and
number of neighbors, i.e., the k-value.
• For any classification task, it first computes the distance between the unlabeled
sample and the labeled samples. Based on the computed distances, the unlabeled
sample is assigned the majority class of its k nearest labeled samples.
• Numerical values: Euclidean distance
• Categorical data: overlap metric (compares whether attribute values are the same).
• Applications: credit scoring, image recognition
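The procedure can be sketched for numerical data with Euclidean distance and a majority vote among the k nearest labeled samples. The function names and the six training points are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training, query, k=3):
    """Label `query` with the majority class of its k nearest training samples."""
    neighbors = sorted(training, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled samples: (features, class).
training = [
    ((1, 1), "low-risk"), ((1, 2), "low-risk"), ((2, 1), "low-risk"),
    ((8, 8), "high-risk"), ((8, 9), "high-risk"), ((9, 8), "high-risk"),
]

print(knn_classify(training, (2, 2)))
print(knn_classify(training, (7, 9)))
```

There is no separate training step: all the work happens at prediction time, when the distances to the stored samples are computed.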
Artificial Neural Networks (Supervised Learning)
• Neural net algorithms are loosely based on how our brain processes information.
• In 1943 a neuroscientist (Warren McCulloch) and a logician (Walter Pitts) developed the first
conceptual model of an artificial neural network.
• Neural net algorithms do NOT model how our brain works, but they are inspired by how our brain
works and designed to solve certain kinds of problems.
• The human brain contains approximately 100 billion nerve cells called neurons.
• Each neuron is connected to thousands of other neurons and communicates with them through
electrochemical signals.
• Signals coming into a neuron are received via junctions called synapses which are located at the
end of branches of the neuron called dendrites. The neuron continuously receives signals from
these inputs and sums up its inputs in some way and then, if the end result is greater than some
threshold value, the neuron “fires”. It generates a voltage and outputs a signal along something
called an axon.
• Since the output of the neuron is “fire” or “don’t fire” it is a binary output which can be imitated
on a computer easily
What is a Neural Network?
• We made a simple estimation function that takes in a set of inputs
and multiplies them by weights to get an output. Call this simple
function a neuron.
• By chaining lots of simple neurons together, we can model functions
that are too complicated to be modeled by one single neuron.
Artificial Neural Networks (Supervised Learning)
• A neural network is a connectionist computational system.
• A true neural network does not follow a linear path but rather
information is processed collectively in parallel throughout a network
of nodes (neurons).
• Neural network algorithms are made up of many artificial neurons;
the number needed depends on how difficult the task is
• Each neuron can have multiple inputs but only a single output which
is binary.
Working of a Neural Network
• Initially the weights are guessed and then the algorithm adjusts them
during the training
• As each input enters the neuron its value is multiplied by its weight.
• These values are summed for all inputs.
• If the summed value is >= threshold (such as 0 or 1) then it “fires”;
i.e., it gives a positive output.
• If the summed value is < threshold then it does NOT fire;
i.e., it gives a negative output.
• If the output of the neuron matches the correct output in the
training set, then weights are not modified
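These steps match the classic perceptron learning rule, which the sketch below uses for a single neuron. Several details are assumptions made for illustration: the threshold is folded into a bias term (bias = -threshold), the weights start at zero rather than at a guess, and the training set is the logical AND function.

```python
def neuron_output(weights, bias, inputs):
    """Weighted sum of the inputs; fire (1) if it reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1):
    """Adjust weights only when the neuron's output is wrong."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - neuron_output(weights, bias, inputs)
            if error != 0:  # a correct output leaves the weights untouched
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
    return weights, bias


# Hypothetical training set: logical AND of two binary inputs.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(samples)
outputs = [neuron_output(weights, bias, inputs) for inputs, _ in samples]
print(weights, bias, outputs)
```

A single neuron of this kind can only separate classes with a straight line, which is why more complicated functions need many neurons chained together.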
Backpropagation
• It is difficult to make adjustments to the weights in the model by hand.
• Backpropagation is the alternative: it adjusts the neural
network when errors are found and then runs the new values
through the neural network again
• Essentially, the process involves slight changes that continue to
optimize the model.
In other words, it’s easy to guess the next letter if we take into account
the sequence of letters that came right before it and combine that with
our knowledge of the rules of English.
To solve this problem with a neural network, we need to add state to
our model. Each time we ask our neural network for an answer, we also
save a set of our intermediate calculations and re-use them the next
time as part of our input. That way, our model will adjust its predictions
based on the input that it has seen recently.
What’s a single letter good for?
• Keeping track of state in our model makes it possible to not just
predict the most likely first letter in the story, but to predict the most
likely next letter given all previous letters. This is the basic idea of a
Recurrent Neural Network.
• One cool use might be auto-predict for a mobile phone keyboard.
• But what if we took this idea to the extreme? What if we asked the
model to predict the next most likely character over and over—
forever? We’d be asking it to write a complete story for us!
• We know that the idea of machine learning is that the
same generic algorithms can be reused with different
data to solve different problems. So let’s modify this
same neural network to recognize handwritten text.
But to make the job really simple, we’ll only try to
recognize one character: the numeral “8”.
• Machine learning only works when you have data,
preferably a lot of data. So we need lots and lots of
handwritten “8”s to get started. Luckily, researchers
created the MNIST data set of handwritten numbers
for this very purpose. MNIST provides 60,000 training images
of handwritten digits. The original images are 28x28 pixels;
the walkthrough here assumes an 18x18 version of each image.
Some 8s from the MNIST data set
If you think about it, everything is just numbers
The neural network we made earlier only took in three numbers as
the input (“3” bedrooms, “2000” sq. feet, etc.). But now we want to
process images with our neural network. How in the world do we feed
an image into a neural network instead of just numbers?
• To a computer, an image is really just a grid of
numbers that represent how dark each pixel is:
To feed an image into our neural network, we simply treat the
18x18 pixel image as an array of 324 numbers:
To handle 324 inputs, we’ll just enlarge our neural
network to have 324 input nodes:
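Turning the pixel grid into network inputs is a plain row-by-row flattening, as in this sketch. The blank image with a single dark pixel is a hypothetical placeholder, not real MNIST data, and the 0 to 255 darkness scale is an assumption.

```python
def flatten(image):
    """Concatenate the rows of a 2-D pixel grid into one flat list."""
    return [pixel for row in image for pixel in row]


SIZE = 18  # the walkthrough treats each image as an 18x18 grid

# Hypothetical image: all-white grid with one dark pixel at row 5, column 7.
image = [[0] * SIZE for _ in range(SIZE)]
image[5][7] = 255

inputs = flatten(image)
print(len(inputs))             # one number per input node
print(inputs[5 * SIZE + 7])    # the dark pixel's position in the flat list
```

The pixel at row r and column c always lands at index r * 18 + c, so the network sees the same pixel on the same input node for every image.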
Training Data
Mmm… sweet, sweet training data
Clustering (Unsupervised Learning)
• Clustering is a technique for finding similarity groups in data, called clusters.
That is, it groups data instances that are similar to (near) each other in one cluster and data
instances that are very different (far away) from each other into different clusters.
• Clustering is often called an unsupervised learning task as no class values
denoting an a priori grouping of the data instances are given, which is the
case in supervised learning.
• Due to historical reasons, clustering is often considered synonymous with
unsupervised learning.
• In fact, association rule mining is also unsupervised
An illustration
• The data set has three natural groups of data points, i.e., 3
natural clusters.
What is clustering for?
Let us see some real-life examples
• Example 1: groups people of similar sizes together to make “small”,
“medium” and “large” T-Shirts.
• Tailor-made for each person: too expensive
• One-size-fits-all: does not fit all.
• Example 2: In marketing, segment customers according to their
similarities
• To do targeted marketing.
What is clustering for?
• Example 3: Given a collection of text documents, we want to organize
them according to their content similarities,
• To produce a topic hierarchy
• In fact, clustering is one of the most utilized data mining techniques.
• It has a long history and is used in almost every field, e.g., medicine, psychology,
botany, sociology, biology, archeology, marketing, insurance, libraries, etc.
• In recent years, due to the rapid increase of online documents, text clustering
has become important.
K-Means Clustering (Unsupervised Learning)
• The k-means clustering algorithm, which is effective for large datasets, puts
similar, unlabeled data into groups.
• The first step is to select k, the number of clusters, generally by
visualizing the data to see if there are noticeable groupings.
• Works with numeric data only!
Algorithm:
1. Pick k random cluster centers
2. Assign every item to its nearest cluster center using a distance metric
3. Move each cluster center to the mean of its assigned items
4. Repeat steps 2-3 until convergence (change in cluster assignment less than a threshold)
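The algorithm above can be sketched as follows. Two choices are assumptions for illustration: the initial centers are k randomly sampled data points, and "convergence" means the cluster assignments stop changing at all. The function names and the six sample points are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance (enough for finding the nearest center)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, seed=0, max_iterations=100):
    # Pick k random cluster centers (here: k of the data points).
    centers = random.Random(seed).sample(points, k)
    assignment = None
    for _ in range(max_iterations):
        # Assign every item to its nearest cluster center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:  # nothing moved: converged
            break
        assignment = new_assignment
        # Move each center to the mean of its assigned items.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = mean_point(members)
    return centers, assignment


# Hypothetical data with two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]

centers, assignment = k_means(points, k=2)
print(centers, assignment)
```

Because the starting centers are random, different seeds can label the clusters differently, and on less clearly separated data they can also produce different groupings.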
Association
• Association rules help establish associations amongst data objects
inside large databases.
• This unsupervised technique is about discovering interesting
relationships between variables in large databases. For example,
people who buy a new home are most likely to buy new furniture.
• Other Examples:
• A subgroup of cancer patients grouped by their gene expression
measurements
• Groups of shoppers based on their browsing and purchasing histories
• Movies grouped by the ratings given by movie viewers
