
Machine Learning

Introduction to Machine
Learning
What is machine learning?
• Machine learning is the idea that there are generic algorithms that can tell you something
interesting about a set of data without you having to write any custom code specific to the
problem.

• Instead of writing code, you feed data to the generic algorithm and it builds its own logic
based on the data.
• Machine Learning (ML) encompasses a lot of things. The field is vast and is expanding
rapidly. It is a branch of Artificial Intelligence (AI). Loosely speaking, ML is the field of study
that gives computer algorithms the ability to learn without being explicitly programmed.
The outcome we want from our computer algorithm is PREDICTION. This is different from
our previous problems where we wanted the algorithm to solve a specific problem such as
finding the best web page for our search, sorting a list of items, or generating a secure
means of computing a shared secret in cryptography. What are we going to use to predict?
An example application
• An emergency room in a hospital measures 17 variables (e.g., blood
pressure, age, etc.) of newly admitted patients.
• A decision is needed: whether to put a new patient in an intensive-care
unit.
• Due to the high cost of ICU, those patients who may survive less than a
month are given higher priority.
• Problem: to predict high-risk patients and discriminate them from low-
risk patients.

Another application
• A credit card company receives thousands of applications for new cards.
Each application contains information about an applicant,
• age
• Marital status
• annual salary
• outstanding debts
• credit rating
• etc.
• Problem: to decide whether an application should be approved, or to
classify applications into two categories, approved and not approved.

CS583, Bing Liu, UIC


Machine learning
• Like human learning from past experiences.
• A computer does not have “experiences”.
• A computer system learns from data, which represent some “past
experiences” of an application domain.
• Our focus: learn a target function that can be used to predict the values of a
discrete class attribute, e.g., approved or not-approved, and high-risk or
low-risk.
• The task is commonly called: Supervised learning, classification, or inductive
learning.



The data and the goal
• Data: A set of data records (also called examples, instances or cases)
described by
• k attributes: A1, A2, … Ak.
• a class: Each example is labelled with a pre-defined class.
• Goal: To learn a classification model from the data that can be used to
predict the classes of new (future, or test) cases/instances.

Machine Learning Types
• Supervised Learning
• Uses labeled data
• Results compared with the correct answer.
• Requires large amounts of data to refine the model and produce more
accurate results.
• Common Techniques: Classification, Regression
• Use Cases: Fraud Detection, Image Recognition
• Unsupervised Learning
• Works with unlabeled data.
• A learning algorithm is used to detect patterns.
• The most common unsupervised learning technique is clustering, which
takes unlabeled data and uses algorithms to put similar items into
groups.
• Use Cases: Customer segmentation, sentiment analysis
• Reinforcement Learning
• Learns through a trial-and-error process.
• Learning improves based on positive and negative reinforcement.
• Use Cases: Games, Robotics
Machine Learning
Algorithm: Use Case Example

Supervised Learning (used when we know the classification of data and what to predict)
• Linear Regression: Estimating product price elasticity
• Logistic Regression: Classify customers on likeliness to repay a loan
• Linear / Quadratic Discriminant Analysis: Classify customer on likeliness to repay a loan
• Decision Tree: Find attributes in a product that make it likely for purchase
• Naïve Bayes: Analyze sentiments to assess product perception
• Support Vector Machine: Analyze sentiments to assess product perception
• Random Forest: Predict power usage in a distribution grid
• AdaBoost: Detect fraudulent activity in a credit card

Unsupervised Learning (used when we don’t know the classification of data and want the algorithm to classify data)
• K Means Clustering: Segment customers into groups by characteristics
• Gaussian Mixture Model: Segment customers based on less distinctive characteristics
• Hierarchical clustering: Inform product usage by grouping customers
• Recommender System: Recommend news articles to readers based on what they are currently reading

Reinforcement Learning (used when we don’t have training data and the only way to learn about the environment is to learn with it)
• Balance the load on electricity grids in varying demand cycles
• Optimize the driving behavior of self-driving cars
• Finding real time pricing during a product auction

Outcome: Descriptive (What Happened?), Predictive (What Will Happen?), Prescriptive (What To Do?)
Machine Learning today is extensively used and has well defined
Algorithms, Tools and Technology while other AI technologies are confined
to vendor provided solutions…

Natural Language Processing (Vendor: Product)
• Google: Cloud Natural Language
• Apple: Natural Language Framework
• HP: HPE IDOL
• IBM: Watson
• Microsoft Corp.: Linguistic Analysis API & Text Analytics API
• 3M: 360 Encompass System

Computer Vision (Vendor: Product)
• Google: Vision
• Cognex: Sensors, 3D Laser Profilers, VisionPro
• Omron: Vision System, Smart Camera, Lighting System
• Keyence: Vision Sensors
• Basler: Cameras, Vision Kit
• National Instruments: Platform Modules, Computer Based Devices
• Sony: CameraLink

Robotic Process Automation (Vendor: Product)
• UI Path: Studio, Front Office Robot, Orchestrator
• Blue Prism: Enterprise Platform
• Thoughtonomy: Virtual Workforce
• Automation Anywhere: IQ Bot, Bot Inside
• NICE: NICE Robotic Automation
• Kofax: Kofax Kapow
Machine Learning
• Technology: Python, Hadoop, Java, R, MATLAB, ELM, Scala
• Algorithm: Regression, Decision Tree, Naïve Bayes, Support Vector Machine, Random Forest, AdaBoost, Gradient-boosting trees
• Tools: Scikit learn, Shogun, Apache Mahout, H2O, Cloudera Oryx, GoLearn, Weka
Landscape of ML Solutions
[Figure: landscape of ML solutions, ranging from DIY to Buy and from Business Application Users to Engineers.
Audiences: Business Application Users, Data Analysts, Data Scientists, ML Engineers.
Categories: Embedded Machine-Learning, Machine Learning APIs, Augmented Analytics, Data Analysis Software, Data Science and Machine-Learning Platforms, Deep-Learning Frameworks, Deep-Learning Cloud Platforms, Deep-Learning Hardware.
Names shown: Salesforce Einstein, SAP Clea, Skymind's DL4J, Caffe, Google's TensorFlow, Theano, Microsoft Cognitive Toolkit, H2O.ai's Deep Water, Baidu's Pebble, Intel BigDL, Amazon Web Services' (AWS) Apache MXNet, R, Python, Scala, Matlab, Intel Nervana, Microsoft Azure, Rescale, AWS, Google Cloud Platform, Nvidia, AMD, IBM, Intel.]
Source: "Magic Quadrant for Data Science and Machine-Learning Platforms," 22 February 2018. (G00326456)
© 2018 Gartner, Inc. and/or its affiliates. All rights reserved.
Application Example:
Natural Language Processing

• Describe two end-to-end examples for applications involving natural
language text
o Support ticket classification
o Recruiting – CV matching
Support ticket classification
Example: Classify support tickets into categories so that they can be routed to
corresponding agents
1. Do you need machine learning?
• High volume of support tickets
• Human language is complex and ambiguous
2. Can you formulate your problem clearly?
• Given a customer support ticket, predict its service category
• Input: customer support ticket; output: service category
3. Do you have sufficient examples?
• Large volume of customer support tickets with respective service category from
ticket support systems
Support ticket classification
(Cont’d)
4. Does your problem have a regular pattern?
• Common customer issues will have many tickets
• Issues will correlate with common keywords, e.g., bill or payment will appear
more often in support tickets with category payments
5. Can you find meaningful representations of your data?
• Represent customer support tickets as vector of word frequencies
• Label is the service category of the customer support ticket
6. How do you define success?
• Measure percentage of correctly predicted service categories
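The representation and success measure above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular ticketing system: the vocabulary, the ticket text, and the category names are made up for the example.

```python
from collections import Counter

def word_frequencies(text, vocabulary):
    """Return how often each vocabulary word occurs in the text, in vocabulary order."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def accuracy(predicted, actual):
    """Fraction of predicted categories that match the true categories."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical vocabulary and ticket, for illustration only.
vocabulary = ["bill", "payment", "password", "login"]
ticket = "My payment failed and the bill shows the payment twice"
vector = word_frequencies(ticket, vocabulary)
print(vector)  # [1, 2, 0, 0]

# Hypothetical predictions compared with the true service categories.
predicted = ["payments", "account", "payments", "account"]
actual = ["payments", "account", "account", "account"]
print(accuracy(predicted, actual))  # 0.75
```

A real system would also strip punctuation and use a much larger vocabulary; the point here is only that each ticket becomes a fixed-length vector of numbers.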
Recruiting – CV matching
Example: Shortlist candidates during recruiting
1. Do you need machine learning?
• Hundreds of applications per job opening
• Manual effort to read CVs and screen candidates
2. Can you formulate your problem clearly?
• Given a candidate’s CV and a job description, predict suitability
• Input: CV and job description; output: yes/no
3. Do you have sufficient examples?
• Large volume of previous job applications, job descriptions, and whether
candidate was invited for interview
Recruiting – CV matching (Cont’d)
4. Does your problem have a regular pattern?
• Required skills in job description should match experience in CV
• Good CVs have no typos, are neither too long nor too short, etc.
5. Can you find meaningful representations of your data?
• Represent CVs and job descriptions as vector of features that measure
similarity and match
• Label is whether the candidate was invited for interview
6. How do you define success?
• Measure precision and recall of correct predictions
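As a rough sketch of the success measure above: precision is the share of predicted "yes" candidates who really were invited, and recall is the share of invited candidates the model found. The label lists below are hypothetical.

```python
def precision_recall(predicted, actual, positive="yes"):
    """Compute precision and recall for the positive class."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(p == positive and a == positive for p, a in pairs)
    false_pos = sum(p == positive and a != positive for p, a in pairs)
    false_neg = sum(p != positive and a == positive for p, a in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

# Hypothetical shortlist predictions vs. whether the candidate was invited.
predicted = ["yes", "yes", "no", "yes", "no", "no"]
actual = ["yes", "no", "no", "yes", "yes", "no"]
precision, recall = precision_recall(predicted, actual)
print(precision, recall)  # both are 2/3 for this data
```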
Application Example:
Computer Vision
• Describe end-to-end examples for applications involving computer
vision
• Retail shelf analytics
• Fashion apparel color analysis
Understanding Machine Learning
What do we mean by learning?
• Given
• a data set D,
• a task T, and
• a performance measure M,
a computer system is said to learn from D to perform the task T if after
learning the system’s performance on T improves as measured by M.
• In other words, the learned model helps the system to perform T better
as compared to no learning.

An example
• Data: Loan application data
• Task: Predict whether a loan should be approved or not.
• Performance measure: accuracy.

• No learning: classify all future applications (test data) to the majority
class (i.e., Yes): Accuracy = 9/15 = 60%.
• We can do better than 60% with learning.
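The "no learning" baseline can be reproduced with a short sketch. It assumes, as the 9/15 figure implies, that 9 of the 15 records are labelled Yes and 6 are labelled No; the individual records themselves are not reproduced here.

```python
from collections import Counter

# Assumed class distribution implied by "9/15": 9 Yes, 6 No.
labels = ["Yes"] * 9 + ["No"] * 6

# The baseline "model" always predicts the most frequent class.
majority_class, majority_count = Counter(labels).most_common(1)[0]
baseline_accuracy = majority_count / len(labels)

print(majority_class)     # Yes
print(baseline_accuracy)  # 0.6
```

Any learned model is worth keeping only if it beats this baseline on the same data.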
Machine learning capabilities
• Machine learning is used in many applications
o Computer vision: face recognition, object recognition
o Natural language processing: machine translation, sentiment analysis
o Recommender systems
• Recent breakthroughs using deep learning
o Automatically generate image captions
o AlphaGo: AI beats the world’s top Go player
Typical machine learning tasks
[Figure: Regression. Labels: Training Data, Feature Extraction, Feature Vectors, Training, Models, Evaluation with Validation Data]
From Business Problem to Machine Learning
Problem: A Recipe
Step-by-step “recipe” for qualifying a business problem as a machine
learning problem
1. Do you need machine learning?
2. Can you formulate your problem clearly?
3. Do you have sufficient examples?
4. Does your problem have a regular pattern?
5. Can you find meaningful representations of your data?
6. How do you define success?
When to use machine learning
Problem formulation
2. Can you formulate your problem clearly?
• What do you want to predict given which input?
• Pattern: “given X, predict Y”
o What is the input?
o What is the output?
Example: sentiment analysis
• Given a customer review, predict its sentiment
• Input: customer review text
• Output: positive, negative, neutral
Collecting data
3. Do you have sufficient examples?
• Machine learning always requires data!
• Generally, the more data, the better
• Each example must contain two parts (supervised learning)
o Features: attributes of the example
o Label: the answer you want to predict
Example: sentiment analysis
• Thousands of customer reviews and ratings from the Web
Regularities in the data
4. Does your problem have a regular pattern?
• Machine learning learns regularities and patterns
• Hard to learn patterns that are rare or irregular

Example: sentiment analysis


• Positive words like good, awesome, or love it appear more often in
highly-rated reviews
• Negative words like bad, lousy, or disappointed appear more often in
poorly-rated reviews
Representations and features
5. Can you find meaningful representations of your data?
• Machine learning algorithms ultimately operate on numbers
• Generally, examples are represented as feature vectors
• Good features often determine the success of machine learning

Example: sentiment analysis


• Represent customer review as vector of word frequencies
• Label is positive (4-5 stars), negative (1-2 stars), neutral (3 stars)
Evaluating success
6. How do you define success?
• Machine learning optimizes a training criterion
• The evaluation function has to support the business goals

Example: sentiment analysis


• Accuracy: percentage of correctly predicted labels
The “cheat sheet”
Creating machine learning models
[Figure: Data with inputs & labels is split into a Training Set and a Testing Set; building the Model involves Data Cleaning, Feature Processing, Model Selection and Parameter Optimization]
1. Split data into training & testing subsets
2. Train a model on training set
3. Make predictions on the testing set
4. Compare predicted and true labels
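The four steps can be sketched end to end as follows. Everything here is an assumption made for illustration: the records are invented, the split is a fixed "every fourth record" hold-out rather than a random split, and the model is a very simple nearest-class-mean classifier.

```python
def split(records, test_every=4):
    """Step 1: hold out every test_every-th record for testing."""
    train = [r for i, r in enumerate(records) if (i + 1) % test_every != 0]
    test = [r for i, r in enumerate(records) if (i + 1) % test_every == 0]
    return train, test

def train_model(train):
    """Step 2: learn the mean feature value of each class."""
    sums, counts = {}, {}
    for x, label in train:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Step 3: predict the class whose mean is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))

# Hypothetical records: (annual income in $1000s, loan approved?)
records = [(20, "No"), (85, "Yes"), (30, "No"), (90, "Yes"),
           (25, "No"), (70, "Yes"), (95, "Yes"), (35, "No")]

train, test = split(records)
model = train_model(train)
predictions = [predict(model, x) for x, _ in test]

# Step 4: compare predicted and true labels on the held-out records.
correct = sum(p == label for p, (_, label) in zip(predictions, test))
test_accuracy = correct / len(test)
print(predictions, test_accuracy)
```

The important design point is that the test records are never seen during training, so the accuracy reflects performance on new data.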


The Challenge of Machine Learning: Under and Overfitting
• Underfitting: Predictor is too "simplistic"; cannot capture the pattern
• Overfitting: Predictor is too "powerful"; rote learning; easy to be good on the training data
[Figure: three panels plotting Avg. spend/visit against Store distance, from underfitting to overfitting, and a plot of Error against Model complexity (Low to High) with curves for the Training set and the Test set]
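A small sketch can make the contrast concrete. The numbers below are invented (they are not the spend/distance data from the figure), and the three predictors are stand-ins chosen for brevity: a constant (too simplistic), a least-squares line, and a rote learner that memorizes the training points.

```python
def mse(predict, points):
    """Mean squared error of a predictor on (x, y) points."""
    return sum((predict(x) - y) ** 2 for x, y in points) / len(points)

def fit_mean(points):
    """Too simplistic: always predict the average y."""
    mean_y = sum(y for _, y in points) / len(points)
    return lambda x: mean_y

def fit_line(points):
    """Least-squares straight line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_rote(points):
    """Too powerful: memorize the data and answer with the nearest memorized point."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

# Hypothetical noisy data following roughly y = 2x.
train = [(0, 1), (2, 3), (4, 9), (6, 11), (8, 17)]
test = [(1, 1), (3, 7), (5, 9), (7, 15), (9, 17)]

for name, fit in [("mean", fit_mean), ("line", fit_line), ("rote", fit_rote)]:
    model = fit(train)
    print(name, "train error:", mse(model, train), "test error:", mse(model, test))
```

On this data the rote learner has zero training error but a worse test error than the line, while the constant predictor is poor on both sets.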


An example: data (loan application)
Approved or not
An example: the learning task
• Learn a classification model from the data
• Use the model to classify future loan applications into
• Yes (approved) and
• No (not approved)
• What is the class for the following case/instance?

Machine Learning : Supervised
and Unsupervised learning
Supervised learning vs. unsupervised learning
• Supervised learning: discover patterns in the data that relate data
attributes with a target (class) attribute.
• These patterns are then utilized to predict the values of the target attribute
in future data instances.

• Unsupervised learning: The data have no target attribute.


• We want to explore the data to find some intrinsic structures in them.

Differences between Supervised vs.
unsupervised Learning
• Supervised learning: classification is seen as supervised learning from
examples.
• Supervision: The data (observations, measurements, etc.) are labeled with
pre-defined classes. It is as if a “teacher” gives the classes (supervision).
• Test data are classified into these classes too.
• Unsupervised learning (clustering)
• Class labels of the data are unknown
• Given a set of data, the task is to establish the existence of classes or clusters
in the data
Difference between Classification and Clustering
Classification
• Classification is a supervised learning technique in which predefined labels are assigned to instances according to their properties.
• Classification is the process of learning a model that distinguishes different predetermined classes of data. It is a two-step process, comprising a learning step and a classification step. In the learning step, a classification model is constructed; in the classification step, the constructed model is used to predict the class labels for given data.
• Classification techniques: decision trees, KNN, regression, Naïve Bayes, neural networks, logistic regression, etc.
• Example: In a banking application, a customer who applies for a loan may be classified as safe or risky according to his/her age and salary. The produced model could be in the form of a decision tree or a set of rules.
Clustering
• Clustering is an unsupervised learning technique in which similar instances are grouped based on their features or properties.
• Clustering organises a group of data into clusters such that objects inside a cluster have high similarity and objects in two different clusters are dissimilar to each other.
• The main target of clustering is to divide the whole data into multiple clusters. Unlike classification, the class labels of objects are not known beforehand.
• In clustering, the similarity between two objects is measured by a similarity function based on the distance between those two objects. The shorter the distance, the higher the similarity; conversely, the longer the distance, the higher the dissimilarity.
• Clustering techniques: K-Means
• Example: Customer segmentation
Supervised learning process: two steps
• Learning (training): Learn a model using the training data
• Testing: Test the model using unseen test data to assess the
model accuracy

Accuracy = Number of correct classifications / Total number of test cases
Machine Learning in Enterprise Computing
• Train machine-learning model on historical data
• Deploy the model to make predictions on new data
• Regularly retrain the model with new data
• Focus on making predictions about future data
[Figure: Historical Data → Training Process → Model; New Data → Model → Result; Update by Retraining]
Traditional rule-based approach vs. machine
learning
Fundamental assumption of learning
Assumption: The distribution of training examples is identical to the
distribution of test examples (including future unseen examples).

• In practice, this assumption is often violated to certain degree.


• Strong violations will clearly result in poor classification accuracy.
• To achieve good accuracy on the test data, training examples must
be sufficiently representative of the test data.
SUPERVISED LEARNING TECHNIQUES
• Regression:
• Linear Regression
• Ensemble Modelling
• Decision Trees
• Classification
• Naïve Bayes Classifier
• K Nearest Neighbour
• Neural Networks
Regression (SUPERVISED LEARNING)

• Regression shows the relationship between certain variables.


• Regression models a target prediction value based on independent variables.
• It is mostly used for finding out the relationship between variables and
forecasting.
• Regression models differ in the kind of relationship they assume between the
dependent and independent variables and in the number of independent
variables used.
• Applications: Financial forecasting, trend analysis, marketing, time series
prediction, drug response modeling, fraud detection, credit card
scoring and clinical trials
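As a minimal sketch of the simplest case, the code below fits a straight line to one independent variable by ordinary least squares and uses it to forecast a new value. The data points are invented and chosen so the fit is exact.

```python
def fit_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: independent variable x, dependent (target) variable y.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_linear_regression(xs, ys)
forecast = slope * 6 + intercept
print(slope, intercept, forecast)  # 2.0 1.0 13.0
```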
Decision Trees (SUPERVISED LEARNING)
• A decision tree can be used to visually and explicitly represent decisions and decision making.
• This approach generally works better with nonnumerical data.
• Decision Trees are excellent tools for helping choose between several courses of action.
• Decision making under uncertainty
• They provide a highly effective structure for laying out options and investigating the possible
outcomes of choosing those options.
• Also help form a balanced picture of the risks and rewards associated with each possible course
of action.
• Growing a tree involves deciding on which features to choose and what conditions to use for
splitting, along with knowing when to stop. 
• Used for creating Rules
• Applications: Customer Churn Analysis, Energy Consumption Patterns, Market Basket Analysis,
Fraudulent Practice, Sentiment Analysis, Investment Solutions
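To illustrate what "deciding which features to choose and what conditions to use for splitting" can look like, the sketch below scores candidate splits by Gini impurity. Gini impurity is one common criterion and is an assumption here, since the slides do not name one; the customer rows are also invented.

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for a more mixed group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(rows, feature):
    """Try each observed value as a 'feature < threshold' test and
    return (threshold, weighted impurity) for the best one."""
    best = (None, float("inf"))
    for threshold in sorted({row[feature] for row in rows}):
        left = [row["buys"] for row in rows if row[feature] < threshold]
        right = [row["buys"] for row in rows if row[feature] >= threshold]
        if not left or not right:
            continue  # a split must send records both ways
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical customer records.
rows = [
    {"age": 22, "income": 3000, "buys": "no"},
    {"age": 28, "income": 6000, "buys": "no"},
    {"age": 31, "income": 7000, "buys": "no"},
    {"age": 36, "income": 4000, "buys": "yes"},
    {"age": 45, "income": 8000, "buys": "yes"},
    {"age": 52, "income": 5000, "buys": "yes"},
]

age_split = best_threshold(rows, "age")
income_split = best_threshold(rows, "income")
print("age:", age_split)
print("income:", income_split)
```

Here the age split separates buyers from non-buyers perfectly, so a tree grower would choose it for the root node and stop, since both resulting groups are pure.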
Example: Decision Tree
Model process:
• A record in the query starts at the root node
• A test (in the model) determines which node the record should go to next
• All records end up in a leaf node
Interpreting the Results
• Read the tree from top to bottom
• Rule: If Age is less than 35 and Income is greater than $5000 and Credit standing is Fair, then the customer has a 35% chance of buying the product
• Age, then Income and credit rating, are the most influential attributes determining buying behavior.
[Figure: decision tree. Root node: Age (test: >= 35 / < 35). Decision nodes: Income (<= $5000 / > $5000) and Credit Rating (Excellent / Fair). Leaf nodes: Buy 100%; Won’t Buy 100%; Won’t Buy 65%; Buy 35%]
Ensemble Modelling ( SUPERVISED
LEARNING)
• Ensemble modeling is the process of running two or more related but
different analytical models and then synthesizing the results into a
single score or spread in order to improve the accuracy of predictive
analytics and data mining applications.
• A single model can have biases, high variability or inaccuracies
• Combining different models or analyzing multiple samples can reduce
the effects of those limitations and provide better information to
business decision makers.
• Even though this increases the complexity, this approach has been
shown to generate strong results.
Naïve Bayes Classification (SUPERVISED
LEARNING)
• Naive Bayes classifiers are a collection of classification algorithms based on Bayes’
Theorem.
• It is “naïve” because it assumes that each feature is independent and makes
an equal contribution to the outcome.
• This may seem like a drawback but Naïve Bayes Classifier has proven to be quite
effective and fast to develop.
• The reason is that this approach is useful in classifying data based on key features
and patterns.
• They require a small amount of training data to estimate the necessary parameters.
• Applications: Text analysis. Examples: email spam detection, customer
segmentation, sentiment analysis, medical diagnosis
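A minimal sketch of a Naïve Bayes text classifier follows. It treats every word as an independent feature and combines word probabilities with the class prior using Bayes' theorem. The add-one (Laplace) smoothing and the four training messages are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Count classes and per-class word occurrences."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def classify(model, text):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total)  # class prior
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # ignore words never seen in training
                # Add-one smoothing keeps unseen word/class pairs above zero.
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training messages.
examples = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting schedule today", "ham"),
    ("project meeting notes", "ham"),
]
model = train_naive_bayes(examples)
print(classify(model, "free money"))     # spam
print(classify(model, "meeting today"))  # ham
```

Even with four training messages the parameters (word counts per class) can be estimated, which reflects the point above about small training sets.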
K Nearest Neighbour (SUPERVISED
LEARNING)
• k-NN is a method for classifying a dataset (k represents the number of neighbors).
• The theory is that those values that are close together are likely to be good predictors for a model
• Calculates the distance between the nearest values
• The k-NN algorithm finds the k samples in the training set that are nearest to the test
sample.
• In this method, three components play a key role: data samples, distance metric and number of
neighbors, i.e., the k-value.
• For any classification task, it first computes the distance between each unlabeled data sample
and the labeled samples. Based on the computed distances, the unlabeled sample is assigned
the class of its nearest labeled samples.
• Numerical values: Euclidean distance
• Categorical data: Overlap metric (this is where the data is the same or very similar).
• Applications: credit score, image recognition
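The three components (data samples, distance metric, k) map directly onto a short sketch. It uses Euclidean distance on numerical features and a majority vote among the k nearest labeled samples; the sample data is invented.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """Label the query with the majority class of its k nearest training samples."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled samples: (age, income in $1000s) -> credit score band.
training = [
    ((25, 20), "low"), ((30, 25), "low"), ((35, 30), "low"),
    ((45, 80), "high"), ((50, 90), "high"), ((55, 85), "high"),
]

print(knn_classify(training, (48, 82)))  # high
print(knn_classify(training, (28, 22)))  # low
```

In practice features are usually rescaled first, since a feature with a large numeric range would otherwise dominate the distance.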
Artificial Neural Networks (Supervised
Learning)
• Neural net algorithms are based on how our brain processes information.
• In 1943 a neuroscientist and a logician developed the first conceptual model of an artificial neural
network.
• Neural net algorithms do NOT model how our brain works but they are inspired by how our brain works
and designed to solve certain kinds of problems.
• The human brain contains approximately 100 billion nerve cells called neurons.
• Each neuron is connected to thousands of other neurons and communicates with them through
electrochemical signals.
• Signals coming into a neuron are received via junctions called synapses which are located at the end of
branches of the neuron called dendrites. The neuron continuously receives signals from these inputs
and sums up its inputs in some way and then, if the end result is greater than some threshold value, the
neuron “fires”. It generates a voltage and outputs a signal along something called an axon.
• Since the output of the neuron is “fire” or “don’t fire” it is a binary output which can be imitated on a
computer easily
What is a Neural Network?
• We made a simple estimation function that takes in a set of inputs
and multiplies them by weights to get an output. Call this simple
function a neuron.
• By chaining lots of simple neurons together, we can model functions
that are too complicated to be modeled by one single neuron.
Artificial Neural Networks (Supervised
Learning)
• A neural network is a connectionist computational system.
• A true neural network does not follow a linear path but rather
information is processed collectively in parallel throughout a network
of nodes (neurons).
• Neural network algorithms are made up of many artificial neurons;
the number needed depends on how difficult the task is
• Each neuron can have multiple inputs but only a single output which
is binary.
Working of a Neural Network
• Initially the weights are guessed and then the algorithm adjusts them
during the training
• As each input enters the neuron its value is multiplied by its weight.
• These values are summed for all inputs.
• If the summed value is >= threshold (such as 0 or 1) then it “fires”; i.e.,
it gives a positive output.
• If the summed value is < threshold then it does NOT “fire”; i.e., it gives a
negative output.
• If the output of the neuron matches the correct output in the training
set, then weights are not modified; otherwise they are adjusted.
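The steps above correspond closely to the classic perceptron learning rule, sketched below for a single neuron that learns the logical AND function. Several details are assumptions for the example: weights start at zero rather than at a random guess, the threshold is folded into a bias term (firing when the sum plus bias is >= 0), and the learning rate is 1.

```python
def fires(weights, bias, inputs):
    """Multiply each input by its weight, sum, and compare with the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0

def train(examples, epochs=20, rate=1.0):
    """Adjust weights only when the neuron's output is wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - fires(weights, bias, inputs)
            if error != 0:
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                bias += rate * error
    return weights, bias

# Training set for logical AND: the neuron should fire only for (1, 1).
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(and_gate)
outputs = [fires(weights, bias, inputs) for inputs, _ in and_gate]
print(weights, bias, outputs)
```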
Backpropagation
• It is difficult to make adjustments to the weights in the model.
• Backpropagation addresses this: it adjusts the neural network when
errors are found and then iterates the new values through the neural
network again.
• Essentially, the process involves slight changes that continue to
optimize the model.
In other words, it’s easy to guess the next letter if we take into account
the sequence of letters that came right before it and combine that with
our knowledge of the rules of English.

To solve this problem with a neural network, we need to add state to
our model. Each time we ask our neural network for an answer, we also
save a set of our intermediate calculations and re-use them the next
time as part of our input. That way, our model will adjust its predictions
based on the input that it has seen recently.
What’s a single letter good for?
• Keeping track of state in our model makes it possible to not just
predict the most likely first letter in the story, but to predict the most
likely next letter given all previous letters. This is the basic idea of a
Recurrent Neural Network.
• One cool use might be auto-predict for a mobile phone keyboard.
• But what if we took this idea to the extreme? What if we asked the
model to predict the next most likely character over and over—
forever? We’d be asking it to write a complete story for us!
• We know that the idea of machine learning is that the
same generic algorithms can be reused with different
data to solve different problems. So let’s modify this
same neural network to recognize handwritten text.
But to make the job really simple, we’ll only try to
recognize one character: the numeral “8”.
• Machine learning only works when you have data,
preferably a lot of data. So we need lots and lots of
handwritten “8”s to get started. Luckily, researchers
created the MNIST data set of handwritten numbers
for this very purpose. MNIST provides 60,000 training images
of handwritten digits. This walkthrough treats each as an
18x18 image (the published MNIST images are 28x28).
Here are some “8”s from the data set:

Some 8s from the MNIST data set

If you think about it, everything is just numbers
The neural network we made in Part 2 only took in three numbers as
the input (“3” bedrooms, “2000” sq. feet, etc.). But now we want to
process images with our neural network. How in the world do we feed
images into a neural network instead of just numbers?
• To a computer, an image is really just a grid of
numbers that represent how dark each pixel is:
To feed an image into our neural network, we simply treat the
18x18 pixel image as an array of 324 numbers:
To handle 324 inputs, we’ll just enlarge our neural network to have 324 input
nodes:
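Turning an image into network inputs is just a matter of reading the grid row by row. The sketch below shows this on a tiny invented 3x3 grid and then confirms that an 18x18 grid yields the 324 inputs mentioned above.

```python
def flatten(image):
    """Turn a grid of pixel values (a list of rows) into one flat list."""
    return [pixel for row in image for pixel in row]

# A tiny hypothetical 3x3 "image"; 0 is white and 255 is black.
tiny = [
    [0, 128, 0],
    [255, 255, 255],
    [0, 128, 0],
]
print(flatten(tiny))  # [0, 128, 0, 255, 255, 255, 0, 128, 0]

# An 18x18 image becomes 324 numbers, one per input node.
blank = [[0] * 18 for _ in range(18)]
print(len(flatten(blank)))  # 324
```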
Training Data

Mmm… sweet, sweet training data
Clustering (Unsupervised Learning)
• Clustering is a technique for finding similarity groups in data, called clusters.
That is, it groups data instances that are similar to (near) each other in one cluster
and data instances that are very different (far away) from each other into different clusters.
• Clustering is often called an unsupervised learning task as no class values
denoting an a priori grouping of the data instances are given, which is the
case in supervised learning.
• Due to historical reasons, clustering is often considered synonymous with
unsupervised learning.
• In fact, association rule mining is also unsupervised
An illustration
• The data set has three natural groups of data points, i.e., 3
natural clusters.

What is clustering for?
Let us see some real-life examples
• Example 1: group people of similar sizes together to make “small”,
“medium” and “large” T-Shirts.
• Tailor-made for each person: too expensive
• One-size-fits-all: does not fit all.
• Example 2: In marketing, segment customers according to their
similarities
• To do targeted marketing.
What is clustering for?
• Example 3: Given a collection of text documents, we want to organize
them according to their content similarities,
• To produce a topic hierarchy
• In fact, clustering is one of the most utilized data mining techniques.
• It has a long history and is used in almost every field, e.g., medicine,
psychology, botany, sociology, biology, archeology, marketing, insurance,
libraries, etc.
• In recent years, due to the rapid increase of online documents, text clustering
has become important.
Hierarchical Clustering
• Agglomerative hierarchical clustering technique: initially each data
point is considered an individual cluster. At each iteration, the most
similar clusters are merged until one cluster or K clusters are formed.
• The basic agglomerative algorithm is straightforward.
• Compute the proximity matrix
• Let each data point be a cluster
• Repeat: Merge the two closest clusters and update the proximity matrix
• Until only a single cluster remains
• Key operation is the computation of the proximity of two clusters
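The merge loop can be sketched on one-dimensional points. Two simplifications are assumed: cluster proximity is single linkage (the distance between the closest pair of points), which is only one of several possible definitions, and proximities are recomputed on each pass instead of being kept in an updated matrix.

```python
def cluster_distance(a, b):
    """Single linkage: distance between the closest pair of points, one from each cluster."""
    return min(abs(x - y) for x in a for y in b)

def agglomerate(points, k=1):
    """Start with one cluster per point and merge the two closest until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = [
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)  # the two closest clusters
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical one-dimensional data with three visible groups.
points = [1.0, 1.5, 2.0, 8.0, 8.5, 20.0]
result = agglomerate(points, k=3)
print(result)  # [[1.0, 1.5, 2.0], [8.0, 8.5], [20.0]]
```

Calling `agglomerate(points)` with the default `k=1` continues merging until a single cluster remains, which is the full hierarchy a dendrogram would display.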
Hierarchical Clustering Visualization using Dendrogram
K-Means Clustering (Unsupervised Learning Clustering)

• The k-Means clustering algorithm, which is effective for large datasets, puts similar, unlabeled
data into different groups.
• The first step is to select k, which is the number of clusters; generally by visualizations of that
data to see if there are noticeable grouping areas.

• Works with numeric data only!

Algorithm:
1. Pick a number k of random cluster centers
2. Assign every item to its nearest cluster center using a distance metric
3. Move each cluster center to the mean of its assigned items
4. Repeat 2-3 until convergence (change in cluster assignment less than a threshold)
• How to select k in a complex data set?
• Experiment with different k values and, for each, measure the average distance
between items and their cluster centers. Repeating this several times gives a more reliable estimate.
• Why not have a high number for k? As k grows, the average distance keeps shrinking,
but after a point there will be only incremental improvements. Stop at the
point where this starts to occur.
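The four algorithm steps and the "try several k values" heuristic can be sketched together. This is a bare-bones version under stated assumptions: the initial centers are k randomly sampled data points, convergence means no assignment changed, and the points are invented two-dimensional data.

```python
import math
import random

def k_means(points, k, rng, max_iterations=100):
    """Return (centers, assignment) after running the four k-means steps."""
    centers = rng.sample(points, k)  # step 1: random initial centers
    assignment = None
    for _ in range(max_iterations):
        # Step 2: assign every item to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:  # step 4: stop when nothing changes
            break
        assignment = new_assignment
        # Step 3: move each center to the mean of its assigned items.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, assignment

def average_distance(points, centers, assignment):
    """Average distance between each item and its cluster center."""
    return sum(math.dist(p, centers[a]) for p, a in zip(points, assignment)) / len(points)

# Hypothetical data with three visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (15, 1), (15, 2), (16, 1)]

rng = random.Random(0)  # fixed seed so repeated runs behave the same
for k in range(1, 6):
    centers, assignment = k_means(points, k, rng)
    print(k, round(average_distance(points, centers, assignment), 2))
```

Reading the printed averages from k = 1 upward, the point where the improvement flattens is the candidate for k. Because the initial centers are random, a single run can land on a poor grouping, which is why repeating the experiment helps.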
• SPSS offers 3 types of Clustering:
• 2 Step Clustering: Really large Data Sets
• K Means – Moderate Data Sets
• Hierarchical – Small Data sets

• 2-Step Clustering- combination of K Means and Hierarchical


• Step 1: Preclustering: Making Little Clusters :
• The first step of the two-step procedure is formation of preclusters. The goal of preclustering is to
reduce the size of the matrix that contains distances between all possible pairs of cases.
• Preclusters are just clusters of the original cases that are used in place of the raw data in the
hierarchical clustering.
• As a case is read, the algorithm decides, based on a distance measure, whether the current case should be
merged with a previously formed precluster or start a new precluster.
• When preclustering is complete, all cases in the same precluster are treated as a single entity.
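The merge-or-start decision described above can be illustrated with a simplified sketch. This is not SPSS's actual procedure, which is more elaborate; it assumes Euclidean distance, a fixed hypothetical `threshold`, and a running mean as each precluster's center.

```python
from math import dist


def precluster(cases, threshold):
    """Read cases one at a time; merge each into the nearest precluster or start a new one."""
    preclusters = []  # each precluster is {"center": tuple, "count": int}
    for case in cases:
        nearest = min(
            preclusters, key=lambda p: dist(case, p["center"]), default=None
        )
        if nearest is not None and dist(case, nearest["center"]) <= threshold:
            # Merge: update the running mean of the precluster.
            n = nearest["count"]
            nearest["center"] = tuple(
                (c * n + x) / (n + 1) for c, x in zip(nearest["center"], case)
            )
            nearest["count"] = n + 1
        else:
            # Too far from every existing precluster: start a new one.
            preclusters.append({"center": tuple(case), "count": 1})
    return preclusters


cases = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0), (0.1, 0.1)]
result = precluster(cases, threshold=1.0)
for p in result:
    print(p)
```

Five cases collapse into two preclusters here, so a later hierarchical step would need distances between two entities instead of five.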

• Step 2: Hierarchical Clustering of Preclusters


• In the second step, SPSS uses the standard hierarchical clustering algorithm on the preclusters.
• Forming clusters hierarchically lets you explore a range of solutions with different numbers of clusters.
Association
• Association rules help establish associations among data objects
inside large databases.
• This unsupervised technique is about discovering interesting
relationships between variables in large databases. For example,
people who buy a new home are most likely to buy new furniture.
• Other Examples:
• A subgroup of cancer patients grouped by their gene expression
measurements
• Groups of shoppers based on their browsing and purchasing histories
• Movies grouped by the ratings given by movie viewers
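A rule such as "new home, therefore new furniture" is commonly scored with two standard measures: support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both on a small made-up set of transactions; the data and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Made-up purchase records, one set of items per customer.
transactions = [
    {"new home", "furniture"},
    {"new home", "furniture", "lawn mower"},
    {"new home"},
    {"furniture"},
    {"lawn mower"},
]

rule_support = support(transactions, {"new home", "furniture"})
rule_confidence = confidence(transactions, {"new home"}, {"furniture"})
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In this toy data the rule holds for two of the three home buyers, so its confidence is about 0.67 with a support of 0.40.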
Reinforcement Learning
• Based on how you learned to play a game: learning through a trial-and-error process, in which learning improves
through positive and negative reinforcement.
• Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take
actions in an environment so as to maximize some notion of cumulative reward.
• Reinforcement learning is a way to train a model by rewarding desirable actions and penalizing undesirable
ones.

Some applications:
• Games:
• Games are ideal for reinforcement learning since there are clear-cut rules, scores, and various constraints (like a game board).
• Can be tested with millions of simulations
• Used in Go and chess
• Robotics:
• A key capability is navigating within a space
• Requires evaluating the environment at many different points
• If the robot navigates successfully, there is positive reinforcement. If it runs into things, there is negative reinforcement.
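The navigation example can be made concrete with a toy sketch. The slides do not name an algorithm, so this uses tabular Q-learning as an assumed choice. A robot moves along a corridor of six positions, receives +1 for reaching the goal at one end and -1 for running into the wall at the other. The corridor, reward values, and hyperparameters are all illustrative.

```python
import random

N_POSITIONS = 6      # positions 0..5 along a corridor
WALL, GOAL = 0, 5    # hitting the wall gives -1, reaching the goal gives +1
ACTIONS = (-1, +1)   # step left or step right


def step(position, action):
    """Return (next_position, reward, done) for one move."""
    nxt = position + action
    if nxt == GOAL:
        return nxt, 1.0, True    # positive reinforcement
    if nxt == WALL:
        return nxt, -1.0, True   # negative reinforcement
    return nxt, 0.0, False


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_POSITIONS) for a in ACTIONS}
    for _ in range(episodes):
        state = rng.randint(WALL + 1, GOAL - 1)  # random interior start
        for _move in range(100):                 # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)     # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            future = 0.0 if done else gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = nxt
            if done:
                break
    return q


q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(WALL + 1, GOAL)}
print(policy)
```

After training, the learned policy should choose the step toward the goal from every interior position, which is the behavior the rewards were designed to reinforce.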
Applications of Reinforcement Learning…
• Resource management in computer clusters: designing algorithms to
automatically allocate and schedule computer resources to waiting
jobs, with the objective of minimizing the average job slowdown
• Traffic light control
• Chemistry: optimizing chemical reactions
• Personalized recommendations
• Bidding and advertising: Alibaba Group has shown how multi-agent
reinforcement learning can be used in a multi-agent bidding solution
(DCMAB). Example: the Taobao ad platform is a place where merchants
place bids in order to display ads to customers.
Some Innovations driven by Reinforcement
Learning
• Osaro:
• The company develops systems that allow robots to learn quickly.
• Osaro describes this as “the ability to mimic behavior that requires learned sensor
fusion as well as high level planning and object manipulation. It will also enable the
ability to learn from one machine to another and improve beyond a human
programmer’s insights.”
• For example, one of its robots was able to learn, within only five seconds, how to lift
and place a chicken (the system is expected to be used in poultry factories).
• OpenAI:
• OpenAI created Dactyl, a robot hand with human-like dexterity.
• One of the surprising results was that the system learned human hand actions that
were not preprogrammed, such as sliding of the finger.
• Google:
• Beginning in 2013, the company went on an M&A (mergers and acquisitions) binge for robotics companies but was not very successful.
• Google has focused on pursuing simpler robots that are driven by AI, and the company has created a new division called Robotics at
Google.
• For example, one of the robots can look at a bin of items, identify the one that is requested, and pick it up with a three-fingered
hand about 85% of the time. A typical person, by comparison, managed this about 80% of the time.
• Cobots: robots that work alongside people. This is a much more powerful approach, since it
leverages the advantages of both machines and humans.

• Amazon:
• In 2012, the company shelled out $775 million for Kiva, a top industrial robot manufacturer.
• Since then, Amazon.com has rolled out about 100,000 systems across more than 25 fulfillment centers (because of this, the company
has seen a 40% improvement in inventory capacity).
• Amazon Robotics automates fulfillment center operations using various methods of robotic technology including autonomous mobile
robots, sophisticated control software, language perception, power management, computer vision, depth sensing, machine learning,
object recognition, and semantic understanding of commands.
• Within the warehouses, robots quickly move across the floor helping to locate and lift storage pods. But people are also critical as they
are better able to identify and pick individual products.
Reinforcement Learning
Applications of machine learning

ML applications fall into three broad contexts

▪ Supervised learning
– Dataset + labels/annotations
▪ Unsupervised learning
– Dataset (without labels/annotations)
▪ Reinforcement learning
– No initial dataset
▫ Dataset accumulated with experience
▫ ML agents interact with environment (trial and error)

[Figure: ML Applications divided into Unsupervised, Supervised, and Reinforcement]

© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 2


Reinforcement Learning
Machine Learning – Reinforcement Learning Agent

RL: Dataset built with experience

▪ Experience = Env. State + Action + Next Env. State + Reward

RL Feedback Loop
▪ At each step the agent
– Executes action: $A_t$
– Receives observation: $O_t$
– Receives reward: $R_t$
▪ The environment
– Receives action: $A_t$
– Emits observation: $O_{t+1}$
– Emits reward: $R_{t+1}$
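The feedback loop and the experience tuple above can be sketched as follows. Everything here is a hypothetical toy: the environment's state is a single integer, the observation is assumed to equal the full state, the agent acts at random, and a reward of 1 is emitted whenever the state reaches 3.

```python
import random
from typing import NamedTuple


class Experience(NamedTuple):
    """One entry of the dataset: state, action, next state, reward."""
    state: int
    action: int
    next_state: int
    reward: float


class CounterEnvironment:
    """Toy environment whose state is an integer; reaching 3 is rewarded."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        """Receive the action; emit the next observation and the reward."""
        self.state += action
        reward = 1.0 if self.state == 3 else 0.0
        return self.state, reward


class RandomAgent:
    """Agent that ignores its observation and acts at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice((-1, +1))


def run(steps=20):
    """Run the agent-environment loop and return the accumulated dataset."""
    env, agent = CounterEnvironment(), RandomAgent()
    dataset = []
    observation = env.state
    for _ in range(steps):
        action = agent.act(observation)               # agent executes action
        next_observation, reward = env.step(action)   # environment responds
        dataset.append(Experience(observation, action, next_observation, reward))
        observation = next_observation
    return dataset


dataset = run()
for experience in dataset[:5]:
    print(experience)
```

No dataset exists before the loop starts; it is accumulated one experience at a time, which is what distinguishes this setting from supervised and unsupervised learning.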
