
Introduction to Machine Learning

Machine learning is a core sub-area of artificial intelligence: it enables computers to learn on their own without being explicitly programmed. When exposed to new data, these computer programs can learn, grow, change, and develop by themselves.
 
To better understand the uses of machine learning, consider some of the instances where it is applied: the self-driving Google car, cyber fraud detection, and online recommendation engines, such as friend suggestions on Facebook, Netflix showcasing movies and shows you might like, and the “more items to consider” and “get yourself a little something” suggestions on Amazon. All of these are examples of applied machine learning.

All these examples echo the vital role machine learning has begun to take in today’s data-
rich world. Machines can aid in filtering useful pieces of information that help in major
advancements, and we are already seeing how this technology is being implemented in a
wide variety of industries.
Machine Learning Process:

The process flow depicted below represents how machine learning works. [Figure: machine learning process flow]
Training data can be a dataset or data generated from a database. Before making a machine learn, we need to pre-process the data so that abnormalities such as missing rows, columns, or values are removed, making the data easier for the machine to learn (a small pre-processing sketch follows the list below). After pre-processing, we can make the machine learn the data by using machine learning algorithms, which are categorized as:
1. Supervised Learning
2. Unsupervised Learning
3. Semi Supervised Learning
4. Reinforcement Learning.
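Before turning to the algorithms, here is a minimal pre-processing sketch in Python using pandas. The file name data.csv and the specific cleaning steps are illustrative assumptions, not part of these notes:

import pandas as pd

# Load the training data (hypothetical file name).
df = pd.read_csv("data.csv")

# Remove abnormalities: rows with missing values and duplicate rows.
df = df.dropna()
df = df.drop_duplicates()

# Quick sanity check of the cleaned data before learning.
print(df.describe())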
 
Supervised Learning:
The majority of practical machine learning uses supervised learning.

Supervised learning is where you have input variables (x) and an output variable (Y) and
you use an algorithm to learn the mapping function from the input to the output.

Y = f(X)

The goal is to approximate the mapping function so well that when you have new input data (x), you can predict the output variables (Y) for that data.

It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers: the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.

Supervised learning problems can be further grouped into regression and classification
problems.

 Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” and “no disease”.
 Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
 
Some popular examples of supervised machine learning algorithms are:

 Linear regression for regression problems.
 Random forest for classification and regression problems.
 Support vector machines for classification problems.
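As a concrete illustration of the supervised setting, here is a small scikit-learn sketch (a toy example with generated data, not from these notes): we learn the mapping Y = f(X) from labeled data and evaluate on held-out input.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy labeled dataset: inputs X with known outputs y (the "correct answers").
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A support vector machine classifier learns the mapping from X to y.
model = SVC()
model.fit(X_train, y_train)

# Accuracy on new, unseen input data.
print(model.score(X_test, y_test))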
 
Unsupervised Machine Learning
Unsupervised learning is where you only have input data (X) and no corresponding output
variables.

The goal for unsupervised learning is to model the underlying structure or distribution in
the data in order to learn more about the data.

It is called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.

Unsupervised learning problems can be further grouped into clustering and association
problems.

 Clustering: A clustering problem is where you want to discover the inherent groupings
in the data, such as grouping customers by purchasing behavior.
 Association:  An association rule learning problem is where you want to discover rules
that describe large portions of your data, such as people that buy X also tend to buy Y.
Some popular examples of unsupervised learning algorithms are:

 k-means for clustering problems.
 Apriori algorithm for association rule learning problems.
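A minimal clustering sketch with scikit-learn's k-means (illustrative toy data, not from the notes): the algorithm receives only inputs X, with no labels, and discovers the groupings itself.

import numpy as np
from sklearn.cluster import KMeans

# Unlabeled input data: two loose groups of points.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# k-means discovers k = 2 clusters without any "correct answers".
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assigned to each row
print(kmeans.cluster_centers_)  # the discovered group centers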
 

Semi-Supervised Machine Learning


Problems where you have a large amount of input data (X) and only some of the data is
labeled (Y) are called semi-supervised learning problems.

These problems sit in between both supervised and unsupervised learning.

A good example is a photo archive where only some of the images are labeled (e.g. dog, cat, person) and the majority are unlabeled.

Many real-world machine learning problems fall into this area, because it can be expensive or time-consuming to label data, as it may require access to domain experts, whereas unlabeled data is cheap and easy to collect and store.

You can use unsupervised learning techniques to discover and learn the structure in the input variables. You can also use supervised learning techniques to make best-guess predictions for the unlabeled data, feed that data back into the supervised learning algorithm as training data, and use the model to make predictions on new unseen data.
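The feed-back idea above is what scikit-learn calls self-training; a minimal sketch follows (the 70% unlabeled split is an illustrative assumption):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Pretend most labels are missing: unlabeled examples are marked with -1.
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.7] = -1

# The base supervised learner makes best-guess predictions for the unlabeled
# rows, and the most confident ones are fed back in as training data.
model = SelfTrainingClassifier(SVC(probability=True)).fit(X, y_partial)
print(model.predict(X[:5]))  # predictions on new, unseen data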

Problem Definition
Wednesday, April 18, 2018
12:16 PM


We are taking the Kostecki Dillon Migraine Dataset which consists of headache logs kept by 133
patients in a treatment program in which bio-feedback was used to attempt to reduce migraine
frequency and severity. Patients entered the program at different times over a period of about 3
years. Patients were encouraged to begin their logs four weeks before the onset of treatment and
to continue for one month afterwards, but only 55 patients have data preceding the onset of
treatment.
 
Taking this dataset into consideration, we train our algorithm to predict the probability of migraine severity and frequency for future data that we give as input.
 
The dataset contains the following fields:
1. id: patient id
2. time: time in days of the treatment
3. dos: time in days from the start of the study
4. hatype: the type of migraine attack (Aura or No Aura)
5. age
6. airq: air quality
7. medication: none, reduced, or continuing, representing the current medication taken by the patient
8. headache: yes or no, representing whether the patient is suffering from a headache
9. sex: male or female

Classification:
Classification is a machine learning method that uses data to determine the category, type,
or class of an item or row of data. For example, you can use classification to:
 Classify email as spam, junk, or good.
 Determine whether a patient's lab sample is cancerous.
 Categorize customers by their propensity to respond to a sales campaign.
 Identify sentiment as positive or negative.

Classification tasks are frequently organized by whether a classification is binary (either A or B) or multiclass (multiple categories that can be predicted by using a single model).
 
List of Classification Algorithms in Azure Machine Learning Studio:
 
 Multiclass Decision Forest: Creates a multiclass classification model by using the
decision forest algorithm.
 Multiclass Decision Jungle: Creates a multiclass classification model by using the
decision jungle algorithm.
 Multiclass Logistic Regression: Creates a multiclass logistic regression classification
model.
 Multiclass Neural Network: Creates a multiclass classification model by using a neural
network algorithm.
 One-vs-All Multiclass: Creates a multiclass classification model from an ensemble of
binary classification models.
 Two-Class Averaged Perceptron: Creates an averaged perceptron binary classification
model.
 Two-Class Bayes Point Machine: Creates a Bayes point machine binary classification
model.
 Two-Class Boosted Decision Tree: Creates a binary classifier by using a boosted decision
tree algorithm.
 Two-Class Decision Forest: Creates a two-class classification model by using the
decision forest algorithm.
 Two-Class Decision Jungle: Creates a two-class classification model by using the
decision jungle algorithm.
 Two-Class Locally Deep Support Vector Machine: Creates a binary classification model
by using the locally deep Support Vector Machine algorithm.
 Two-Class Logistic Regression: Creates a two-class logistic regression model.
 Two-Class Neural Network: Creates a binary classifier by using a neural network
algorithm.
 Two-Class Support Vector Machine: Creates a binary classification model by using the
Support Vector Machine algorithm.

Reasons for Choosing a Classification Algorithm

In the given problem definition we have patients of different ages and sexes. A classification algorithm makes the machine learn the data based on a particular type, category, or label present in the dataset. For this problem, we want to classify the data based on the age of the patient in order to find the probability of a patient suffering from migraine. We could also make use of clustering or regression algorithms, but since the data in the dataset is continuous and linear and classification models handle it well, we choose a classification algorithm.

Choosing a better Classification Algorithm for our problem
Wednesday, April 18, 2018
12:28 PM
Out of the Azure classification algorithms listed in the classification section, which one should we choose to make the machine learn our dataset for future prediction?

There is no single algorithm that is better than all the others on all problems. After analyzing our problem with different classification algorithms, we found that the Multiclass Logistic Regression algorithm is the best for our chosen problem. We compared all the algorithms in terms of scored labels, and multiclass logistic regression stood out as the best.

The comparison is made by taking the scored labels of all the algorithms and comparing them with each other in Power BI, a Microsoft tool that makes analytics simpler for business intelligence.

[Chart: comparison of the classification algorithms based on scored labels]

The chart above shows the comparison of the classification algorithms against our problem definition, based on scored labels.

Scored Labels and Scored Probabilities
Wednesday, April 18, 2018
12:56 PM
Scored labels are the calculated result: what the algorithm has predicted. For our problem definition, the scored label indicates the age of a patient affected by a migraine attack, based on the input data given to our trained model.

Scored probabilities are real values indicating the calculated probability. For our problem definition, the scored probability indicates the probability of a patient being affected by a migraine attack, based on the input data given to our trained model.
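In scikit-learn terms (an analogy, not Azure ML Studio itself), scored labels correspond to predict and scored probabilities to predict_proba; a small sketch on toy data:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

print(model.predict(X[:3]))        # scored labels: the predicted class
print(model.predict_proba(X[:3]))  # scored probabilities: per-class probabilities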

Notes on Multiclass Logistic Regression
Wednesday, April 18, 2018
1:06 PM
Logistic regression is a well-known method in statistics that is used to predict the probability of
an outcome, and is particularly popular for classification tasks. The algorithm predicts the
probability of occurrence of an event by fitting data to a logistic function. In multiclass logistic
regression, the classifier can be used to predict multiple outcomes.
 

Logistic regression requires numeric variables. Therefore, when you try to use categorical columns as variables, Azure Machine Learning converts the values to an indicator array internally.

For dates and times, a numeric representation is used. If you want to handle dates and times differently, we suggest that you create a derived column.

Standard logistic regression is binomial and assumes two output classes. Multiclass or
multinomial logistic regression assumes three or more output classes.

Binomial logistic regression assumes a logistic distribution of the data, where the probability that an example belongs to class 1 is:

p(x; β0, …, βD−1) = 1 / (1 + e^−(β0·x0 + β1·x1 + … + βD−1·xD−1))

Where:

 x is a D-dimensional vector containing the values of all the features of the instance.

 p is the logistic distribution function.

 β0, …, βD−1 are the unknown parameters of the logistic distribution.


The algorithm tries to find the optimal values for β0, …, βD−1 by maximizing the log probability of the parameters given the inputs. Maximization is performed with a popular method for parameter estimation called Limited-Memory BFGS (L-BFGS).
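A numeric sketch of the binomial formula above (the example values of x and β are arbitrary assumptions): p is the logistic (sigmoid) function of the inner product of the parameters and the features.

import numpy as np

def p(x, beta):
    """Probability that instance x belongs to class 1 under parameters beta."""
    return 1.0 / (1.0 + np.exp(-np.dot(beta, x)))

x = np.array([1.0, 0.5, -1.2])     # a D-dimensional instance (D = 3)
beta = np.array([0.3, -0.8, 1.1])  # example parameter values
print(p(x, beta))                  # a probability between 0 and 1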

Steps to Configure Multiclass Logistic Regression
Wednesday, April 18, 2018
1:14 PM
1. Add the dataset to our experiment. We can add a saved dataset or download one from a dataset repository such as UCI.

 
2. Select the columns in the dataset that are to be used for learning by the algorithm. Go to Data Transformation > Manipulation and drag the Select Columns in Dataset module onto the experiment. Connect the dataset to Select Columns in Dataset.
 
 
3. In the properties window of Select Columns in Dataset, click Launch column selector. A new window named Select columns appears, listing all the columns available in our dataset. Select the columns whose datatype is numeric, since our algorithm is specific to numeric data, and add them to the selected-columns panel by clicking the > arrow.
4. Drag the Edit Metadata module onto the experiment. In the properties window on the right-hand side, click Launch column selector to open the Select columns window, and set the column type to Numeric.
5. In Data Transformation > Sample and Split, select Split Data to split the data in our dataset. Connect it to the output port of Edit Metadata.

The properties of the Split Data module are as follows: [Screenshot: Split Data properties]
6. Add the Multiclass Logistic Regression module to the experiment from Machine Learning > Initialize Model > Classification.
7. Specify how you want the model to be trained, by setting the Create trainer mode
option.

o Single Parameter: Use this option if you know how you want to configure the
model, and provide a specific set of values as arguments.

o Parameter Range: Use this option if you are not sure of the best parameters, and
want to use a parameter sweep.

8. Optimization tolerance: Specify the threshold value for optimizer convergence. If the improvement between iterations is less than the threshold, the algorithm stops and returns the current model.

9. L1 regularization weight, L2 regularization weight: Type a value to use for the regularization parameters L1 and L2. A non-zero value is recommended for both.
Regularization is a method for preventing overfitting by penalizing models with extreme coefficient values. It works by adding the penalty associated with coefficient values to the error of the hypothesis: an accurate model with extreme coefficient values is penalized more, while a less accurate model with more conservative values is penalized less.
L1 and L2 regularization have different effects and uses. L1 can be applied to sparse models, which is useful when working with high-dimensional data. In contrast, L2 regularization is preferable for data that is not sparse. This algorithm supports a linear combination of L1 and L2 regularization values: that is, if x = L1 and y = L2, then ax + by = c defines the linear span of the regularization terms. Different linear combinations of L1 and L2 terms have been devised for logistic regression models, such as elastic net regularization (see the Python sketch after step 16 for where this penalty fits).

10. Memory size for L-BFGS: Specify the amount of memory to use for L-BFGS
optimization. This parameter indicates the number of past positions and gradients to store
for the computation of the next step.
L-BFGS stands for limited memory Broyden-Fletcher-Goldfarb-Shanno, and it is an
optimization algorithm that is popular for parameter estimation. This optimization
parameter limits the amount of memory that is used to compute the next step and direction.
When you specify less memory, training is faster but less accurate.

11. Random number seed: Type an integer value to use as the seed for the algorithm if you
want the results to be repeatable over runs. Otherwise, a system clock value is used as the
seed, which can produce slightly different results in runs of the same experiment.

12. Allow unknown categorical levels: Select this option to create an additional “unknown”
level in each categorical column. Any values (levels) in the test dataset that are not present
in the training dataset are mapped to this "unknown" level.

 
13. To train our model, drag the Train Model module from the Train panel onto the experiment. Connect Train Model to the output ports of Multiclass Logistic Regression and Split Data. In the properties window of Train Model, click Launch column selector to open the Select a single column window, and select the column age, on which we train our dataset.
14. From the Score panel, drag the Score Model module onto the experiment to get the probabilistic results. Connect the output ports of Train Model and Split Data to Score Model.

 
15. Now run the experiment by clicking Run.
16. Right-click Score Model and, under Scored Dataset, click Visualize to see the results generated by our trained experiment.
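For readers who want to reproduce the experiment outside the Studio UI, here is a rough Python analogue of steps 1-16. It is a sketch, not the experiment itself: it assumes a hypothetical migraine.csv export of the Kostecki-Dillon dataset with the columns from the problem definition, and uses scikit-learn in place of the Azure modules.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Steps 1-4: load the dataset, clean it, and keep the numeric columns.
df = pd.read_csv("migraine.csv").dropna()
X = df[["time", "dos", "airq"]]
y = df["age"]  # label column chosen in step 13; each distinct age is a class

# Step 5: split the data into training and test portions.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 6-13: multinomial logistic regression trained with L-BFGS; the C
# parameter is the (inverse) counterpart of the L2 regularization weight.
model = LogisticRegression(solver="lbfgs", C=1.0, max_iter=1000)
model.fit(X_train, y_train)

# Steps 14-16: score the model and inspect the results.
scored_labels = model.predict(X_test)
scored_probabilities = model.predict_proba(X_test)
print(scored_labels[:5])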

 
 
 
 
 
 
Steps for setting a web service for our trained model
Wednesday, April 18, 2018
3:06 PM
1. After successfully training our algorithm, we set up the trained model as a web service by clicking the Set Up a Web Service option.
2. Under this option, select Retraining Web Service to make the trained model ready to be deployed as a web service with Web Service Input and Web Service Output modules.
3. Run the retraining web service by clicking Run.

4. After our trained experiment is ready, we create the predictive experiment, which will make the predictions for our future data, by clicking Predictive Web Service under Set Up Web Service.
 
 
 
5. Now run the predictive experiment by clicking Run, to make it ready for making predictions.
 
 

Steps for deploying the web service
Wednesday, April 18, 2018
3:21 PM
1. After setting up the web service by running our predictive experiment, we can deploy the experiment by clicking Deploy Web Service.
