4 views

Uploaded by linkranjit

ML

- An Approach to Develop Expert Systems in Medical Diagnosis Using Machine Learning Algorithms (Asthma) and A Performance Study
- Time Series Forecasting With Feed-Forward Neural Networks:
- The Application of Data Mining to Build Classification Model for Predicting Graduate Employment
- Artificial Neural Networks & Intelligence
- FUZZY NEURAL NETWORKS FOR IDENTIFICATION AND CONTROL OF DC DRIVE SYSTEMS
- ann_qb
- 7-f10086
- Neural Networks
- Ann
- decision trees
- Development of Back Propagation Neural Network Model to Predict Performance and Emission
- A Comparison of Two Learning Algorithms 64501
- 04012019
- Exercise
- Forecasting of Suspended Sediment in Rivers Using Artificial Neural Networks Approach
- Classification
- 05968251
- Face Regconition Sample
- 1
- Open23

You are on page 1of 45

There are so many algorithms available and it can feel overwhelming when algorithm names are thrown

around and you are expected to just know what they are and where they fit.

In this post I want to give you two ways to think about and categorize the algorithms you may come

across in the field.

The second is a grouping of algorithms by similarity in form or function (like grouping similar animals

together).

Both approaches are useful, but we will focus in on the grouping of algorithms by similarity and go on a

tour of a variety of different algorithm types.

After reading this post, you will have a much better understanding of the most popular machine learning

algorithms for supervised learning and how they are related.

A cool example of an ensemble of lines of best fit. Weak members are grey, the combined prediction is

red.

Plot from Wikipedia, licensed under public domain.

There are different ways an algorithm can model a problem based on its interaction with the experience

or environment or whatever we want to call the input data.

It is popular in machine learning and artificial intelligence textbooks to first consider the learning styles

that an algorithm can adopt.

There are only a few main learning styles or learning models that an algorithm can have and we’ll go

through them here with a few examples of algorithms and problem types that they suit.

This taxonomy or way of organizing machine learning algorithms is useful because it forces you to think

about the the roles of the input data and the model preparation process and select one that is the most

appropriate for your problem in order to get the best result.

Let’s take a look at four different learning styles in machine learning algorithms:

Supervised Learning

Input data is called training data and has a known label or result

such as spam/not-spam or a stock price at a time.

A model is prepared through a training process where it is required to make predictions and is corrected

when those predictions are wrong. The training process continues until the model achieves a desired

level of accuracy on the training data.

Example problems are classification and regression.

Example algorithms include Logistic Regression and the Back Propagation Neural Network.

Unsupervised Learning

Input data is not labelled and does not have a known result.

A model is prepared by deducing structures present in the input data. This may be to extract general

rules. It may through a mathematical process to systematically reduce redundancy, or it may be

to organize data by similarity.

Example problems are clustering, dimensionality reduction and association rule learning.

Semi-Supervised Learning

Input data is a mixture of labelled and unlabelled examples.

There is a desired prediction problem but the model must learn the structures to organize the data as well

as make predictions.

Example algorithms are extensions to other flexible methods that make assumptions about how to model

the unlabelled data.

Overview

When crunching data to model business decisions, you are most typically using supervised and

unsupervised learning methods.

A hot topic at the moment is semi-supervised learning methods in areas such as image classification

where there are large datasets with very few labelled examples.

Algorithms are often grouped by similarity in terms of their function (how they work). For example, tree-

based methods, and neural network inspired methods.

I think this is the most useful way to group algorithms and it is the approach we will use here.

This is a useful grouping method, but it is not perfect. There are still algorithms that could just as easily fit

into multiple categories like Learning Vector Quantization that is both a neural network inspired method

and an instance-based method. There are also categories that have the same name that describes the

problem and the class of algorithm such as Regression and Clustering.

We could handle these cases by listing algorithms twice or by selecting the group that subjectively is the

“best” fit. I like this latter approach of not duplicating algorithms to keep things simple.

In this section I list many of the popular machine leaning algorithms grouped the way I think is the most

intuitive. It is not exhaustive in either the groups or the algorithms, but I think it is representative and will

be useful to you to get an idea of the lay of the land.

Please Note: There is a strong bias towards algorithms used for classification and regression, the two

most prevalent supervised machine learning problems you will encounter.

If you know of an algorithm or a group of algorithms not listed, put it in the comments and share it with us.

Let’s dive in.

Regression Algorithms

variables that is iteratively refined using a measure of error in the predictions made by the model.

Regression methods are a workhorse of statistics and have been cooped into statistical machine learning.

This may be confusing because we can use regression to refer to the class of problem and the class of

algorithm. Really, regression is a process.

Linear Regression

Logistic Regression

Stepwise Regression

Multivariate Adaptive Regression Splines (MARS)

Locally Estimated Scatterplot Smoothing (LOESS)

Instance-based Algorithms

Instance based learning model a decision problem with instances or

examples of training data that are deemed important or required to the model.

Such methods typically build up a database of example data and compare new data to the database

using a similarity measure in order to find the best match and make a prediction. For this reason,

instance-based methods are also called winner-take-all methods and memory-based learning. Focus is

put on representation of the stored instances and similarity measures used between instances.

Learning Vector Quantization (LVQ)

Self-Organizing Map (SOM)

Locally Weighted Learning (LWL)

Regularization Algorithms

An extension made to another method (typically regression methods)

that penalizes models based on their complexity, favoring simpler models that are also better at

generalizing.

I have listed regularization algorithms separately here because they are popular, powerful and generally

simple modifications made to other methods.

Ridge Regression

Least Absolute Shrinkage and Selection Operator (LASSO)

Elastic Net

Least-Angle Regression (LARS)

Decision Tree Algorithms

on actual values of attributes in the data.

Decisions fork in tree structures until a prediction decision is made for a given record. Decision trees are

trained on data for classification and regression problems. Decision trees are often fast and accurate and

a big favorite in machine learning.

Iterative Dichotomiser 3 (ID3)

C4.5 and C5.0 (different versions of a powerful approach)

Chi-squared Automatic Interaction Detection (CHAID)

Decision Stump

M5

Conditional Decision Trees

Bayesian Algorithms

Theorem for problems such as classification and regression.

Naive Bayes

Gaussian Naive Bayes

Multinomial Naive Bayes

Averaged One-Dependence Estimators (AODE)

Bayesian Belief Network (BBN)

Bayesian Network (BN)

Clustering Algorithms

class of methods.

Clustering methods are typically organized by the modelling approaches such as centroid-based and

hierarchal. All methods are concerned with using the inherent structures in the data to best organize the

data into groups of maximum commonality.

k-Means

k-Medians

Expectation Maximisation (EM)

Hierarchical Clustering

Association rule learning are methods that extract rules that best

explain observed relationships between variables in data.

These rules can discover important and commercially useful associations in large multidimensional

datasets that can be exploited by an organisation.

Apriori algorithm

Eclat algorithm

Artificial Neural Networks are models that are inspired by the structure

and/or function of biological neural networks.

They are a class of pattern matching that are commonly used for regression and classification problems

but are really an enormous subfield comprised of hundreds of algorithms and variations for all manner of

problem types.

Note that I have separated out Deep Learning from neural networks because of the massive growth and

popularity in the field. Here we are concerned with the more classical methods.

Perceptron

Back-Propagation

Hopfield Network

Radial Basis Function Network (RBFN)

Deep Learning Algorithms

Networks that exploit abundant cheap computation.

They are concerned with building much larger and more complex neural networks, and as commented

above, many methods are concerned with semi-supervised learning problems where large datasets

contain very little labelled data.

Deep Belief Networks (DBN)

Convolutional Neural Network (CNN)

Stacked Auto-Encoders

Dimensionality Reduction Algorithms

the inherent structure in the data, but in this case in an unsupervised manner or order to summarise or

describe data using less information.

This can be useful to visualize dimensional data or to simplify data which can then be used in a

supervized learning method. Many of these methods can be adapted for use in classification and

regression.

Principal Component Regression (PCR)

Partial Least Squares Regression (PLSR)

Sammon Mapping

Multidimensional Scaling (MDS)

Projection Pursuit

Linear Discriminant Analysis (LDA)

Mixture Discriminant Analysis (MDA)

Quadratic Discriminant Analysis (QDA)

Flexible Discriminant Analysis (FDA)

Ensemble Algorithms

that are independently trained and whose predictions are combined in some way to make the overall

prediction.

Much effort is put into what types of weak learners to combine and the ways in which to combine them.

This is a very powerful class of techniques and as such is very popular.

Boosting

Bootstrapped Aggregation (Bagging)

AdaBoost

Stacked Generalization (blending)

Gradient Boosting Machines (GBM)

Gradient Boosted Regression Trees (GBRT)

Random Forest

Machine Learning with R

R is the most popular platform for applied machine learning. When you want to get serious with applied

machine learning you will find your way into R.

It is very powerful because so many machine learning algorithms are provided. A problem is that the

algorithms are all provided by third parties, which makes their usage very inconsistent. This slows you

down, a lot, because you have to learn how to model data and how to make predicts with each algorithm

in each package, again and again.

In this post you will discover how you can overcome this difficulty with machine learning algorithms in R,

with pre-prepared recipes that follow a consistent structure.

The R ecosystem is enormous. Open source third party packages provide this power, allowing academics

and professionals to get the most powerful algorithms available into the hands of us practitioners.

A problem that I experienced when starting out with R was that the usage to each algorithm differs from

package to package. This inconsistency also extends to the documentation, with some providing worked

example for classification but ignoring regression and others not providing examples at all.

All this means that if you want to try a few different algorithms from different packages, you must spend

time figuring out how to fit and make predictions with each method in turn. This takes a lot of time,

especially with the spotty examples and vignettes.

Inconsistent: Algorithm implementations vary in the way a model is fit to data and the way a model is

used to generate predictions. This means that you have to study each package and each algorithm

implementation just to put together a working example, let alone adapt it to your problem.

Decentralized: Algorithm are implemented across different packages and it can be hard to locate which

packages provide an implementation of the algorithm you need, let alone which package provides the

most popular implementation. Additionally, the documentation for one package may be spread across

multiple help files, website and vignettes. This means you have to do a lot of searching just to locate an

algorithm, let alone compile a list of algorithms from which you can choose.

Incomplete: Algorithm documentation is almost always partially complete. An example usage may or

may not be provided, if it is, it may or may not be demonstrated on a canonical problem. This means you

have no obvious way to quickly understand how to use an implementation.

Complexity: Algorithms vary in their complexity of implementation and description. This can take it’s toll

on you as you jump from package to package. You want to focus on how to get the most from the

algorithm and its parameters, and not burn energy on parsing reams of PDFs just to get a hello world.

Linear Regression in R

You can copy and paste the recipes in this post to make a jump-start on your own problem or to learn and

practice with linear regression in R.

Ordinary Least Squares Regression

Ordinary Least Squares (OLS) regression is a linear model that seeks to find a set of coefficients for a

line/hyper-plane that minimise the sum of the squared errors.

# load data

data(longley)

# fit model

fit<- lm(Employed~., longley)

# summarize the fit

summary(fit)

# make predictions

predictions<- predict(fit, longley)

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

Stepwize Linear Regression

Stepwise Linear Regression is a method that makes use of linear regression to discover which subset of

attributes in the dataset result in the best performing model. It is step-wise because each iteration of the

method makes a change to the set of attributes and creates a model to evaluate the performance of the

set.

# load data

data(longley)

# fit model

base<- lm(Employed~., longley)

# summarize the fit

summary(base)

# perform step-wise feature selection

fit<- step(base)

# summarize the selected model

summary(fit)

# make predictions

predictions<- predict(fit, longley)

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

Principal Component Regression (PCR) creates a linear regression model using the outputs of a Principal

Component Analysis (PCA) to estimate the coefficients of the model. PCR is useful when the data has

highly-correlated predictors.

library(pls)

# load data

data(longley)

# fit model

fit<- pcr(Employed~., data=longley, validation="CV")

# summarize the fit

summary(fit)

# make predictions

predictions<- predict(fit, longley, ncomp=6)

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

Partial Least Squares (PLS) Regression creates a linear model of the data in a transformed projection of

problem space. Like PCR, PLS is appropriate for data with highly-correlated predictors.

library(pls)

# load data

data(longley)

# fit model

fit<- plsr(Employed~., data=longley, validation="CV")

# summarize the fit

summary(fit)

# make predictions

predictions<- predict(fit, longley, ncomp=6)

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

Multivariate Adaptive Regression Splines (MARS) is a non-parametric regression method that models

multiple nonlinearities in data using hinge functions (functions with a kink in them).

# load the package

library(earth)

# load data

data(longley)

# fit model

fit<- earth(Employed~., longley)

# summarize the fit

summary(fit)

# summarize the importance of input variables

evimp(fit)

# make predictions

predictions<- predict(fit, longley)

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

Support Vector Machines (SVM) are a class of methods, developed originally for classification, that find

support points that best separate classes. SVM for regression is called Support Vector Regression

(SVM).

library(kernlab)

# load data

data(longley)

# fit model

fit<- ksvm(Employed~., longley)

# summarize the fit

summary(fit)

# make predictions

predictions<- predict(fit, longley)

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

k-Nearest Neighbor

The k-Nearest Neighbor (kNN) does not create a model, instead it creates predictions from close data on-

demand when a prediction is required. A similarity measure (such as Euclidean distance) is used to

locate close data in order to make predictions.

library(caret)

# load data

data(longley)

# fit model

fit<- knnreg(longley[,1:6], longley[,7], k=3)

# summarize the fit

summary(fit)

# make predictions

predictions<- predict(fit, longley[,1:6])

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

Neural Network

A Neural Network (NN) is a graph of computational units that recieve inputs and transfer the result into an

output that is passed on. The units are ordered into layers to connect the features of an input vector to the

features of an output vector. With training, such as the Back-Propagation algorithm, neural networks can

be designed and trained to model the underlying relationship in data.

library(nnet)

# load data

data(longley)

x <- longley[,1:6]

y <- longley[,7]

# fit model

fit<- nnet(Employed~., longley, size=12, maxit=500, linout=T, decay=0.01)

# summarize the fit

summary(fit)

# make predictions

predictions<- predict(fit, x, type="raw")

# summarize accuracy

rmse<- mean((y - predictions)^2)

print(rmse)

Classification and Regression Trees

Classification and Regression Trees (CART) split attributes based on values that minimize a loss function,

such as sum of squared errors.

The following recipe demonstrates the recursive partitioning decision tree method on the longley dataset.

library(rpart)

# load data

data(longley)

# fit model

fit<- rpart(Employed~., data=longley, control=rpart.control(minsplit=5))

# summarize the fit

summary(fit)

# make predictions

predictions<- predict(fit, longley[,1:6])

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

Condition Decision Trees are created using statistical tests to select split points on attributes rather than a

loss function.

The following recipe demonstrates the condition inference trees method on the longley dataset.

library(party)

# load data

data(longley)

# fit model

fit<- ctree(Employed~., data=longley, controls=ctree_control(minsplit=2,minbucket=2,testtype="Univariate"))

# summarize the fit

summary(fit)

# make predictions

predictions<- predict(fit, longley[,1:6])

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

Model Trees

Model Trees create a decision tree and use a linear model at each node to make a prediction rather than

using an average value.

The following recipe demonstrates the M5P Model Tree method on the longley dataset.

Rule System

Rule Systems can be created by extracting and simplifying the rules from a decision tree.

The following recipe demonstrates the M5Rules Rule System on the longley dataset.

library(RWeka)

# load data

data(longley)

# fit model

fit<- M5Rules(Employed~., data=longley)

# summarize the fit

summary(fit)

# make predictions

predictions<- predict(fit, longley[,1:6])

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

Bagging CART

Bootstrapped Aggregation (Bagging) is an ensemble method that creates multiple models of the same

type from different sub-samples of the same dataset. The predictions from each separate model are

combined together to provide a superior result. This approach has shown participially effective for high-

variance methods such as decision trees.

The following recipe demonstrates bagging applied to the recursive partitioning decision tree.

library(ipred)

# load data

data(longley)

# fit model

fit<- bagging(Employed~., data=longley, control=rpart.control(minsplit=5))

# summarize the fit

summary(fit)

# make predictions

predictions<- predict(fit, longley[,1:6])

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

Random Forest

Random Forest is variation on Bagging of decision trees by reducing the attributes available to making a

tree at each decision point to a random sub-sample. This further increases the variance of the trees and

more trees are required.

library(randomForest)

# load data

data(longley)

# fit model

fit<- randomForest(Employed~., data=longley)

# summarize the fit

summary(fit)

# make predictions

predictions<- predict(fit, longley[,1:6])

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

Boosting is an ensemble method developed for classification for reducing bias where models are added

to learn the misclassification errors in existing models. It has been generalized and adapted in the form

of Gradient Boosted Machines (GBM) for use with CART decision trees for classification and regression

library(gbm)

# load data

data(longley)

# fit model

fit<- gbm(Employed~., data=longley, distribution="gaussian")

# summarize the fit

summary(fit)

# make predictions

predictions<- predict(fit, longley)

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

Cubist

Cubist decision trees are another ensemble method. They are constructed like model trees but involve a

boosting-like procedure called committees that re rule-like models.

library(Cubist)

# load data

data(longley)

# fit model

fit<- cubist(longley[,1:6], longley[,7])

# summarize the fit

summary(fit)

# make predictions

predictions<- predict(fit, longley[,1:6])

# summarize accuracy

rmse<- mean((longley$Employed - predictions)^2)

print(rmse)

In the picture, we used the following definitions for the notations:

Θ(j)Θ(j) : matrix of weights controlling function mapping from layer jj to layer j+1j+1

a(2)0=g(Θ(1)00x0+Θ(1)01x1+Θ(1)02x2)=g(ΘT0x)=g(z(2)0)a0(2)=g(Θ00(1)x0+Θ01(1)x1+Θ02(1)x2)=g(Θ0Tx)=g(z0(

2))

a(2)1=g(Θ(1)10x0+Θ(1)11x1+Θ(1)12x2)=g(ΘT1x)=g(z(2)1)a1(2)=g(Θ10(1)x0+Θ11(1)x1+Θ12(1)x2)=g(Θ1Tx)=g(z1(

2))

a(2)2=g(Θ(1)20x0+Θ(1)21x1+Θ(1)22x2)=g(ΘT2x)=g(z(2)2)a2(2)=g(Θ20(1)x0+Θ21(1)x1+Θ22(1)x2)=g(Θ2Tx)=g(z2(

2))

hΘ(x)=a(3)1=g(Θ(2)10a(2)0+Θ(2)11a(2)1+Θ(2)12a(2)2)hΘ(x)=a1(3)=g(Θ10(2)a0(2)+Θ11(2)a1(2)+Θ12(2)a2(2))

In the equations, the gg is sigmoid function that refers to the special case of the logisticfunction and

defined by the formula:

g(z)=11+e−zg(z)=11+e−z

Sigmoid functions

One of the reasons to use the sigmoid function (also called the logistic function) is it was the first one to

be used. Its derivative has a very good property. In a lot of weight update algorithms, we need to know a

derivative (sometimes even higher order derivatives). These can all be expressed as products

of ff and 1−f1−f. In fact, it's the only class of functions that satisfies f′(t)=f(t)(1−f(t))f′(t)=f(t)(1−f(t)).

However, usually the weights are much more important than the particular function chosen. These

sigmoid functions are very similar, and the output differences are small. Here's a plot from Wikipedia-

Sigmoid function. Note that all functions are normalized in such a way that their slope at the origin is 1.

Forward Propagation

x=⎡⎣⎢x0x1x2⎤⎦⎥z(2)=⎡⎣⎢⎢⎢z(2)0z(2)1z(2)2⎤⎦⎥⎥⎥x=[x0x1x2]z(2)=[z0(2)z1(2)z2(2)]

z(2)=Θ(1)x=Θ(1)a(1)z(2)=Θ(1)x=Θ(1)a(1)

a(2)=g(z(2))a(2)=g(z(2))

a(2)0=1.0a0(2)=1.0

z(3)=Θ(2)a(2)z(3)=Θ(2)a(2)

hΘ(x)=a(3)=g(z(3))hΘ(x)=a(3)=g(z(3))

The backpropagation learning algorithm can be divided into two phases: propagation and weight update.

- from wiki - Backpropagatio.

1. Phase 1: Propagation

Each propagation involves the following steps:

o Forward propagation of a training pattern's input through the neural network in order to

generate the propagation's output activations.

o Backward propagation of the propagation's output activations through the neural network

using the training pattern target in order to generate the deltas of all output and hidden

neurons.

2. Phase 2: Weight update

For each weight-synapse follow the following steps:

o Multiply its output delta and input activation to get the gradient of the weight.

o Subtract a ratio (percentage) of the gradient from the weight.

This ratio (percentage) influences the speed and quality of learning; it is called the learning rate.

The greater the ratio, the faster the neuron trains; the lower the ratio, the more accurate the

training is. The sign of the gradient of a weight indicates where the error is increasing, this is why

the weight must be updated in the opposite direction.

If we denote an error of node jj in layer ll as δ(l)jδj(l), for our output unit(L=3) becomes activation -actual

value:

δ(3)j=a(3)j−yj=hΘ(x)−yjδj(3)=aj(3)−yj=hΘ(x)−yj

δ(3)=a(3)−yδ(3)=a(3)−y

δ(2)=(Θ(2))Tδ(3)⋅g′(z(2))δ(2)=(Θ(2))Tδ(3)⋅g′(z(2))

where

g′(z(2))=a(2)⋅(1−a(2))g′(z(2))=a(2)⋅(1−a(2))

Note that we do not have δ(1)δ(1) term because that's the input layer and the values are the ones that we

observed and they are being used as a training set. So, there is no errors associate with the input.

∂∂Θ(l)ijJ(Θ)=a(l)jδ(l+1)i∂∂Θij(l)J(Θ)=aj(l)δi(l+1)

We use this value to update weights and we can multiply learning rate before we adjust the weight.

Code

importnumpy as np

def sigmoid(x):

return 1.0/(1.0 + np.exp(-x))

defsigmoid_prime(x):

return sigmoid(x)*(1.0-sigmoid(x))

deftanh(x):

returnnp.tanh(x)

deftanh_prime(x):

return 1.0 - x**2

classNeuralNetwork:

def __init__(self, layers, activation='tanh'):

if activation == 'sigmoid':

self.activation = sigmoid

self.activation_prime = sigmoid_prime

elif activation == 'tanh':

self.activation = tanh

self.activation_prime = tanh_prime

# Set weights

self.weights = []

# layers = [2,2,1]

# range of weight values (-1,1)

# input and hidden layers - random((2+1, 2+1)) : 3 x 3

fori in range(1, len(layers) - 1):

r = 2*np.random.random((layers[i-1] + 1, layers[i] + 1)) -1

self.weights.append(r)

# output layer - random((2+1, 1)) : 3 x 1

r = 2*np.random.random( (layers[i] + 1, layers[i+1])) - 1

self.weights.append(r)

# Add column of ones to X

# This is to add the bias unit to the input layer

ones = np.atleast_2d(np.ones(X.shape[0]))

X = np.concatenate((ones.T, X), axis=1)

for k in range(epochs):

i = np.random.randint(X.shape[0])

a = [X[i]]

for l in range(len(self.weights)):

dot_value = np.dot(a[l], self.weights[l])

activation = self.activation(dot_value)

a.append(activation)

# output layer

error = y[i] - a[-1]

deltas = [error * self.activation_prime(a[-1])]

# (a layer before the output layer)

for l in range(len(a) - 2, 0, -1):

deltas.append(deltas[-1].dot(self.weights[l].T)*self.activation_prime(a[l]))

# reverse

# [level3(output)->level2(hidden)] => [level2(hidden)-

>level3(output)]

deltas.reverse()

# backpropagation

# 1. Multiply its output delta and input activation

# to get the gradient of the weight.

# 2. Subtract a ratio (percentage) of the gradient from the

weight.

fori in range(len(self.weights)):

layer = np.atleast_2d(a[i])

delta = np.atleast_2d(deltas[i])

self.weights[i] += learning_rate * layer.T.dot(delta)

a = np.concatenate((np.ones(1).T, np.array(x)), axis=1)

for l in range(0, len(self.weights)):

a = self.activation(np.dot(a, self.weights[l]))

return a

if __name__ == '__main__':

nn = NeuralNetwork([2,2,1])

X = np.array([[0, 0],

[0, 1],

[1, 0],

[1, 1]])

y = np.array([0, 1, 1, 0])

nn.fit(X, y)

for e in X:

print(e,nn.predict(e))

Output:

epochs: 0

epochs: 10000

epochs: 20000

epochs: 30000

epochs: 40000

epochs: 50000

epochs: 60000

epochs: 70000

epochs: 80000

epochs: 90000

(array([0, 0]), array([ 9.14891326e-05]))

(array([0, 1]), array([ 0.99557796]))

(array([1, 0]), array([ 0.99707463]))

(array([1, 1]), array([ 0.00090973]))

k-Means Clustering

The k-Means Clustering finds centers of clusters and groups input samples around the clusters.

k-Means Clustering is a partitioning method which partitions data into k mutually exclusive clusters, and

returns the index of the cluster to which it has assigned each observation. Unlike hierarchical clustering,

k-means clustering operates on actual observations (rather than the larger set of dissimilarity measures),

and creates a single level of clusters. The distinctions mean that k-means clustering is often more

suitable than hierarchical clustering for large amounts of data.

The following description for the steps is from wiki - K-means_clustering.

Step 1

k initial "means" (in this case k=3) are randomly generated within the data domain.

Step 2

k clusters are created by associating every observation with the nearest mean. The partitions here

represent the Voronoi diagram generated by the means.

Step 3

The centroid of each of the k clusters becomes the new mean.

Step 4

Steps 2 and 3 are repeated until convergence has been reached.

nclusters(K) K : Number of clusters to split the set by.

criteria : The algorithm termination criteria, that is, the maximum number of iterations and/or the

desired accuracy. The accuracy is specified as criteria.epsilon. As soon as each of the cluster

centers moves by less than criteria.epsilon on some iteration, the algorithm stops.

attempts : Flag to specify the number of times the algorithm is executed using different initial

labellings. The algorithm returns the labels that yield the best compactness (see the last function

parameter).

flags :

o KMEANS_RANDOM_CENTERS - selects random initial centers in each attempt.

o KMEANS_PP_CENTERS - uses kmeans++ center initialization by Arthur and

Vassilvitskii.

o KMEANS_USE_INITIAL_LABELS - during the first (and possibly the only) attempt, use

the user-supplied labels instead of computing them from the initial centers. For the

second and further attempts, use the random or semi-random centers. Use one of

KMEANS_*_CENTERS flag to specify the exact method.

centers : Output matrix of the cluster centers, one row per each cluster center.

In this section, we'll play with a data set which has only one feature. The data is one-dimensional, and

clustering can be decided by one parameter.

We generated two groups of random numbers in two separated ranges, each group with 25 numbers as

shown in the picture below:

The code:

importnumpy as np

import cv2

frommatplotlib import pyplot as plt

y = np.random.randint(175,250,25) # 25 randoms in (175,250)

z = np.hstack((x,y)) # z.shape : (50,)

z = z.reshape((50,1)) # reshape to a column vector

z = np.float32(z)

plt.hist(z,256,[0,256])

plt.show()

Now, we're going to apply cv2.kmeans() function to the data with the following parameters:

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# Set flags

flags = cv2.KMEANS_RANDOM_CENTERS

# Apply KMeans

compactness,labels,centers = cv2.kmeans(z,2,None,criteria,10,flags)

In the code, the criteria means whenever 10 iterations of algorithm is ran, or an accuracy of epsilon = 1.0

is reached, stop the algorithm and return the answer.

Here are the meanings of the return values from cv2.kmeans() function:

compactness : It is the sum of squared distance from each point to their corresponding centers.

labels : This is the label array where each element marked '0','1',.....

centers : This is array of centers of clusters.

importnumpy as np

import cv2

frommatplotlib import pyplot as plt

y = np.random.randint(175,250,25) # 25 randoms in (175,250)

z = np.hstack((x,y)) # z.shape : (50,)

z = z.reshape((50,1)) # reshape to a column vector

z = np.float32(z)

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# Set flags

flags = cv2.KMEANS_RANDOM_CENTERS

# Apply KMeans

compactness,labels,centers = cv2.kmeans(z,2,None,criteria,10,flags)

A = z[labels==0]

B = z[labels==1]

# initial plot

plt.subplot(211)

plt.hist(z,256,[0,256])

plt.subplot(212)

plt.hist(A,256,[0,256],color = 'r')

plt.hist(B,256,[0,256],color = 'b')

plt.hist(centers,32,[0,256],color = 'y')

plt.show()

We can see the data has been clustered: A's centroid is 72 and B's centroid is 217 which was one of the

outputs from the cv2.kmeans() function.

iPython

A browser-based notebook with support for code, text, mathematical expressions, inline plots and

other rich media.

Support for interactive data visualization and use of GUI toolkits.

Flexible, embeddable interpreters to load into ones own projects.

Easy to use, high performance tools for parallel computing.

Basic Signals - boxcar

We'll make a simple boxcar with np.zeros() and np.ones().

We start with a simple command to get python environment using ipython --pylab:

$ ipython --pylab

Python 2.7.5+ (default, Feb 27 2014, 19:37:08)

Type "copyright", "credits" or "license" for more information.

IPython 0.13.2 -- An enhanced Interactive Python.

? -> Introduction and overview of IPython's features.

%quickref -> Quick reference.

help -> Python's own help system.

object? -> Details about 'object', use 'object??' for extra details.

Welcome to pylab, a matplotlib-based Python environment [backend:

TkAgg].

For more information, type 'help(pylab)'.

In [1]:

Let's play with arrays first:

In [1]: zeros

Out[1]:

In [2]: zeros(100)

Out[2]:

array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [3]: ones(10)

Out[3]: array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

Now, we want to concatenate these and make a boxcar:

Also, we we know more about the command, we can add '?' after the command. Then, it will

show details about the command as well as some usage samples:

In [4]: concatenate?

The following information will be displayed in another window (I mean if we can close the help

window (by typing 'q'), our command shell will reappear:

Type: builtin_function_or_method

String Form:<built-in function concatenate>

Docstring:

concatenate((a1, a2, ...), axis=0)

Join a sequence of arrays together.

Parameters

----------

a1, a2, ... : sequence of array_like

The arrays must have the same shape, except in the dimension

corresponding to `axis` (the first, by default).

axis : int, optional

The axis along which the arrays will be joined. Default is 0.

Now, concatenation:

In [5]: concatenate ( (zeros(100), ones(30), zeros(100)) )

Out[5]:

array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1.,

1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,

1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0.])

Actually, we want to assign the array to a variable:

In [6]: data = concatenate ( (zeros(100), ones(30), zeros(100)) )

In [7]: plot(data)

Out[7]: [<matplotlib.lines.Line2D at 0x3024450>]

The we get the picture below:

Note the values in the array is for y-values, for x-axis, the sample numbers are used: we have

100+30+100=230 samples.

Hamming filter

In [8]: hamming(40)

Out[8]:

array([ 0.08 , 0.08595688, 0.10367324, 0.13269023, 0.17225633,

0.2213468 , 0.27869022, 0.34280142, 0.41201997, 0.48455313,

0.55852233, 0.63201182, 0.70311825, 0.77 , 0.83092487,

0.88431494, 0.92878744, 0.96319054, 0.98663324, 0.99850836,

0.99850836, 0.98663324, 0.96319054, 0.92878744, 0.88431494,

0.83092487, 0.77 , 0.70311825, 0.63201182, 0.55852233,

0.48455313, 0.41201997, 0.34280142, 0.27869022, 0.2213468 ,

0.17225633, 0.13269023, 0.10367324, 0.08595688, 0.08

])

In [9]: plot(hamming(40))

Out[9]: []

We overwrote a new to the old one because we did not close the plot window before we made a

new draw.

So, let's close this window, and plot the Hamming again:

In [10]: plot(hamming(40))

Out[10]: []

However, if we want to apply the filter to the other signal, we need to normalize the filter.

We'll normalize the filter by diving it by the sum of the filter values:

In [11]: sum(hamming(40))

Out[11]: 21.139999999999997

In [12]: norm = hamming(40) / sum(hamming(40))

In [13]: plot(norm)

Out[13]: []

Note that the peak value is much lower than the previous Hamming filter.

Convolve

Now we do convolve the filter with the boxcar data:

In [14]: convolve(data, norm)

Out[14]:

array([ 0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0.0037843 , 0.00785037, 0.0127545 , 0.01903124, 0.0271796 ,

0.03765012, 0.05083319, 0.06704896, 0.08653903, 0.10946018,

0.13588035, 0.16577684, 0.19903693, 0.23546077, 0.27476658,

0.31659794, 0.36053301, 0.40609548, 0.45276687, 0.5 ,

0.54723313, 0.59390452, 0.63946699, 0.68340206, 0.72523342,

0.76453923, 0.80096307, 0.83422316, 0.86411965, 0.89053982,

0.90967668, 0.92510066, 0.93641231, 0.94331865, 0.94564081,

0.94331865, 0.93641231, 0.92510066, 0.90967668, 0.89053982,

0.86411965, 0.83422316, 0.80096307, 0.76453923, 0.72523342,

0.68340206, 0.63946699, 0.59390452, 0.54723313, 0.5 ,

0.45276687, 0.40609548, 0.36053301, 0.31659794, 0.27476658,

0.23546077, 0.19903693, 0.16577684, 0.13588035, 0.10946018,

0.08653903, 0.06704896, 0.05083319, 0.03765012, 0.0271796 ,

0.01903124, 0.0127545 , 0.00785037, 0.0037843 , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. , 0. ,

0. , 0. , 0. , 0. ])

In [15]:

We can also generate random signals.

In [16]: randn(20)

Out[16]:

array([ 1.04943265, -1.10655086, 2.65115344, 0.37216869, -1.81576411,

-0.72413097, -0.74942176, -0.38411864, 0.04484577, 0.31172929,

-0.78896659, -0.47319888, -0.61743916, -0.77453274, -2.34677579,

-0.61554551, 0.93594151, -1.06451913, -1.13374368,

1.16034994])

If we do FFT with 100 random numbers:

In [17]: random_data = randn(100)

In [18]: plot(fft.fft(random_data))

/usr/lib/python2.7/dist-packages/numpy/core/numeric.py:320:

ComplexWarning: Casting complex values to real discards the imaginary

part

return array(a, dtype, copy=False, order=order)

Out[18]: []

In [19]: plot(random_data)

Out[19]: []

In [20]:

Video Capture

The following code is to capture video which could be either live stream or file stream. To capture

a video, we need to create a VideoCapture object. Its argument can be either the device index or

the name of a video file. Device index is just the number to specify which camera. Normally one

camera will be connected (index = 0) or the 2nd camera (index = 1).

# capture.py

import numpy as np

import cv2

# Capture video from camera

# cap = cv2.VideoCapture(0)

# Capture video from file

cap = cv2.VideoCapture('BlueUmbrella.webm')

while(cap.isOpened()):

# Capture frame-by-frame

ret, frame = cap.read()

# Our operations on the frame come here

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Display the resulting frame

cv2.imshow('frame',gray)

if cv2.waitKey(1) & 0xFF == ord('q'):

break

# When everything done, release the capture

cap.release()

cv2.destroyAllWindows()

In this section, we want to track blue umbrella from the Pixar movie clip. How?

After converting BGR image to HSV, we extract a blue colored object. The reason for convderting to HSV

colorspace is because it is more easier to represent a color than RGB colorspace. Here are the steps to

take:

2. Convert from BGR to HSV color-space.

3. We threshold the HSV image for a range of blue color.

4. Now extract the blue object alone, we can do whatever on that image we want.

# cspace.py

import cv2

importnumpy as np

cap = cv2.VideoCapture('BlueUmbrella.webm')

while(1):

_, frame = cap.read()

hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

lower_blue = np.array([110,50,50])

upper_blue = np.array([130,255,255])

mask = cv2.inRange(hsv, lower_blue, upper_blue)

res = cv2.bitwise_and(frame,frame, mask= mask)

cv2.imshow('frame',frame)

cv2.imshow('mask',mask)

cv2.imshow('res',res)

if k == 27:

break

cv2.destroyAllWindows()

1. Original image

2. Threshold the HSV image to get only blue colors - mask

3. Bitwise-AND mask and original image - res

cspace_blue.mp4

cspace_blue.webm

cspace_blue.ogv

- An Approach to Develop Expert Systems in Medical Diagnosis Using Machine Learning Algorithms (Asthma) and A Performance StudyUploaded byijsc
- Time Series Forecasting With Feed-Forward Neural Networks:Uploaded byadnanmukred
- The Application of Data Mining to Build Classification Model for Predicting Graduate EmploymentUploaded byijcsis
- Artificial Neural Networks & IntelligenceUploaded bySrivalli Sri
- FUZZY NEURAL NETWORKS FOR IDENTIFICATION AND CONTROL OF DC DRIVE SYSTEMSUploaded byBablu Bommi
- ann_qbUploaded bydvnovov
- 7-f10086Uploaded bysandyyansiku
- Neural NetworksUploaded byAan Saputra
- AnnUploaded byRamprasad Srinivasan
- Development of Back Propagation Neural Network Model to Predict Performance and EmissionUploaded byIAEME Publication
- A Comparison of Two Learning Algorithms 64501Uploaded byHong Phuoc Doan Thi
- ExerciseUploaded byTuti Aryanti Hastomo
- decision treesUploaded byphli
- 04012019Uploaded byHamilton Angueta
- Forecasting of Suspended Sediment in Rivers Using Artificial Neural Networks ApproachUploaded byIJAERS JOURNAL
- ClassificationUploaded byJijeesh Baburajan
- 05968251Uploaded bySeetha Chinna
- Face Regconition SampleUploaded byAzri Mohd Khanil
- 1Uploaded bymihutz_86
- Open23Uploaded byamcuc
- Thescipub.com PDF Jcssp.2010.1457.1464Uploaded byGinger Simmons
- A Simplified Design of Multiplier for Multi Layer Feed Forward Hardware Neural NetworksUploaded byInternational Journal of Research in Engineering and Technology
- Application of Artificial Neural Network to Predict Squall Thunderstorms Using RAWIND DataUploaded bybrigita
- Review on Design and Implementation of Audio Signal Classification System to classify the Media in Speech/MusicUploaded byIRJET Journal
- Rr410212 Neural Networks and ApplicationsUploaded bySrinivasa Rao G
- 60-Trilok-WEKA Approach for ComparativeUploaded bygizemyaren
- Efficeint_SQL_DataminingUploaded byChristopher L Rodrigues
- Learning Analytics r Price 29 Oct 2014Uploaded bySutharthanMariyappan
- 1-s2.0-S0308521X13001273-main.pdfUploaded byHadiBies
- Relevance and Ranking in Online Dating SystemsUploaded byTionaTiona

- Probability CheatsheetUploaded byClases Particulares Online Matematicas Fisica Quimica
- PIG Commands (Weather Data)Uploaded bylinkranjit
- V4I5_IJERTV4IS050643.pdfUploaded byvikasdstar1669
- 27.pdfUploaded bylinkranjit
- Regular ExpressionsUploaded byCarlos Brando
- CLI - Hbase PracticalUploaded bylinkranjit
- Journey of Machine LearningUploaded bylinkranjit
- Breast Cancer.namesUploaded bylinkranjit
- breast-cancer-data.txtUploaded bylinkranjit
- big-o-cheatsheet.pdfUploaded byjil
- bayes08.pdfUploaded bylinkranjit
- Reference ParsingUploaded bylinkranjit
- CROSSWD.TXTUploaded byMabu Dba
- A Tale of Two Citys - Charles DickensUploaded bypaoloplot
- Cse IV Unix and Shell Programming 10cs44 NotesUploaded bylinkranjit
- Sqoop Big Data TechUploaded bylinkranjit
- readme59.txtUploaded bylinkranjit
- apachesqoopwithusecases-140724051203-phpapp01Uploaded bylinkranjit
- 02_Hadoop Architecture Transcript 1Uploaded byMrinmoy Laha
- SQL Cheat SheetUploaded bymitch4221
- Cluster SizingUploaded bylinkranjit
- Pages From Cloudera_DS-200Uploaded bylinkranjit
- Pages From Cloudera_CCD-470Uploaded bylinkranjit
- Pages From Cloudera_CCD-410Uploaded bylinkranjit
- Pages From Cloudera_CCD-333Uploaded bylinkranjit
- Pages From Cloudera_CCB-400Uploaded bylinkranjit
- Lab4-NaiveBayesUploaded byhitesh_dhingra1987
- String rUploaded bylinkranjit
- graduate_01-09-14Uploaded byDeepikaBhardwaj

- Image processing Lecture 1Uploaded bykamar044
- WideFS TechnicalUploaded bybagarie2
- Workbench EnUploaded byLeonel Mar Flores
- readme.txtUploaded byNecunoscut Nuconteaza
- Production Db Block CorruptionUploaded byvinay234
- DOC-34155Uploaded byjustforregister
- Icas Dt2015 Paper g Ec2Uploaded byekache99
- Declaration of Data Types and ConstantsUploaded byAlyDeden
- Epicor905 Install Guide SQLUploaded byhereje73
- BSD_10_2011Uploaded bycristy84an
- 1- Training PDS Interface- Generate PCD and PDS RDB FilesUploaded byno1139
- W_Chapter_2Uploaded byKurumeti Naga Surya Lakshmana Kumar
- Lithunwrap TutorialUploaded byom
- Design Patterns Abap ObjectsUploaded bycesanne1
- Florin Leon - Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine LearningUploaded byEnrollInfo
- CastNET 10 Documentation2Uploaded byPablo Navarro Cortes
- [2016-Jun-NEW]Oracle 1Z0-052 Exam Dumps 261q OfferUploaded byRobert Diaz
- Iris ScriptUploaded byKathryn Bates
- Ch11Uploaded byMemo Yassin
- With Rapid CloneUploaded bycglb007743
- Salesforce Pages Developers GuideUploaded byMalladi Kowshik
- End Sem ReportUploaded byAnkit Gupta
- Although There Are Differences Between Structured Programming and Object Oriented ProgrammingUploaded byIndranil Sikder
- 11-client-listener-169135Uploaded bypragiya
- Scratch Reference Guide 14Uploaded byMaru Busico-Flight
- flash-memory-an-overview1323.pdfUploaded byRamanand Shankarling
- Capture IntrudersUploaded byMorpheu
- convergysUploaded bySanjith Nimshad
- Accurate Distributed Cluster Analysis for Big Data Competitive K-MeansUploaded byshubham97
- Trinity Rescue Kit _ CPR for Your Computer _ TrinityhomeUploaded bymsgoulart