
CHAPTER 6 Machine Learning

Aims
To introduce the basics of machine learning, particularly inductive learning.

Objectives
You should be able to
 Describe how each of the methods covered can be used to perform classification tasks.
 Use a tool to build a simple application using these methods.
Introduction
 The ability to learn is one of the most important
characteristics of an intelligent entity.
 A system that can learn is more flexible: it is able to
respond to new problems and situations, and may also be
easier to program.
 Learning is still an expanding area of AI research.
 It overlaps with almost all other areas of AI. For example:
 in planning and robotics, there is interest in getting systems to learn rules of
behavior from experience in some environment;
 in natural language processing, a system may learn syntactic rules from example
sentences; in vision, a system may learn to recognize some object given some
example images; and
 in expert systems, rules may be learned from example cases.
It is also an area which is attracting interest in industry, with
many commercial products available. For example, there is
interest in analyzing data obtained from supermarket loyalty
cards in order to find rules that can be used in direct
marketing campaigns.

There are several different basic kinds of learning, involving:


 learner and teacher: a teacher may tell you something directly, so
you just have to remember it; they may give some examples, or present
an analogy. This is known as supervised learning.
 discovering new knowledge through experimentation/experience –
unsupervised learning.
In AI, most of the work to date has been on
learning from examples, or inductive learning.

This may involve learning conceptual categories
(like the concept of “dog”, from examples of dogs),
learning rules to predict the weather, learning rules
to diagnose a disease, and so on.

In each case, examples are given in some
suitable formalism, and the system attempts to
infer general rules/formulas or descriptions from
those examples.
 In general, inductive learning is used to train a system
to perform classification tasks. A classification task
means there are a number of input features, and a set
of possible output categories.

 For example, medical diagnosis is a classification task,


where the input features are the patient’s symptoms,
and the output categories are the possible diagnoses.

 The inductive learning methods may be used to try to


produce a system to automatically produce the correct
classification given just the input feature values.
 The techniques for inductive learning include:
 symbolic methods - learning is seen as a search problem: the
search space of possible concepts is searched to find one
that matches the examples. The approach involves building up
the best decision tree to categorize the given examples.
 genetic algorithms are based on the notion that good solutions
can evolve out of a population, by combining possible solutions
to produce “offspring” solutions and “killing off” the weaker of
those solutions.

 neural networks - loosely based on the architecture of the brain,


and are a promising approach for certain tasks.
A Simple Inductive Learning Example
 Real machine learning applications typically require
many hundreds or even thousands of examples
(dataset) in order for interesting knowledge to be
learned.
 For example, to learn rules to diagnose a particular
disease, given that the patient has, say, stomach pains,
data on thousands of patients would be required, listing
the additional symptoms of each patient and the final
diagnoses made by an expert.
 To illustrate the methods, a simpler problem and set of
examples will be used: the “student” problem.
 Suppose we have data on a number of students in last
year’s class, and are trying to find a rule that will allow
us to determine whether current students are likely to
get a first-class degree mark.
 The ones that did are referred to as positive examples,
while the ones who didn’t are referred to as negative
examples (Fig. 1).
Student   First last year?  Male?  Works hard?  Drinks?  First this year?
Richard   yes               yes    no           yes      no
Alan      yes               yes    yes          no       yes
Alison    no                no     yes          no       no
Jeff      no                yes    no           yes      no
Gail      yes               no     yes          yes      yes
Simon     no                yes    yes          yes      no

Fig. 1 Student exam performance data


 A quick inspection shows that the two people
who got firsts (Alan and Gail) both got firsts last year and
work hard, and that none of the people who failed to get
firsts both did well last year and work hard.
 So a reasonable learned rule is: if you did well last year
and work hard this year, you should do OK.
 However, other rules are possible – for example: if you
EITHER are male and don’t drink OR are female and drink
a lot then you’ll do well. But this rule is a little odd, and
more complex.
 Generally the best rule, getting most predictions right,
will be the simplest one, as it tends to capture
generalities (hard-working students do well).
 In the example, there are four attributes (or features) to
focus on: first last year, works hard, male/female and
drinks. All of these have yes/no answers – known as
feature values.
 We use the letters L, M, W and D to represent the
features, and the feature values as Ts and Fs.
 So, Richard’s feature values correspond to the row
TTFT.
 The fact “doesn’t drink but does work hard” can also
be represented as W ∧ ¬D.
Version Space Learning
 This method treats learning as a search problem.
 The rule to be learned involves a conjunction of facts, i.e.,
a rule only involving AND. E.g., “If they work hard and
don’t drink a lot they’ll get a first” can be represented as
W ∧ ¬D.
 The rule “Everyone will get a first” is T (always true) and
“No-one will get a first” is F.
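A candidate conjunctive rule can be tested against the example data directly. The sketch below (illustrative helper names, not from the text) encodes the Fig. 1 students using the L, M, W and D features, and checks whether a rule matches all the positive examples and none of the negative ones:

```python
# Fig. 1 data: features L (first last year), M (male), W (works hard), D (drinks);
# the label is whether the student got a first this year.
students = [
    ({"L": True,  "M": True,  "W": False, "D": True},  False),  # Richard
    ({"L": True,  "M": True,  "W": True,  "D": False}, True),   # Alan
    ({"L": False, "M": False, "W": True,  "D": False}, False),  # Alison
    ({"L": False, "M": True,  "W": False, "D": True},  False),  # Jeff
    ({"L": True,  "M": False, "W": True,  "D": True},  True),   # Gail
    ({"L": False, "M": True,  "W": True,  "D": True},  False),  # Simon
]

def matches(rule, example):
    """A conjunctive rule is a list of (feature, required value) pairs,
    all of which must hold for the rule to fire."""
    return all(example[f] == v for f, v in rule)

def consistent(rule, data):
    """True if the rule fires on every positive example and on no negative one."""
    return all(matches(rule, ex) == label for ex, label in data)

# "First last year AND works hard" fits all six examples:
print(consistent([("L", True), ("W", True)], students))  # True
# "Works hard" alone wrongly covers Alison and Simon:
print(consistent([("W", True)], students))  # False
```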
Decision Tree

 This method can handle rules containing disjunctions.

 It is based on representing the rule as a decision tree.

 Example: Figure 7.3 shows a simplified decision tree to determine whether someone
coming to the surgery with chest pains has had a heart attack. The
“diagnosis” is made by going through the tree, answering the yes/no
questions posed by the system.
 
 Decision tree induction systems try to construct the simplest decision
tree that correctly classifies all the example data from past cases. The
idea is that if the tree is simple it will capture generalities in the example
data and be useful for making predictions or diagnoses given new cases.
 To illustrate the algorithm, consider the student data.

 Algorithm DT:
1. pick the best attribute, e.g., First last year
2. produce a branch for each of its values
3. find the records under each branch
4. find their categories/decisions
5. if 100% correctly classified then stop, else go to 1
 Try splitting on FLY? then WH?, and on WH? then D? (First last year, Works hard, Drinks).

 The general idea is to look for features which are particularly good
indicators of the result you’re interested in. These features are then
placed (as questions) in nodes of the tree.
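Algorithm DT above can be sketched as a recursive procedure. The version below is a simplified, illustrative stand-in for real induction systems such as ID3 (the function names are not from the text): it picks the attribute whose branches are purest, branches on its value, and recurses until every example under a branch has the same category.

```python
def best_attribute(data, attrs):
    """Step 1: pick the attribute whose split leaves the fewest examples
    disagreeing with their branch's majority label."""
    def impurity(attr):
        errors = 0
        for value in (True, False):
            labels = [label for ex, label in data if ex[attr] == value]
            if labels:
                majority = max(set(labels), key=labels.count)
                errors += sum(1 for l in labels if l != majority)
        return errors
    return min(attrs, key=impurity)

def build_tree(data, attrs, default=False):
    if not data:
        return default
    labels = [label for _, label in data]
    if len(set(labels)) == 1:           # step 5: branch is 100% correct, stop
        return labels[0]
    if not attrs:                       # no attributes left: majority vote
        return max(set(labels), key=labels.count)
    a = best_attribute(data, attrs)
    rest = [x for x in attrs if x != a]
    return (a,                          # steps 2-4: branch on the value of a
            build_tree([d for d in data if d[0][a]], rest),
            build_tree([d for d in data if not d[0][a]], rest))

def classify(tree, example):
    """Answer the yes/no questions down the tree to reach a category."""
    while isinstance(tree, tuple):
        attr, yes_branch, no_branch = tree
        tree = yes_branch if example[attr] else no_branch
    return tree

# Fig. 1 data, encoded as (features, got a first this year):
students = [
    ({"L": True,  "M": True,  "W": False, "D": True},  False),  # Richard
    ({"L": True,  "M": True,  "W": True,  "D": False}, True),   # Alan
    ({"L": False, "M": False, "W": True,  "D": False}, False),  # Alison
    ({"L": False, "M": True,  "W": False, "D": True},  False),  # Jeff
    ({"L": True,  "M": False, "W": True,  "D": True},  True),   # Gail
    ({"L": False, "M": True,  "W": True,  "D": True},  False),  # Simon
]
tree = build_tree(students, ["L", "M", "W", "D"])
print(tree)  # ('L', ('W', True, False), False)
```

On the Fig. 1 data this picks First last year at the root and Works hard beneath it, recovering the “did well last year and works hard” rule from the text.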
Student   First last year?  Male?  Works hard?  Drinks?  First this year?
Richard   yes               yes    no           yes      yes
Alan      yes               yes    yes          no       yes
Alison    no                no     yes          no       yes
Jeff      no                yes    no           yes      no
Gail      yes               no     yes          yes      yes
Simon     no                yes    no           yes      no

Genetic Algorithm
A very different sort of method.
A GA can be viewed as a kind of search technique.
GAs have been successfully applied to timetabling problems, which involve searching for a
possible assignment of events (e.g. lectures) to rooms and times, given
various constraints (e.g., people can’t be in two places at the same time).
There may be many millions of possible assignments, and it may be hard to find the
best one.
GAs are biologically inspired, being influenced by theories of evolution.
They are also sometimes called evolutionary algorithms.
The basic idea is to have a population of genomes representing possible
solutions, to mutate and combine these to produce new ones (offspring), and
to evaluate the performance of these offspring using some scoring function.
The fittest of these offspring (those with the highest scores) survive to “mate” again.
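The mutate/combine/score/select cycle can be sketched as follows. The fitness function here (count the 1s in a bit-string genome) is a toy stand-in for a real scoring function such as the quality of a timetable; all names and parameter values are illustrative.

```python
import random

def evolve(pop_size=20, genome_len=10, generations=60, seed=1):
    rng = random.Random(seed)
    fitness = lambda g: sum(g)              # toy score: number of 1s in the genome
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]     # "kill off" the weaker half
        offspring = []
        while len(survivors) + len(offspring) < pop_size:
            a, b = rng.sample(survivors, 2)        # two fit parents "mate"
            cut = rng.randrange(1, genome_len)     # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:                 # occasional mutation
                i = rng.randrange(genome_len)
                child[i] = 1 - child[i]
            offspring.append(child)
        pop = survivors + offspring
    return max(pop, key=fitness)

best = evolve()
print(sum(best))  # close to the maximum of 10: the population evolves towards all-1s
```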
Neural Networks
Neural networks provide a rather different approach to reasoning and learning.
They consist of many simple processing units (or neurons) connected together.
The behaviour of each neuron is very simple, but together a collection of
neurons can exhibit sophisticated behaviour and be used for complex tasks,
for example vision, speech and walking.
There are many kinds of neural networks, so this discussion will be limited
to perceptrons, including multilayer perceptrons.
The behaviour of a network depends on the weights on the connections between neurons.
These weights are updated as learning takes place, to bring the network’s
output for each training example closer to its target output.
The result is not a symbolic structure (like a rule or decision tree) that can easily be
interpreted; instead the network is treated as a “black box” which, given some inputs,
returns some outputs.
NNs are biologically inspired - by the neurons in the human brain.
Biological Neurons
The human brain consists of approximately ten thousand million simple
processing units (neurons).
Each neuron is connected to many thousands of other neurons.
The basic idea is that a neuron receives inputs from its neighbours, and if
enough inputs are received at the same time that neuron will be excited or
activated and fire, giving an output that will be received by further neurons.
Figure 7.5 illustrates the basic features of a neuron.
 Soma is the body of the neuron.
 Dendrites are filaments that provide inputs to the cell.
 The axon sends output signals, and
 A synapse (or synaptic junction) is a special connection which can be
strengthened or weakened to allow more or less of a signal through.
Depending on the signals received from all its inputs, a neuron can be in either
an excited or inhibited state. If excited, it will pass on that “excitation”
through its axon, and may in turn excite neighbouring cells.
 The behaviour of a network depends on the strengths of the connections
between neurons.
 In the biological neuron this is determined at the synapse.
 The synapse works by releasing special chemicals called neurotransmitters
when it gets an input. More or less of such chemicals may be released,
and this quantity may be adjusted over time.
 This can be thought of as a simple learning process.
The Simple Perceptron: Simple learning
 

•It just takes a number of inputs (corresponding to the signals from neighbouring cells),
adjusts these using weights representing the strength of the connections at the synapses,
sums these, and fires if this sum exceeds some threshold.
•A neuron which fires will have an output value of 1, and otherwise outputs 0.
•More precisely, if there are n inputs (and n associated weights) the neuron finds the
weighted sum of the inputs and outputs 1 if this exceeds a threshold t and 0 otherwise.
•If the inputs are x1…xn, with weights w1…wn:

if (w1 x1 + … + wn xn) > t, e.g. t = 0.5,
then output = 1
else output = 0
 
•This basic neuron is referred to as a simple perceptron, and is illustrated in the
following figure. The name “perceptron” was proposed by Frank Rosenblatt in 1962. He
pioneered the simulation of neural networks on computers.
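The firing rule above is a one-line computation. This sketch uses the threshold t = 0.5 and, in the usage examples, the all-0.2 initial weights from the worked student example:

```python
def perceptron_output(weights, inputs, t=0.5):
    """Fire (output 1) only if the weighted sum of the inputs exceeds t."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > t else 0

# Richard's inputs (1, 1, 0, 1) with all weights 0.2: sum = 0.6 > 0.5, so it fires.
print(perceptron_output([0.2, 0.2, 0.2, 0.2], [1, 1, 0, 1]))  # 1
# Two active inputs give only 0.4, below the threshold, so no firing.
print(perceptron_output([0.2, 0.2, 0.2, 0.2], [0, 1, 0, 1]))  # 0
```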
• A serious neural network application would require a network of
hundreds or thousands of neurons.  
• Learning in neural networks involves using example data to adjust the
weights in a network.
• Each example will have specified input-output values.  
• These examples are considered one by one, and weights adjusted by a
small amount if the current network gives the incorrect output.
• The way this is done is to increase the weights on active connections if the
actual output of the network is 0 but the target output (from the example
data) is 1, and decrease the weights if the actual output is 1 and the target
is 0.
• The whole set of examples has to be considered again and again until
eventually (we hope) the network converges to give the right results for
all the given examples. 
• Example: Student problem.
No.  Student   First last year?  Male?  Works hard?  Drinks?  First this year? (target)
1    Richard   yes               yes    no           yes      yes
2    Alan      yes               yes    yes          no       yes
3    Alison    no                no     yes          no       yes
4    Jeff      no                yes    no           yes      no
5    Gail      yes               no     yes          yes      yes
6    Simon     no                yes    yes          yes      no

• Each feature (male, works hard, etc.) can be represented by an input, so x1 = 1
if the student in question got a first last year, x2 = 1 if they are male, and so
on.
• The output corresponds to whether they end up getting a first, so for Richard
output = 1.
• Initially the weights are set to some small random values; here each is set to 0.2.
• The threshold is set to 0.5.
• The amount by which the weights are adjusted for an example will be
d = 0.05.
• The following figure illustrates the example data from the first student example
(Richard).
• Before any learning has taken place the output of this network is 1, as the weighted
sum of the inputs is 0.2 + 0.2 + 0.2 = 0.6, which is higher than the threshold of 0.5,
• and Richard did get a first (target value).
• Therefore there is no change to the weights.
• The next example (Alan) is now considered.
• His inputs are 1, 1, 1 and 0.
• The current network gives an output of 1 (the weighted sum is again 0.6), and the
correct output is 1, so again there is no change to the weights.
• All the other examples are considered in the same way.
• Learning doesn’t end there.
• All the examples must be considered again and again until the network gives the
right result for as many examples as possible (the error is minimized).
• After a second run-through with our example data the new weights are 0.25, 0.1,
0.2 and 0.1.
• These weights in fact work perfectly for all the examples, so after the third run-
through the process halts. Weights have now been learned such that the
perceptron gives the correct output for each of the examples.
• If a new student is encountered then to predict their results we use the learned
weights. Suppose Tim got a first last year, works hard, is male, but drinks. We would
predict that he will get a first.
• The basic algorithm:

Randomly initialize the weights.
Repeat
    For each record
        Calculate the weighted sum of the inputs, W·X, and determine the output:
        if sum > threshold (e.g., 0.5) then output = 1 else output = 0.
        If the calculated output is 1 and the target output is 0, decrement the weights
        on active connections by d (e.g., 0.05);
        If the calculated output is 0 and the target output is 1, increment the weights
        on active connections by d (e.g., 0.05).
Until the network gives the correct outputs (or some time limit is exceeded).
• Example: Student performance:

1. X = {1,1,0,1},{1,1,1,0},{0,0,1,0},{0,1,0,1},{1,0,1,1},{0,1,1,1}
2. Target = {1,1,1,0,1,0}
3. W1 = W2 = W3 = W4 = 0.2
4. REPEAT // start training
       FOR each record X with target T:
           sum = W·X
           IF (sum > 0.5) // threshold
               Output = 1 ELSE Output = 0
           IF (Output = 0 AND T = 1)
               W = W + 0.05 // increment weights on inputs where X != 0
           IF (Output = 1 AND T = 0)
               W = W – 0.05 // decrement weights on inputs where X != 0
   UNTIL ERROR = 0 // all examples correctly classified
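The pseudocode above translates directly into a short program. This sketch follows the parameters in the text (initial weights 0.2, threshold 0.5, step d = 0.05) and keeps cycling through the six examples until all are classified correctly; a fixed pass limit stands in for the pseudocode’s time limit.

```python
# Student data: inputs are (first last year, male, works hard, drinks).
X = [(1,1,0,1), (1,1,1,0), (0,0,1,0), (0,1,0,1), (1,0,1,1), (0,1,1,1)]
target = [1, 1, 1, 0, 1, 0]
w = [0.2, 0.2, 0.2, 0.2]
d, threshold = 0.05, 0.5

for _ in range(10000):                     # time limit from the pseudocode
    errors = 0
    for x, t in zip(X, target):
        s = sum(wi * xi for wi, xi in zip(w, x))
        output = 1 if s > threshold else 0
        if output != t:
            errors += 1
            step = d if t == 1 else -d     # increment if target 1, else decrement
            w = [wi + step * xi for wi, xi in zip(w, x)]  # active connections only
    if errors == 0:                        # a full pass with every example correct
        break

print(w)
```

Because the loop runs until every example is right, the final weights may differ from the particular values quoted in the text, but they classify all six students correctly.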
 
• Test that the algorithm classifies all the examples correctly with the final weights.
