
Designing a Learning System

Dr. Chandrika J
Professor, CSE
Course Faculty
Effects of Programs that Learn
 Application areas
 Learning from medical records which treatments are most
effective for new diseases
 Houses learning from experience to optimize energy costs
based on usage patterns of their occupants
 Personal software assistants learning the evolving interests of
users in order to highlight especially relevant stories from the
online morning newspaper

Effective Applications of Learning
 A successful understanding of how to make computers learn
would open up many new uses of computers and new levels
of competence and customization

 Speech recognition
 outperform all other approaches that have been attempted to date
 Pattern detection
 Learning algorithms being used to discover valuable knowledge from
large commercial databases
 detect fraudulent use of credit cards
 Play Games
 Play backgammon at levels approaching the performance of human
world champions

Well Posed Learning Problems
 A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with
experience E
 To have a well-defined learning problem, three features
need to be identified:
 The class of tasks
 The measure of performance to be improved
 The source of experience
 Examples
 A checkers learning problem:
 Task T: playing checkers
 Performance measure P: percent of games won against opponents
 Training experience E: playing practice games against itself
 Handwriting recognition learning problem:
 Task T: recognizing and classifying handwritten words within images
 Performance measure P: percent of words correctly classified
 Training experience E: a database of handwritten words with given classifications
Designing a Learning System
 Consider designing a program to learn to play checkers,
with the goal of entering it in the world checkers
tournament
 Requires the following steps
 Choosing Training Experience
 Choosing the Target Function
 Choosing the Representation of the Target Function
 Choosing a Function Approximation Algorithm
 Estimating training values
 Adjusting the weights
 The Final Design

Choosing the Training Experience (1)
 The first design choice is to choose the type of training experience from
which the system will learn.
 The type of training experience available can have a significant impact on
success or failure of the learner.
 Three attributes impact the success or failure of the learner

1. Will the training experience provide direct or indirect feedback?
 Direct feedback: the system learns from examples of individual checkers board states and the correct move for each
 Indirect feedback: move sequences and the final outcomes of various games played
 Credit assignment problem: the value of early states must be inferred from the final outcome

Choosing the Training Experience (2)
2. Degree to which the learner controls the sequence of
training examples

 Teacher selects informative board states and gives the correct move for each
 Learner proposes board states that it finds particularly
confusing. Teacher provides correct moves
 The learner may have complete control over both the
board states and (indirect) training classifications, as it
does when it learns by playing against itself with no
teacher present.

Choosing the Training Experience (3)
3. How well the training experience represents the distribution
of examples over which the final system performance P will be
measured
 If the checkers program trains only on games played against itself, it
may never encounter crucial board states that are likely to be played
by the human checkers champion
Note - there is a danger that this training experience might not be
fully representative of the distribution of situations over which it will
later be tested.
 Most theory of machine learning rests on the assumption that the
distribution of training examples is identical to the distribution of test
examples

Partial Design of Checkers Learning Program
 A checkers learning problem:
 Task T: playing checkers
 Performance measure P: percent of games won in the world
tournament
 Training experience E: games played against itself
 Remaining choices
 Choosing the Target Function
 Choosing the Representation of the Target Function
 Choosing a Function Approximation Algorithm
 Estimating training values
 Adjusting the weights
 The Final Design

Choosing the Target Function (1)
 The next design choice is to determine exactly what type
of knowledge will be learned and how this will be used
by the performance program.

 For Checkers game, the most obvious choice for the type
of information to be learned is a program, or function,
that chooses the best move for any given board state.

 The program needs to learn the best move from among the legal moves
 This defines a large search space that is known a priori
 Target function: ChooseMove : B → M
 ChooseMove is difficult to learn given only indirect training experience
Choosing the Target Function (2)
 Alternative target function
 An evaluation function that assigns a numerical score to any
given board state
 V : B → R ( where R is the set of real numbers)
 V(b) for an arbitrary board state b in B
 if b is a final board state that is won, then V(b) = 100
 if b is a final board state that is lost, then V(b) = -100
 if b is a final board state that is drawn, then V(b) = 0
 if b is not a final state, then V(b) = V(b '), where b' is the
best final board state that can be achieved starting from b
and playing optimally until the end of the game
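As a minimal sketch, the three trivial terminal cases of this definition can be written directly in code. Here `is_final`, `is_won`, and `is_lost` are hypothetical predicates on a board state, not part of the slides:

```python
# Sketch of the (nonoperational) target function V for checkers,
# encoding only the three trivial terminal cases from the slide.
# is_final, is_won, is_lost are hypothetical predicates on a board state.

def V(b, is_final, is_won, is_lost):
    """Return the target value of board state b in the terminal cases."""
    if is_final(b):
        if is_won(b):
            return 100
        if is_lost(b):
            return -100
        return 0  # drawn final board
    # Non-final case: V(b) = V(b') for the best reachable final state b',
    # which requires optimal play to the end of the game -- not operational.
    raise NotImplementedError("requires optimal play to the end of the game")
```

The non-final branch is exactly what makes the definition nonoperational: evaluating it would require searching the game to completion.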
Choosing the Target Function (3)
 This gives a recursive definition of V(b)
 Not usable in practice: it cannot be computed efficiently except in the
first three trivial cases
 a nonoperational definition
 The goal of learning is to discover an operational description of V
 Learning the target function is often called function approximation
 The learned approximation is referred to as V̂

Choosing a Representation for the Target Function
 The choice of representation involves trade-offs
 Pick a very expressive representation to allow close approximation to the
ideal target function V
 More expressive, more training data required to choose among alternative
hypotheses
 Use a linear combination of the following board features:
 x1: the number of black pieces on the board
 x2: the number of red pieces on the board
 x3: the number of black kings on the board
 x4: the number of red kings on the board
 x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
 x6: the number of red pieces threatened by black

V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
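The linear representation can be sketched as a small Python function; the weight and feature values below are illustrative, not learned:

```python
# Minimal sketch of the linear evaluation function V̂ from the slide.
# Weights [w0, w1, ..., w6] and features [x1, ..., x6] are plain lists;
# the values here are illustrative, not from a real checkers board.

def v_hat(weights, features):
    """V̂(b) = w0 + w1*x1 + ... + w6*x6 for one board state."""
    w0, *ws = weights
    return w0 + sum(w * x for w, x in zip(ws, features))

weights = [0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]   # [w0, w1, ..., w6]
features = [3, 0, 1, 0, 0, 0]                      # [x1, ..., x6]
print(v_hat(weights, features))                    # prints 5.5
```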
Partial Design of Checkers Learning Program
 A checkers learning problem:
 Task T: playing checkers
 Performance measure P: percent of games won in the world tournament
 Training experience E: games played against itself
 Target function: V : Board → ℝ
 Target function representation:

V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
Choosing a Function Approximation Algorithm
 Derive training examples from the indirect training experience
available to the learner
 Adjust the weights wi to best fit these training examples
 To learn we require a set of training examples describing the board b
and the training value Vtrain(b)
 Each training example is an ordered pair ⟨b, Vtrain(b)⟩
 For example: ⟨(x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0), +100⟩
This example describes a board state b in which black has won the game
(note that x2 = 0 indicates red has no remaining pieces) and for which
the target function value Vtrain(b) is therefore +100.
Estimating Training Values
 Need to assign specific scores to intermediate board states
 Approximate the training value of an intermediate board state b using
the learner's current estimate V̂ of the board state that follows b:

Vtrain(b) ← V̂(Successor(b))

 A simple approach that has proved surprisingly successful
 More accurate for states closer to the end of the game

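This training-value estimate can be sketched in one line of code. Here `successor` is a hypothetical function giving the board state that follows b (after the learner's move and the opponent's reply), and `v_hat` is the current learned evaluation function:

```python
# Sketch of the training-value estimate Vtrain(b) = V̂(Successor(b)).
# successor(b) is a hypothetical function returning the next board state;
# v_hat scores it with the learner's current approximation.

def estimate_training_value(b, successor, v_hat):
    """Use the learner's own current estimate of the successor state."""
    return v_hat(successor(b))
```

The key idea is bootstrapping: the learner's current approximation of later states supplies training values for earlier ones.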
Adjusting the Weights
 Choose the weights wi to best fit the set of training examples
 Minimize the squared error E between the training values and the
values predicted by the hypothesis
 V b   V b 2
E  ˆ
train
b ,V train b   training examples
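The squared error can be sketched as a short loop over the training set, reusing the earlier linear form for V̂. The weights and examples below are illustrative values, not learned ones:

```python
# Sketch of the squared error E over a set of training examples.
# Each example is a pair (features, v_train), with features = [x1..x6].

def squared_error(weights, examples):
    """E = sum over (b, Vtrain(b)) of (Vtrain(b) - V̂(b))^2."""
    w0, *ws = weights
    total = 0.0
    for features, v_train in examples:
        v_hat = w0 + sum(w * x for w, x in zip(ws, features))
        total += (v_train - v_hat) ** 2
    return total
```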

 Require an algorithm that
 will incrementally refine weights as new training examples become available
 will be robust to errors in these estimated training values
 Least Mean Squares (LMS) is one such algorithm

LMS Weight Update Rule
LMS weight update rule:
For each training example (b, Vtrain(b))
Use the current weights to calculate V̂ (b)
For each weight wi, update it as
wi ← wi + η (Vtrain(b) − V̂(b)) xi
Here η is a small constant (e.g., 0.1) that moderates the size of the
weight update.
Working of weight update rule
• When the error (Vtrain(b)- V̂ (b)) is zero, no weights are changed.
• When (Vtrain(b) − V̂(b)) is positive (i.e., when V̂(b) is too low),
then each weight wi is increased in proportion to the value of its
corresponding feature xi. This raises the value of V̂(b), reducing
the error.
• If the value of some feature xi is zero, then its weight is not altered
regardless of the error, so the only weights updated are those whose
features actually occur on the training example board.
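One LMS step can be sketched as follows, using the same list-based weights and features as before (x0 is implicitly 1, so w0 receives the full error term):

```python
# Minimal sketch of one LMS weight update for the linear evaluation
# function V̂(b) = w0 + w1*x1 + ... + w6*x6. The board is represented
# directly by its feature vector [x1, ..., x6].

def lms_update(weights, features, v_train, eta=0.1):
    """One LMS step: w_i <- w_i + eta * (Vtrain(b) - V̂(b)) * x_i."""
    w0, *ws = weights
    v_hat = w0 + sum(w * x for w, x in zip(ws, features))
    error = v_train - v_hat
    # x0 is implicitly 1, so w0 is moved by the full error term.
    new_w0 = w0 + eta * error
    # Each other weight moves in proportion to its feature value;
    # weights whose features are zero are left unchanged.
    new_ws = [w + eta * error * x for w, x in zip(ws, features)]
    return [new_w0] + new_ws
```

Repeating this update over many training examples incrementally refines the weights, which is exactly the robustness-to-noise property the slide asks for.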
Final Design

[Figure: the final design of the checkers learning program. The Experiment Generator supplies a new problem (initial game board) to the Performance System; the Performance System produces a solution trace (game history); the Critic converts the trace into training examples ⟨b1, Vtrain(b1)⟩, ⟨b2, Vtrain(b2)⟩, …; and the Generalizer produces the hypothesis V̂, which feeds back into the Experiment Generator.]
The Performance System is the module that must solve the given
performance task by using the learned target function(s). It takes an instance
of a new problem (new game) as input and produces a trace of its solution
(game history) as output.

The Critic takes as input the history or trace of the game and produces as
output a set of training examples of the target function

The Generalizer takes as input the training examples and produces an
output hypothesis that is its estimate of the target function. It generalizes
from the specific training examples, hypothesizing a general function that
covers these examples and other cases beyond the training examples.

The Experiment Generator takes as input the current hypothesis and
outputs a new problem (i.e., an initial board state) for the Performance System
to explore. Its role is to pick new practice problems that will maximize the
learning rate of the overall system.
CS 484 – Artificial Intelligence
Summary of Design Choices

Determine type of training experience:
 Games against experts
 Games against itself
 Table of correct moves
 …

Determine target function:
 Board → move
 Board → value
 …

Determine representation of learned function:
 Polynomial
 Linear function of six features
 Artificial neural network
 …

Determine learning algorithm:
 Gradient descent
 Linear programming
 …

→ Complete design
Training Classification Problems
 Many learning problems involve classifying inputs into a
discrete set of possible categories.
 Learning is only possible if there is a relationship
between the data and the classifications.
 Training involves providing the system with data which
has been manually classified.
 Learning systems use the training data to learn to classify
unseen data.

