
COM 423 NOTE: ARTIFICIAL INTELLIGENCE AND EXPERT SYSTEM

ARTIFICIAL INTELLIGENCE (AI):

A key task in problem solving, especially with computers at different stages of execution, is decision making.

Decision making requires the processing of symbolic information: handling facts and drawing inferences using domain knowledge. Inference is essentially a search through the knowledge base using the facts. The intensive research carried out in AI over the last six decades has produced a number of useful techniques that can be used to solve many complex problems.

Intelligence is synonymous with the human (or animal) ability to store and recall facts (cognitive) and to solve a given problem based on known facts and relevant theorems (psychomotor). This ability is inherent and innate, but it is trainable and can be developed. Artificial Intelligence (AI) is the ability of an electronic device (a computer) to accomplish tasks that would ordinarily be handled by a human.

Artificial intelligence is intelligence exhibited by an artificial entity, generally assumed to be a computer. It is also referred to as synthetic intelligence. AI is now concerned with producing machines that automate tasks requiring intelligent behavior, such as control, planning, scheduling, answering diagnostic and consumer questions, and handwriting, voice and facial recognition. AI techniques are now widely used in economics, medicine, engineering and the military, and are commonly found in homes as computer software applications, traditional strategy games such as chess, and video games. AI has therefore become a scientific discipline focused on providing solutions to real-life problems.

AI models the richness and dynamism of the human brain and its analytic and memory capabilities.

Artificial Intelligence focuses on:

(i) the use of computers to process symbols,

(ii) the need for new languages, and

(iii) the role of computers in theorem proving, instead of focusing on hardware that simulates intelligence.

Major Categories of AI

1. Symbolic: Based on logic and the use of sequences of rules. Symbolic programs are good at modeling how humans think, act and accomplish tasks.

2. Connectionist: Based on the network of neurons in the brain. Brittle, and good for machine learning and pattern recognition.

3. Evolutionary: Based on the theory of genetic evolution in biology.

Purpose of AI

1. Technological

2. Psychological

3. Economic

CONVENTIONAL AND COMPUTATIONAL INTELLIGENCE

There are two schools of thought in AI: conventional AI and computational AI.

Conventional AI involves methods classified as machine learning, characterized by formalism and statistical analysis. Conventional AI is also known as symbolic AI, logical AI, neat AI and Good Old-Fashioned AI (GOFAI).

The methods include:

(1) Expert system: applies reasoning capabilities to reach a conclusion. An ES can process large amounts of information and provide conclusions based on it.
(2) Case-based reasoning
(3) Bayesian network
(4) Behavior-based AI: a modular method of building AI systems by hand

COMPUTATIONAL AI

It involves iterative development or learning. Learning is based on empirical data. Computational AI is also known as non-symbolic AI, scruffy AI and soft computing.

The methods include:
(1) Neural network: a system with very strong pattern recognition capabilities
(2) Genetic algorithm
(3) Fuzzy system: techniques for reasoning under uncertainty
(4) Evolutionary computation: uses biologically inspired concepts such as population, mutation and survival of the fittest to generate increasingly better solutions to a problem.

LOGIC: Proposition Logic and Predicate Logic

What is logic? Logic is a truth-preserving system of inference.

Truth-preserving: If the initial statements are true, the inferred statements will be true

System: a set of mechanistic transformations, based on syntax alone

Inference: the process of deriving (inferring) new statements from old statements

Proposition Logic

A proposition is a statement of language that formulates something about an external world. In propositional logic, a proposition is a logical statement whose truth value can be evaluated as either TRUE or FALSE, where T denotes TRUE and F denotes FALSE.

Examples:

This class is COM 423 (true)

Today is Sunday (false)

It is currently raining in Singapore (???)

Every proposition is true or false, but its truth value (true or false) may be unknown.

Propositional logic statements are of two types, namely:

• Simple Propositional Logic

A simple propositional logic statement is a single logical statement whose truth value can be verified or evaluated.

e.g. GAPOSA is a Polytechnic.

• Complex or Compound Propositional Logic

A complex propositional logic statement is a combination of two or more simple propositional logic statements joined by connectors (connectives) such as disjunction, conjunction, etc.

e.g. GAPOSA is a Polytechnic and it is located in Saapade.

A propositional statement is one of:

• A simple proposition, denoted by a capital letter, e.g. ‘A’.

• A negation of a propositional statement, e.g. ¬A: “not A”

• Two propositional statements joined by a connective,

e.g. A ∧ B: “A and B”

e.g. A ∨ B: “A or B”

If a connective joins complex statements, parentheses are added,

e.g. A ∧ (B ∨ C)

Truth Tables

Truth tables are used for stating precise logic values for logic statements. The number of rows in a truth table is $2^n$, where n is the number of simple propositions in the logical statement.

Each of the propositions is given a label such as A, B, C, etc.
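
The row count described above can be checked with a short Python sketch. The helper name `truth_table` and the sample statement A ∧ (B ∨ C) are chosen here for illustration only; the sketch simply enumerates every assignment of T and F to the labelled propositions.

```python
from itertools import product

def truth_table(variables, expression):
    """Return one row per assignment: the input values followed by the result."""
    rows = []
    # product([True, False], repeat=n) enumerates all 2**n assignments
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        rows.append(values + (expression(assignment),))
    return rows

# Illustrative statement: A AND (B OR C), with three simple propositions
table = truth_table(["A", "B", "C"], lambda v: v["A"] and (v["B"] or v["C"]))
for row in table:
    print(" ".join("T" if value else "F" for value in row))
```

With three propositions the table has 2^3 = 8 rows, one printed line per row.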

Logic Connectives

Logic uses names and symbols to represent the connectives as illustrated in this table.

Symbol      Connective       Logic Name

., ∧        Conjunction      AND

+, ∨        Disjunction      OR

¬           Negation         NOT

→           Implication      Implies

↔           Equivalence      Double implication

⊕           Exclusive OR     EX-OR

AND Truth Table

e.g. GAPOSA is a Polytechnic and it is located in Saapade.

This comprises:

A: GAPOSA is a Polytechnic

B: GAPOSA is located in Saapade

Connective: AND, ., ∧

Rules of AND

1. The output is TRUE when all the inputs are TRUE.

2. The output is FALSE if any or all the inputs are FALSE.

A    B    A AND B    A ∧ B    A.B

T    T    T          T        T

T    F    F          F        F

F    T    F          F        F

F    F    F          F        F

OR Truth Table

e.g. Either GAPOSA is a university or it is a polytechnic.

This comprises:

A: GAPOSA is a university

B: GAPOSA is a polytechnic

Connective: OR, +, ∨

Rules of OR

1. The output is TRUE when at least one or all the inputs are TRUE.

2. The output is FALSE if all the inputs are FALSE.

A    B    A OR B    A ∨ B    A+B

T    T    T         T        T

T    F    T         T        T

F    T    T         T        T

F    F    F         F        F

NOT Truth Table

e.g.

A: GAPOSA is a polytechnic

¬A: GAPOSA is not a polytechnic

Connective: NOT, -, ¬

Rules of NOT

The output is a negation of the input. That is, when the

• Input is TRUE, the output is FALSE.

• Input is FALSE, the output is TRUE.

A    NOT A    ¬A    -A

T    F        F     F

F    T        T     T

IMPLICATION Truth Table

e.g. IF GAPOSA is a polytechnic THEN it is a higher institution.

This comprises:

A: GAPOSA is a polytechnic

B: GAPOSA is a higher institution

Connective: Implies, →

Rules of Implication

1. The output is FALSE only when A (the antecedent) is TRUE and B (the consequent) is FALSE.

2. The output is TRUE in every other case.

A    B    A Implies B    A → B

T    T    T              T

T    F    F              F

F    T    T              T

F    F    T              T

EQUIVALENCE Truth Table

e.g. A university degree is equivalent to a polytechnic diploma
Connective: Equivalence, ↔
Rules of Equivalence
1. The output is TRUE when all the inputs are TRUE.
2. The output is TRUE when all the inputs are FALSE.
3. The output is FALSE when the inputs have different truth values.

A    B    A ↔ B

T    T    T

T    F    F

F    T    F

F    F    T

Exercise: Using a truth table, prove that (A ↔B) ↔ ((A ↔B) ^ (B→A))

EXCLUSIVE-OR Truth Table

e.g. GAPOSA is either a university or a polytechnic, but not both.
Connective: EX-OR, ⊕

Rules of Exclusive-OR
1. The output is TRUE when exactly one of the inputs is TRUE.
2. The output is FALSE if either all the inputs are TRUE or all the inputs are FALSE.

A    B    A ⊕ B

T    T    F

T    F    T

F    T    T

F    F    F

TAUTOLOGY AND CONTRADICTION


TAUTOLOGY
An expression with a truth value T irrespective of the truth values of the constituent atoms.

A    B    (A ∧ B) → A

T    T    T

T    F    T

F    T    T

F    F    T

CONTRADICTION
An expression with a truth value F irrespective of the truth values of the constituent atoms. The negation of a tautology is a contradiction.

A    B    ¬((A ∧ B) → A)

T    T    F

T    F    F

F    T    F

F    F    F

ARGUMENT AND VALIDITY


Argument: An argument presents a conclusion as following logically from a set of assumptions.
e.g. If we say “John’s keys are in the car or hung up in the office. John’s keys are not in the car. Then John’s keys are hung up in the office.”
We can always write this argument as a clear and precise formal expression. Let P stand for “John’s keys are in the car” and Q for “John’s keys are hung up in the office”:
John’s keys are in the car or hung up in the office      P ∨ Q
John’s keys are not in the car                           ¬P
Therefore, John’s keys are hung up in the office         Q
We can express the argument in a more formal form:
i. P ∨ Q      Assumption
ii. ¬P        Assumption
iii. Q        Conclusion

Truth, Validity, and Soundness: probably the three most important concepts of the course.
First, let us briefly characterize these concepts.
Truth: a property of statements, i.e., that they are the case.
Validity: a property of arguments, i.e., that they have a good structure. (The premises and conclusion are so related that it is absolutely impossible for the premises to be true unless the conclusion is true also.)
Soundness: a property of both arguments and the statements in them, i.e., the argument is valid and all the statements are true. A sound argument is (1) valid and (2) has true premises (the conclusion is then true as well, by the definition of validity).
The fact that a deductive argument is valid cannot, in itself, assure us that any of the statements in the argument are true; this fact only tells us that the conclusion must be true if the premises are true.
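
Validity in this sense can be checked with a truth table: an argument is valid when no row makes every premise true and the conclusion false. The Python sketch below applies this test to the keys argument (premises P ∨ Q and ¬P, conclusion Q). The function name `is_valid` and the second, invalid argument used for contrast are illustrative choices, not part of the notes.

```python
from itertools import product

def is_valid(premises, conclusion, n_vars):
    """Valid means: no assignment makes all premises true and the conclusion false."""
    for values in product([True, False], repeat=n_vars):
        if all(premise(*values) for premise in premises) and not conclusion(*values):
            return False  # counterexample row found
    return True

# The keys argument: P or Q, not P, therefore Q
keys_argument = is_valid(
    premises=[lambda p, q: p or q, lambda p, q: not p],
    conclusion=lambda p, q: q,
    n_vars=2,
)

# Contrast case (illustrative): P implies Q, Q, therefore P
invalid_argument = is_valid(
    premises=[lambda p, q: (not p) or q, lambda p, q: q],
    conclusion=lambda p, q: p,
    n_vars=2,
)

print(keys_argument, invalid_argument)
```

Note that the check says nothing about whether the premises are actually true, which is the extra requirement for soundness.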
EXPERT SYSTEM

An expert system is computer software that embodies a significant portion of the specialized knowledge of a human expert in a specific, narrow domain and emulates the decision-making ability of the human expert.

Application Domain Areas of Expert Systems

(1) Control (air traffic)
(2) Monitoring (nuclear plant)
(3) Debugging
(4) Planning (mission planning)
(5) Medical diagnosis
(6) Factory scheduling
(7) Prediction (weather)
(8) Instruction/Training
(9) Interpretation
(10) Repair (telephones, cars)

ADVANTAGES and DISADVANTAGES

HUMAN EXPERT (HE)          EXPERT SYSTEM (ES)

Demerits of HE             Merits of ES

1. Perishable              Permanent

2. Unpredictable           Consistent

3. Slow reproduction       Quick replication

4. Expensive               Affordable

5. Slow processing         Fast processing

______________________________________________________________________________

Merits of HE               Demerits of ES

1. Creative                Lacks inspiration

2. Adaptive                Needs instruction

3. Broad focus             Narrow focus

4. Common sense            Machine knowledge

EXPERT SYSTEM ARCHITECTURE

According to James (1991), an expert system consists of six components: the USER, USER INTERFACE, KNOWLEDGE BASE, INFERENCE ENGINE, KNOWLEDGE UPDATE FACILITY and EXPLANATION FACILITY SYSTEM.

[Figure: Expert system architecture, showing the user, user interface, explanation facility, knowledge update facility, knowledge base and inference engine.]

1. USER: A person who can run the ES and gain insight into a particular area of interest with minimum assistance. A user can operate in several modes:

(1) Tester: a user who attempts to verify the validity of the system's behavior.
(2) Tutor: a user who provides additional knowledge to the system or modifies knowledge already present in the system.
(3) Student: a user who seeks to rapidly develop expertise in the subject domain by extracting organized, distilled knowledge from the system.
(4) Customer: a user who applies the system's expertise to a specific real task.
2. USER INTERFACE FACILITY: It accepts information from the user and translates it into a form acceptable to the remaining system components, or accepts information from the system and converts it into a form the user can understand.
The facility consists of a natural language processing system that accepts and returns information in essentially the same form as that accepted by a human expert.
3. KNOWLEDGE BASE
This is the storehouse of the primitives (i.e. basic facts, procedural rules and heuristics) available to the system, which the knowledge manager can use to interpret the current contextual data in the situation model. The stored knowledge allows the system to act like an expert. The knowledge is stored in the form of facts and rules.
4. INFERENCE ENGINE: This is the software system that locates knowledge and infers new knowledge from the knowledge base. The engine's inference paradigm is the search strategy used to develop the required knowledge. The paradigm is one of the two concepts below.
1. Backward chaining: a top-down reasoning process that starts from the desired goal and works backwards towards the requisite conditions.
2. Forward chaining: a bottom-up reasoning process that starts from known conditions and works towards the desired goals.
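
The two inference paradigms can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular expert system shell: the rule format (a set of premises paired with one conclusion), the function names, and the toy medical rules are all assumptions made for the example.

```python
# Each rule is (set of premises, conclusion). The rules are hypothetical.
RULES = [
    ({"has_fever", "has_cough"}, "has_flu"),
    ({"has_flu"}, "needs_rest"),
]

def forward_chain(facts, rules):
    """Bottom-up: start from known facts and fire rules until nothing new is added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

def backward_chain(goal, facts, rules, seen=frozenset()):
    """Top-down: start from the goal and work back to the requisite conditions."""
    if goal in facts:
        return True
    if goal in seen:  # guard against circular rules
        return False
    for premises, conclusion in rules:
        if conclusion == goal and all(
            backward_chain(p, facts, rules, seen | {goal}) for p in premises
        ):
            return True
    return False

facts = {"has_fever", "has_cough"}
print(sorted(forward_chain(facts, RULES)))
print(backward_chain("needs_rest", facts, RULES))
```

Forward chaining derives everything that follows from the facts, while backward chaining only explores the rules relevant to the goal it was asked about.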

5. KNOWLEDGE UPDATE FACILITY: The knowledge base is an accurate reflection of the domain at the time the system is placed in service. Knowledge in many complex domains expands and changes, and the knowledge base must be modified correspondingly. The facility is used to perform such updates.
The three basic forms of knowledge update are:
(1) Manual knowledge update: The update is done by the knowledge engineer, who interprets information provided by a domain expert and updates the knowledge base using a limited knowledge update system.
(2) State of the art in expert systems: The domain expert directly enters revised knowledge without the mediation of a knowledge engineer.
(3) Machine learning: New knowledge is generated by the system itself, based on generalizations of past experience.
6. EXPLANATION FACILITY SYSTEM
Beyond simply reaching a conclusion when faced with a complex problem, the system is able to explain, to some extent, the reasoning that led to the conclusion.
CHARACTERISTICS OF ES
These characteristics distinguish an ES running on a computer system from traditional computing applications.
(1) One area of knowledge: An ES relates to one particular area of expertise or knowledge rather than to a set of data.
(2) One particular purpose: It is constructed for a particular purpose, e.g. giving advice on a particular topic.
(3) Rules: Knowledge is usually in the form of rules.
(4) Inference: Knowledge and inference are separate. The inference engine can be associated with any knowledge base.
(5) Extendable: Knowledge can be extended; it can start off fairly small and later be enlarged in a controlled way.
(6) Handles uncertainty: We may be uncertain about the world. We cannot always be certain that events are absolutely true or that events are definitely going to happen. An ES allows us to cope with these uncertainties.
(7) Gives advice: An ES replaces an expert. Therefore, an ES is constructed to give advice rather than answers.
(8) Explanation: It can explain its reasoning.
CURRENT STATE OF AN ES

According to Davis (1985), expert systems in their current state are classified into three types:

(1) Assistant (2) Colleague (3) Expert

(1) Assistant: A small knowledge-based system that performs an economically valuable but technically limited subset of an expert's task. Many of these are personal computer based.
(2) Colleague: A medium-sized knowledge-based system that performs a significant subset of an expert's task. They are implemented both on PCs and on larger platforms, e.g. specialized workstations and conventional mainframes.
(3) Expert: A large-scale knowledge-based system that approaches the level of performance of an expert within a given domain. They are commonly implemented on powerful platforms using sophisticated development tools.
Limitations of most existing ES
(1) Knowledge is acquired from a small number of experts.
(2) Applications are limited to a specific domain.
(3) The application domain must have little need for temporal reasoning.
EXPERT SYSTEM PROBLEM-SOLVING STRATEGY

Problem-solving and organizational concepts are used as a framework for combining the basic components of an ES into a complete system.

EXHAUSTIVE SEARCH

Simple exhaustive search is a direct application of search to every possible state. It is an organized procedure for trying all possibilities. It is applied when a problem is immediately tractable. An immediately tractable problem is one that fits into a reasonably small problem space, with no need for backtracking to retract mistakes.

LARGE SEARCH SPACE

For most real-world systems, the search space is too large to allow exhaustive search. Large search spaces are handled by one of these two methods:

(1) Developing an efficient process for dealing with the large space

(2) Transforming the space into a more manageable form, i.e. dividing it into small components

GENERATE AND TEST

It is possible to structure search as generate and test. This is a type of depth-first search that is used to perform classical reasoning by elimination. It relies on a generator that develops complete candidate solutions and an evaluator that tests each proposed solution by comparing it with the required state. This process of generation followed by evaluation continues until a solution is discovered.
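
The generator/evaluator split can be illustrated with a small Python sketch. The toy problem (find digits x and y with x + y = 10 and x × y = 21) and the function names are invented for this example; the point is only that the generator proposes complete candidates and the evaluator accepts or rejects each one.

```python
from itertools import product

def generate_and_test(candidates, is_solution):
    """Try candidates one at a time; return the first that passes the test."""
    for candidate in candidates:
        if is_solution(candidate):
            return candidate
    return None  # the generator was exhausted without success

# Generator: every pair of digits (a hypothetical, small problem space)
candidates = product(range(10), repeat=2)

# Evaluator: compare each complete candidate with the required state
def is_solution(pair):
    x, y = pair
    return x + y == 10 and x * y == 21

solution = generate_and_test(candidates, is_solution)
print(solution)
```

Because the generator is lazy, candidates are produced only until the evaluator accepts one.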

CLASSIFICATION MODEL

This is a framework for structuring rules and is widely applied, especially for diagnosis and interpretation tasks. It is used to organize reasoning (from observation to conclusion) based on classification. The selection of a conclusion from a list of pre-defined possible conclusions is implemented as a modified diagnosis system consisting of a knowledge base, control and working memory.

(1) Knowledge base: in this model it consists of
(a) A list of possible observations (this includes the initial observed conditions and the findings that result from tests or experiments executed to gather information).
(b) A set of rules that relate observations to conclusions.
(2) Control: The primary role of the control segment is to order the collection of evidence.
(3) Working memory: A global memory that stores the initial observations, the findings that have been made and the conclusions that have been reached.
The two general types of rules in the classification model are:
(1) Evidence to conclusion: these rules are used to list the conclusions that are indicated by the evidence.
(2) Conclusion to evidence: these rules describe the evidence that should be present if given conditions exist.

2) ARTIFICIAL NEURAL NETWORK (ANN)

An artificial neural network is a massively parallel distributed processor made up of simple processing units. It has the ability to learn from experiential knowledge, which is stored in the inter-unit connection strengths, and to make such knowledge available for use.

An ANN derives its computing power from:

(i) A massively parallel distributed structure

(ii) The ability to learn and therefore to generalize knowledge

Generalization in an ANN means producing a reasonable output for new inputs that were not encountered during the learning process.

PROPERTIES OF ANN

1. NONLINEARITY: An artificial neuron can be linear, but the ANN as a whole is nonlinear in the sense that the nonlinearity is distributed throughout the network.
2. LEARNING FROM EXAMPLES: An ANN modifies its interconnection weights by applying a set of training or learning samples to the problem at hand.
3. ADAPTIVITY: The ability to adapt its interconnection weights to changes in the surrounding environment.
4. FAULT TOLERANCE: It has the potential to be inherently fault-tolerant, i.e. capable of robust computation.
5. UNIFORMITY OF ANALYSIS AND DESIGN: ANNs enjoy universality as information processors.

MODEL OF AN ARTIFICIAL NEURON


An artificial neuron is an information processing unit that is fundamental to the operation of an ANN.
Basic elements of the model
(i) A set of connecting links (or synapses) from different inputs $x_i$, each of which is characterized by a weight $w_{ki}$ (the weight may be negative or positive).
(ii) An adder for summing the input signals $x_i$ weighted by the synaptic strengths $w_{ki}$.
(iii) An activation function for limiting the amplitude of the output $y_k$ of a neuron.

[Figure: Model of an artificial neuron, with weights $w_{k1}, \ldots, w_{km}$, bias $b_k$, net input $net$, activation function $f(net)$ and output $y_k$.]

Here $b_k$ is the bias of the artificial neuron. The bias has the effect of increasing or lowering the net input of the activation function, depending on whether it is positive or negative.

In mathematical terms, an artificial neuron is an abstract model of a natural neuron.

NOTATIONS

Inputs: $x_i$, $i = 1, \ldots, m$
Weights: $w_{ki}$, where $k$ is the index of a given neuron in an ANN.
The weights simulate the biological synaptic strengths of a natural neuron.
$$net_k = x_1 w_{k1} + x_2 w_{k2} + \cdots + x_m w_{km} + b_k = \sum_{i=1}^{m} x_i w_{ki} + b_k$$
The same sum can be expressed in vector notation as a scalar product of two (m+1)-dimensional vectors, taking $x_0 = 1$ and $w_{k0} = b_k$:
$$net_k = X \cdot W$$
where $X = \{x_0, x_1, x_2, \ldots, x_m\}$
and $W = \{w_{k0}, w_{k1}, \ldots, w_{km}\}$.
Output: $y_k = f(net_k)$

Activation function: input/output relation

1. Hard limit: $y = 1$ if $net \ge 0$; $y = 0$ if $net < 0$

2. Symmetrical hard limit: $y = 1$ if $net \ge 0$; $y = -1$ if $net < 0$

3. Linear: $y = net$

4. Saturating linear: $y = 1$ if $net > 1$; $y = net$ if $0 \le net \le 1$; $y = 0$ if $net < 0$

5. Symmetric saturating linear: $y = 1$ if $net > 1$; $y = net$ if $-1 \le net \le 1$; $y = -1$ if $net < -1$

6. Log-sigmoid: $y = \dfrac{1}{1 + e^{-net}}$

7. Hyperbolic tangent sigmoid: $y = \dfrac{e^{net} - e^{-net}}{e^{net} + e^{-net}}$

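For example, consider a neuron with inputs $x_1 = 0.5$, $x_2 = 0.5$ and $x_3 = 0.2$, weights 0.3, 0.2 and 0.5 respectively, and bias $b = -0.2$.

Solution

$$net = \sum_{i=1}^{m} x_i w_{ki} + b_k = 0.5 \times 0.3 + 0.5 \times 0.2 + 0.2 \times 0.5 + (-0.2) = 0.15$$

(1) Symmetrical hard limit
$y = +1$ if $net \ge 0$; $y = -1$ if $net < 0$
$y = f(net) = f(0.15) = 1$
(2) Saturating linear
$y = 1$ if $net > 1$; $y = net$ if $0 \le net \le 1$; $y = 0$ if $net < 0$
$y = f(0.15) = 0.15$
(3) Log-sigmoid
$$y = \frac{1}{1 + e^{-net}} = \frac{1}{1 + e^{-0.15}} = 0.537 \approx 0.54$$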
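
The worked example above can be reproduced with a short Python sketch. The function names are chosen here for illustration, and the saturating functions are written as clamps, which gives the same values as the piecewise definitions.

```python
import math

def hard_limit(net):
    return 1.0 if net >= 0 else 0.0

def symmetric_hard_limit(net):
    return 1.0 if net >= 0 else -1.0

def linear(net):
    return net

def saturating_linear(net):
    return max(0.0, min(1.0, net))      # clamp to [0, 1]

def symmetric_saturating_linear(net):
    return max(-1.0, min(1.0, net))     # clamp to [-1, 1]

def log_sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def tanh_sigmoid(net):
    return math.tanh(net)

def net_input(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Values from the example: x = (0.5, 0.5, 0.2), w = (0.3, 0.2, 0.5), b = -0.2
net = net_input([0.5, 0.5, 0.2], [0.3, 0.2, 0.5], -0.2)
print(round(net, 4))                    # net input
print(symmetric_hard_limit(net))        # symmetrical hard limit
print(round(saturating_linear(net), 4)) # saturating linear
print(round(log_sigmoid(net), 3))       # log-sigmoid
```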
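As a second example, consider a feedforward network of three neurons with inputs $x_1 = 1$ and $x_2 = 0.5$.

[Figure: Neurons 1 and 2 receive the inputs $x_1$ and $x_2$; neuron 3 receives their outputs $y_1$ and $y_2$ and produces $y_3$. The weights are 0.2 ($x_1$ to neuron 1), 0.5 ($x_2$ to neuron 1), -0.6 ($x_1$ to neuron 2), -1.0 ($x_2$ to neuron 2), 1.0 ($y_1$ to neuron 3) and -0.5 ($y_2$ to neuron 3).]

Every neuron uses the symmetric saturating linear function:

$y = 1$ if $net > 1$; $y = net$ if $-1 \le net \le 1$; $y = -1$ if $net < -1$

$net_1 = x_1 w_{11} + x_2 w_{12} = 1 \times 0.2 + 0.5 \times 0.5 = 0.45$

$y_1 = f(0.45) = 0.45$

$net_2 = x_1 w_{21} + x_2 w_{22} = 1 \times (-0.6) + 0.5 \times (-1.0) = -1.1$

$y_2 = f(-1.1) = -1$

$net_3 = y_1 w_{31} + y_2 w_{32} = 0.45 \times 1.0 + (-1) \times (-0.5) = 0.45 + 0.5 = 0.95$

$y_3 = f(net_3) = f(0.95) = 0.95$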
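
The three-neuron calculation can be checked with the Python sketch below. The assignment of weights to connections (0.2 and 0.5 into neuron 1, -0.6 and -1.0 into neuron 2, 1.0 and -0.5 into neuron 3) is read off the figure and the arithmetic above, so it should be treated as an assumption; the function names are illustrative.

```python
def symmetric_saturating_linear(net):
    """y = net clamped to the interval [-1, 1]."""
    return max(-1.0, min(1.0, net))

def neuron(inputs, weights, bias=0.0):
    """One neuron: weighted sum plus bias, passed through the activation."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return symmetric_saturating_linear(net)

x = [1.0, 0.5]

# First layer: neurons 1 and 2 both see the network inputs
y1 = neuron(x, [0.2, 0.5])
y2 = neuron(x, [-0.6, -1.0])

# Second layer: neuron 3 sees the outputs of neurons 1 and 2
y3 = neuron([y1, y2], [1.0, -0.5])

print(round(y1, 2), round(y2, 2), round(y3, 2))
```

The output of one layer is simply fed in as the input of the next, which is all that "feedforward" means here.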

ARCHITECTURE OF ANN (A-ANN)

The architecture of an ANN is defined by the characteristics of a node and the characteristics of the nodes' connectivity in the network. Network architecture is specified by the number of inputs to the network, the number of outputs, the total number of elementary nodes (usually identical processing elements for the entire network), and their organization and interconnections.

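The two types of ANN interconnections are:

(1) Feedforward
(2) Recurrent

[Figure: Feedforward network, with inputs $x_1$, $x_2$, $x_3$, hidden layer 1, hidden layer 2 and an output layer producing $y_1$ and $y_2$.]

[Figure: Recurrent network, with inputs $x_1$, $x_2$, ..., $x_n$, outputs $y_1$ and $y_2$, and a delay element feeding the outputs back to the inputs.]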

LEARNING PROCESS

Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded.

The type of learning is determined by the manner in which the parameter changes take place.

A learning algorithm is a set of rules for the solution of a learning problem.

Factors to be considered in the learning process
1. The learning algorithm
2. The manner in which the ANN architecture is built.

[Figure: A single neuron $k$ with inputs $x_{k1}, x_{k2}, \ldots, x_{km}$, weights $w_{k1}, w_{k2}, \ldots, w_{km}$, bias $b_k$, activation function $f(net)$ and output $y_k$.]

Processing the input vector $x(n)$, a neuron $k$ produces the output denoted by $y_k(n)$:

$$y_k = f\left(\sum_{i=1}^{m} x_i w_{ki} + b_k\right)$$

This is the only output of this simple network, and it is compared to a desired response or target output $d_k(n)$. The error $e_k(n)$ produced at the output is by definition

$$e_k(n) = d_k(n) - y_k(n)$$

The error signal actuates the control mechanism of the learning algorithm. The objective is to apply a sequence of corrective adjustments to the input weights of the neuron so as to bring the output $y_k(n)$ closer to the target output $d_k(n)$. This objective is achieved by minimizing a cost function $E(n)$, the instantaneous value of the error energy, defined for this simple example in terms of the error $e_k(n)$:

$$E(n) = \tfrac{1}{2} e_k^2(n)$$

Because it is based on this minimization, the learning process is referred to as error-correction learning. Minimization of $E(n)$ leads to the learning rule commonly referred to as the delta rule or Widrow-Hoff rule.

Let $w_{kj}(n)$ be the value of the weight factor for neuron $k$ excited by input $x_j(n)$ at time step $n$. The adjustment $\Delta w_{kj}(n)$ is defined by

$$\Delta w_{kj}(n) = \eta \cdot e_k(n) \cdot x_j(n)$$

where $\eta$ is a positive constant that defines the rate of learning. The delta rule may therefore be stated as: the adjustment made to a weight factor of an input neuron connection is proportional to the product of the error signal and the input value of the connection in question.

The updated value of the weight is then defined by

$$w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$$

Example

[Figure: A single linear neuron with inputs $x_1$, $x_2$, $x_3$, initial weights $w_1 = 0.5$, $w_2 = -0.3$, $w_3 = 0.8$, bias $b = 0$ and output $y$.]

n (sample)    x1      x2      x3      d
1             1       1       0.5     0.7
2             -1      0.7     -0.5    0.2
3             0.3     0.3     -0.3    0.5

The calculations use a learning rate $\eta = 0.1$ and a linear activation function, $y = f(net) = net$. The adjustment for each weight is $\Delta w_j(n) = \eta \cdot e(n) \cdot x_j(n)$. Values are rounded to three decimal places.

When n = 1

net(1) = 0.5 × 1 + (-0.3) × 1 + 0.8 × 0.5 + 0 = 0.6

y(1) = f(0.6) = 0.6

e(1) = d(1) - y(1) = 0.7 - 0.6 = 0.1

Δw1(1) = 0.1 × 0.1 × 1 = 0.01;      w1(2) = w1(1) + Δw1(1) = 0.5 + 0.01 = 0.51

Δw2(1) = 0.1 × 0.1 × 1 = 0.01;      w2(2) = w2(1) + Δw2(1) = -0.3 + 0.01 = -0.29

Δw3(1) = 0.1 × 0.1 × 0.5 = 0.005;   w3(2) = w3(1) + Δw3(1) = 0.8 + 0.005 = 0.805

When n = 2

net(2) = (-1) × 0.51 + 0.7 × (-0.29) + (-0.5) × 0.805 = -1.1155

y(2) = f(net) = -1.1155

e(2) = d(2) - y(2) = 0.2 - (-1.1155) = 1.3155

Δw1(2) = 0.1 × 1.3155 × (-1) = -0.132;     w1(3) = 0.51 + (-0.132) = 0.378

Δw2(2) = 0.1 × 1.3155 × 0.7 = 0.092;       w2(3) = -0.29 + 0.092 = -0.198

Δw3(2) = 0.1 × 1.3155 × (-0.5) = -0.066;   w3(3) = 0.805 + (-0.066) = 0.739

When n = 3

net(3) = 0.3 × 0.378 + 0.3 × (-0.198) + (-0.3) × 0.739 = -0.168

y(3) = f(net) = -0.168

e(3) = d(3) - y(3) = 0.5 - (-0.168) = 0.668

Δw1(3) = 0.1 × 0.668 × 0.3 = 0.020;       w1(4) = 0.378 + 0.020 = 0.398

Δw2(3) = 0.1 × 0.668 × 0.3 = 0.020;       w2(4) = -0.198 + 0.020 = -0.178

Δw3(3) = 0.1 × 0.668 × (-0.3) = -0.020;   w3(4) = 0.739 + (-0.020) = 0.719
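
The delta-rule example can be replayed with the Python sketch below. It assumes what the calculations in the example imply: a learning rate of 0.1, a linear activation function, a bias fixed at 0, and one pass through the three samples in order. The function name `train_delta_rule` is illustrative. Because the code carries full precision from step to step, its results can differ in the last digit from values that were rounded by hand at each step.

```python
def train_delta_rule(weights, bias, samples, learning_rate):
    """Apply w(n+1) = w(n) + eta * e(n) * x(n) once per sample, in order."""
    weights = list(weights)
    history = []
    for inputs, target in samples:
        net = sum(x * w for x, w in zip(inputs, weights)) + bias
        output = net                      # linear activation: y = net
        error = target - output           # e(n) = d(n) - y(n)
        weights = [w + learning_rate * error * x
                   for w, x in zip(weights, inputs)]
        history.append((net, error, list(weights)))
    return weights, history

samples = [
    ([1.0, 1.0, 0.5], 0.7),
    ([-1.0, 0.7, -0.5], 0.2),
    ([0.3, 0.3, -0.3], 0.5),
]
final_weights, history = train_delta_rule([0.5, -0.3, 0.8], 0.0, samples, 0.1)

for step, (net, error, w) in enumerate(history, start=1):
    print(step, round(net, 4), round(error, 4), [round(v, 3) for v in w])
```

Each printed line shows the sample number, the net input, the error and the weights after the update for that sample.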
LEARNING TASK

The choice of a particular learning algorithm is influenced by the learning task an ANN is required to perform.

The learning tasks for an ANN include:

1. Pattern association
Pattern association takes one of two forms: auto-association or hetero-association.
In auto-association, an ANN is required to store a set of patterns by repeatedly presenting them to the network. It involves the use of unsupervised learning.
In hetero-association, an arbitrary set of input patterns is paired with another arbitrary set of output patterns. It involves the use of supervised learning.
There are two phases in the application of an ANN for pattern association:
i. The storage phase: the training of the network in accordance with the given patterns.
ii. The recall phase: the retrieval of a memorized pattern in response to a noisy or distorted version of a key pattern presented to the network.
2. Pattern recognition: An ANN performs pattern recognition by first undergoing a training session, during which the network is repeatedly presented with a set of input patterns along with the category to which each pattern belongs. Later, in a testing phase, the network is presented with a new pattern that it has not seen before but that belongs to the same population of patterns used during training. The network is able to identify the class of that particular pattern because of the information it has extracted from the training data.
3. Function approximation
4. Control
5. Filtering

GENETIC ALGORITHM

A genetic algorithm is a computer program that mimics the behavior of the biological evolution process in order to solve problems and to model evolutionary systems.

DEFINITIONS OF TERMS

1. Chromosome: a point in the solution space encoded as a binary bit string.
2. Genes: the features in the chromosome string.
3. Locus: a position in the string.
4. Genotype: the string structure of a chromosome.
5. Phenotype: the set of characteristics (features).

G.A. is a general-purpose optimization tool.

Application areas of G.A. for optimization problems

1. Wire routing
2. Scheduling
3. Adaptive control
4. Game playing
5. Transportation problem
6. Travelling salesman problem
7. Database query optimization
8. Machine learning

CHARACTERISTICS OF GENETIC ALGORITHM

(1) G.As are parallel search procedures that can be implemented on parallel processing machines to massively speed up their operations.
(2) G.As are applicable to both continuous and discrete optimization problems.
(3) G.As are stochastic and less likely to get trapped in local minima, which are inevitably present in any practical optimization problem.
(4) The flexibility of G.As facilitates both structure and parameter identification in complex models.

MAIN STEPS/PHASES OF G.A


(1) Encoding schemes and initialization

G.A. starts with designing a representation of a solution for the given problem. A solution here means any value that is a candidate for the correct solution and that can be evaluated. For example, suppose we want to maximize the function $y = 5 - (x - 1)^2$. Then $x = 2$ is a solution, $x = 2.5$ is another solution, and $x = 1$ is the correct solution of the problem, the one that maximizes $y$.

The representation of each solution for a genetic algorithm is up to the designer. The most common representation of a solution is a string of characters, i.e. a string of codes for feature representation, where the characters belong to a fixed alphabet. The larger the alphabet, the more information can be represented by each character in the string.

The encoding process transforms points in a feature space into a bit-string representation. For instance, a point (11, 6, 9) in a three-dimensional feature space, with range [0, 15] for each dimension, can be represented as a concatenated binary string: (11, 6, 9) = (101101101001). Other encoding schemes include Gray coding.

Encoding schemes provide a way of translating problem-specific knowledge directly into the G.A. framework. In G.As we manipulate a set of chromosomes called a population (a chromosome is the set of all feature values encoded into a bit string).

To initialize a population, we can simply generate population-size chromosomes randomly.
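
The encoding of the point (11, 6, 9) and the random initialization can be sketched in Python as follows. The function names, the fixed width of four bits per feature (enough for the range 0 to 15) and the seeded random generator are assumptions made for this illustration.

```python
import random

def encode(point, bits_per_feature):
    """Concatenate the fixed-width binary code of each feature value."""
    return "".join(format(value, f"0{bits_per_feature}b") for value in point)

def decode(chromosome, bits_per_feature):
    """Split the bit string into equal chunks and convert each back to an integer."""
    return tuple(
        int(chromosome[i:i + bits_per_feature], 2)
        for i in range(0, len(chromosome), bits_per_feature)
    )

def random_population(size, length, rng):
    """Create `size` random bit strings of the given length."""
    return ["".join(rng.choice("01") for _ in range(length)) for _ in range(size)]

chromosome = encode((11, 6, 9), 4)
print(chromosome)                 # 101101101001
print(decode(chromosome, 4))      # (11, 6, 9)

# Seeded generator so that the illustration is repeatable
population = random_population(4, 12, random.Random(0))
print(population)
```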

(2). Fitness evaluation

After creating a population, we calculate the fitness value of each member of the population, because each chromosome is a candidate for an optimal solution. For optimization problems, the fitness value $f_i$ of the ith member is usually the objective function evaluated at this member.

The fitness of a solution is a measure that can be used to compare solutions to determine which is better.

The fitness value may be determined from a complex analytical formula, from a simulation model, or from observations of experiments or real-life problem settings.

(3). Selection

Selection deals with creating a new population from the current generation. The selection operation determines which parent chromosomes participate in producing offspring for the next generation. Members are selected with a probability proportional to their fitness values. The most common way to implement this method is to set the selection probability $p_i$ equal to

$$p_i = \frac{f_i}{\sum_{k=1}^{n} f_k}$$

where n is the population size and $f_i$ is the fitness value of the ith chromosome.
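
Fitness-proportional selection can be sketched in Python with `random.choices`, which draws items with replacement according to relative weights. The function names and the seeded generator are illustrative, and the four chromosomes with fitness values 169, 576, 64 and 361 are sample data for f(x) = x². The sketch assumes all fitness values are positive.

```python
import random

def selection_probabilities(fitness_values):
    """p_i = f_i divided by the sum of all fitness values."""
    total = sum(fitness_values)
    return [f / total for f in fitness_values]

def roulette_select(population, fitness_values, rng):
    """Draw a new parent pool, each pick proportional to fitness (with replacement)."""
    return rng.choices(population, weights=fitness_values, k=len(population))

population = ["01101", "11000", "01000", "10011"]
fitness_values = [169, 576, 64, 361]

probabilities = selection_probabilities(fitness_values)
print([round(p, 3) for p in probabilities])

parents = roulette_select(population, fitness_values, random.Random(1))
print(parents)
```

Fitter chromosomes tend to appear more than once in the parent pool, while weak ones may not be selected at all.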

(4). Crossover

It is a genetic algorithm operator that combines (mates) two chromosomes (parents) to produce a new chromosome (offspring). The idea behind crossover is that the new chromosome may be better than both of the parents if it takes the best characteristics from each of them. We select the chromosomes for crossover in the current population using the following iterative procedure. Steps 1 and 2 have to be repeated for all chromosomes.

1. Generate a random number r from the range [0, 1].
2. If r is less than the crossover probability, select the given chromosome for crossover.
TYPES OF CROSSOVER
(1) ONE POINT: A crossover operator that randomly selects a crossover point within a chromosome, then interchanges the two parent chromosomes at this point to produce two new offspring, e.g.

Parent 1: 11001|010 -------- Offspring 1: 11001|111

Parent 2: 00100|111 -------- Offspring 2: 00100|010

(4) Genetic operators that alter the composition of offspring,

(5) Values for the various parameters that the genetic algorithm uses (population size, rate of applied operators, etc.).

For example, to optimize $f(x) = x^2$, the task is to find x in the range [0, 31] that maximizes the function f(x). We find the optimal solution with a G.A. and compare it with the result of analytical optimization. First, represent the value of x in binary within the given range. A code (string) of at least five bits is needed to accommodate the range with the required precision:

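$\dfrac{b - a}{2^m - 1} \le \text{required precision}$, and here the required precision is 1:

$\dfrac{b - a}{2^m - 1} \le 1$

$\dfrac{31 - 0}{2^m - 1} \le 1 \;\Rightarrow\; 31 - 0 \le 2^m - 1$

$31 + 1 \le 2^m - 1 + 1$

$32 \le 2^m$

$2^m \ge 2^5$

$m \ge 5$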

Code = binary ( X decimal)

X = decimal (code binary)
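
The chromosome-length condition and the two conversions above can be checked with a short Python sketch. The helper names (`bits_needed`, `encode`, `decode`) are hypothetical and chosen only for this illustration.

```python
def bits_needed(a, b, precision=1):
    """Smallest m such that (b - a) / (2**m - 1) <= precision."""
    m = 1
    while (b - a) / (2 ** m - 1) > precision:
        m += 1
    return m

def encode(x, m):
    """Code = binary(x): an m-bit string for the integer x."""
    return format(x, "0{}b".format(m))

def decode(code):
    """x = decimal(code): the integer value of a bit string."""
    return int(code, 2)

m = bits_needed(0, 31)   # 5 bits cover the range [0, 31] with precision 1
code = encode(13, m)     # "01101"
x = decode("11000")      # 24
```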

We now randomly create a population of chromosomes of the given length. Assume we decide on
a population size of only four strings of x.

Then a possible randomly selected population of chromosomes is:

CR1 = 01101 ------- X1(CR1) = 13 ------- f(X1) = 169

CR2 = 11000 ------- X2(CR2) = 24 ------- f(X2) = 576

CR3 = 01000 ------- X3(CR3) = 8 -------- f(X3) = 64

CR4 = 10011 ------- X4(CR4) = 19 ------- f(X4) = 361

(ii) Two-point

A crossover operator that randomly selects two crossover points within a chromosome, then
interchanges the two parent chromosomes between these points to produce two new offspring, e.g.

Parent 1: 1101010110 -------- 11000110

Parent 2: 1001001111 -------- 10001011

Other types of crossover are (iii) Uniform, (iv) Arithmetic and (v) Heuristic.
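
The selection-for-crossover procedure and the one-point and two-point operators described above might be sketched in Python as follows. The function names are hypothetical, chromosomes are assumed to be equal-length bit strings, and a chromosome is assumed to be selected when R falls below the crossover probability.

```python
import random

def select_for_crossover(population, pc, rng=random):
    """Steps 1 and 2: draw R in [0, 1) per chromosome and keep it if R < pc."""
    return [c for c in population if rng.random() < pc]

def one_point_crossover(p1, p2, point):
    """Swap the tails of the two parents after position `point`."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point_crossover(p1, p2, lo, hi):
    """Swap the segment between positions `lo` and `hi`."""
    return (p1[:lo] + p2[lo:hi] + p1[hi:],
            p2[:lo] + p1[lo:hi] + p2[hi:])

# The one-point example from the notes, with the cut after the 5th bit.
child1, child2 = one_point_crossover("11001010", "00100111", 5)
# child1 == "11001111", child2 == "00100010"
```

In practice the crossover points would themselves be drawn at random, for example with `random.randint(1, len(p1) - 1)`; they are passed in explicitly here so the result is reproducible.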

(5). MUTATION

A G.A. operator used to maintain genetic diversity from one generation of a population of
chromosomes to the next. The most common way to implement mutation is to flip a bit with a
very low probability equal to a given mutation rate (MR). The mutation operator may prevent any
single bit from converging to one value throughout the entire population and, more importantly,
it can prevent the entire population from converging and stagnating at a local optimum, e.g.

Chromosome with randomly selected mutation bit ------ Chromosome after mutation

10011110 (7th bit selected) --------------------- 10011100

Types of mutation

1. Bit string mutation 2. Flip bit 3. Boundary 4. Non-uniform 5. Uniform, etc.
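
A minimal Python sketch of bit-flip mutation is given below. The function names are hypothetical, and the per-bit independent flip is one common reading of "flip a bit with probability MR".

```python
import random

def flip_bit(chromosome, index):
    """Return a copy of the bit string with the bit at `index` inverted."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]

def mutate(chromosome, rate, rng=random):
    """Flip each bit independently with probability `rate` (the mutation rate)."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )

# The example from the notes: the 7th bit (index 6) is the mutation bit.
after = flip_bit("10011110", 6)   # "10011100"
```

With a rate such as 0.001 most chromosomes pass through `mutate` unchanged, which is why mutation acts as a background source of diversity rather than the main search operator.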

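The phases of a G.A. explained above can be summarized as the following flowchart steps:

1. Encoding scheme
2. Fitness evaluation
3. Testing the end of the algorithm: if yes, halt; if no, continue
4. Parent selection
5. Crossover operator
6. Mutation operator, then return to step 2 (fitness evaluation)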
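
The phases listed above can be tied together in one loop. The sketch below is only an illustration under stated assumptions: it maximizes f(x) = x^2 on 5-bit strings, uses a fixed number of generations as the end test, and all names and default parameter values are hypothetical choices rather than part of the notes.

```python
import random

def fitness(code):
    """f(x) = x^2, where x is the integer encoded by the bit string."""
    return int(code, 2) ** 2

def run_ga(pop_size=4, length=5, pc=1.0, pm=0.001, generations=20, seed=0):
    rng = random.Random(seed)
    # Encoding scheme: random bit strings of the given length.
    pop = ["".join(rng.choice("01") for _ in range(length))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):            # end test: fixed generation budget
        scores = [fitness(c) for c in pop]  # fitness evaluation
        # Parent selection: roulette wheel (tiny offset avoids all-zero weights).
        parents = rng.choices(pop, weights=[s + 1e-9 for s in scores], k=pop_size)
        children = []
        for i in range(0, pop_size - 1, 2):  # crossover on consecutive pairs
            a, b = parents[i], parents[i + 1]
            if rng.random() < pc:
                cut = rng.randint(1, length - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children += [a, b]
        if len(children) < pop_size:         # odd population: carry one parent over
            children.append(parents[-1])
        # Mutation: flip each bit with probability pm.
        pop = ["".join(("1" if bit == "0" else "0") if rng.random() < pm else bit
                       for bit in c) for c in children]
        best = max(pop + [best], key=fitness)
    return best, fitness(best)

best, value = run_ga()
```

The loop keeps track of the best chromosome seen so far, since with such a small population the best individual can be lost between generations.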

ILLUSTRATION OF G.A

To apply a genetic algorithm for a particular problem, we have to define or to select the
following five components:

1. A genetic representation or encoding scheme for potential solutions to the problem.

2. A way to create an initial population of potential solutions.

3. An evaluation function that plays the role of the environment, rating solutions in terms of
their fitness.

Evaluation of the initial population


CRi   CODE    x    f(x)   f(x)/Σf(x)   Expected reproduction f(x)/f_av
1     01101   13   169    0.144        0.58
2     11000   24   576    0.492        1.97
3     01000   8    64     0.055        0.22
4     10011   19   361    0.309        1.23
Σ                  1170   1.000        4.00
Ave                293    0.250        1.00
Max                576    0.492        1.97
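
The columns of the evaluation table can be recomputed with a few lines of Python. This is a small checking sketch; the function name `evaluate` is hypothetical.

```python
def evaluate(codes):
    """Return per-chromosome rows plus the total, average and maximum fitness."""
    xs = [int(code, 2) for code in codes]
    fs = [x * x for x in xs]                      # f(x) = x^2
    total = sum(fs)
    average = total / len(fs)
    rows = [(code, x, f, f / total, f / average)  # probability, expected count
            for code, x, f in zip(codes, xs, fs)]
    return rows, total, average, max(fs)

rows, total, average, best = evaluate(["01101", "11000", "01000", "10011"])
for code, x, f, prob, expected in rows:
    print(f"{code} {x:2d} {f:4d} {prob:.3f} {expected:.2f}")
```

Calling the same function on the second-generation strings 01100, 11001, 11011 and 10000 gives a total of 1754, matching the second evaluation in these notes.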

CR2 and CR4 are more likely to be reproduced in the next generation than CR1 and CR3.
CR2 is the best of the four chromosomes.
Roulette-wheel selection is used to choose the next population for this problem, with:
Population size = 4
Probability of crossover, PC = 1
Probability of mutation, PM = 0.001
Suppose that the randomly selected pairs are CR1-CR2 and CR2-CR4, with crossover after the
4th bit for the first pair and after the 2nd bit for the second pair.
1st pair  CR1 ------ 0110|1 ------ 01100 = 12
          CR2 ------ 1100|0 ------ 11001 = 25

2nd pair  CR2 ------ 11|000 ------ 11011 = 27
          CR4 ------ 10|011 ------ 10000 = 16

SECOND EVALUATION (ITERATION)

CRi   CODE    x    f(x)   f(x)/Σf(x)   Expected reproduction f(x)/f_av
1     01100   12   144    0.08         0.33
2     11001   25   625    0.36         1.42
3     11011   27   729    0.42         1.66
4     10000   16   256    0.15         0.58
Σ                  1754   1.00         3.99 ≈ 4.00
Ave                439    0.25         0.99 ≈ 1.00
Max                729    0.42         1.66

Σ1 = 1170 ------ Σ2 = 1754

Ave1 = 293 ------ Ave2 = 439

Max1 = 576 ------ Max2 = 729

After the 1st and 2nd evaluation iterations, chromosome 11011 (x = 27) gives the largest value
of f(x) found so far, 729. The true maximum of f(x) on the range is 961, reached at x = 31.

FUZZY SETS AND FUZZY LOGIC

DEFINITION:

Fuzzy logic enables us to handle uncertainty in a very intuitive and natural manner. In addition to
making it possible to formalize imprecise data, it also enables us to do arithmetic and Boolean
operations using fuzzy sets. Finally, it describes the inference systems based on fuzzy rules.
Fuzzy rules and fuzzy reasoning processes, which are the most important modeling tools based
on the fuzzy set theory, are the backbone of any fuzzy inference system. Typically, a fuzzy rule
has the general format of a conditional proposition. A fuzzy If-then rule, also known as fuzzy
implication, assumes the form

If x is A then y is B

where A and B are linguistic values defined by fuzzy sets on the universes of discourse X and Y,
respectively. Often, "x is A" is called the antecedent or premise, while "y is B" is called the
consequence or conclusion. Examples of fuzzy If-then rules are widespread in our daily
linguistic expressions, such as the following:

1. If pressure is high, then volume is small.

2. If the road is slippery, then driving is dangerous.

3. If a tomato is red, then it is ripe.

4. If the speed is high, then apply the brake a little.

Before we can employ fuzzy If-then rules to model and analyze a fuzzy reasoning-process, we
have to formalize the meaning of the expression "If x is A then y is B", sometimes abbreviated in
a formal presentation as A → B. In essence, the expression describes a relation between two
variables x and y; this suggests that a fuzzy If-then rule be defined as a binary fuzzy relation R
on the product space X × Y. R can be viewed as a fuzzy set with a two-dimensional membership
function $\mu_R(x, y)$.
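
As a rough illustration of a rule viewed as a fuzzy relation on X × Y, the sketch below builds R for discrete fuzzy sets. The notes do not fix how the relation is computed, so the min (Mamdani) interpretation used here is an assumption, and the membership values for "pressure is high" and "volume is small" are made up for the example.

```python
def mamdani_relation(mu_a, mu_b):
    """R(x, y) = min(mu_A(x), mu_B(y)) for discrete fuzzy sets given as dicts."""
    return {(x, y): min(ma, mb)
            for x, ma in mu_a.items()
            for y, mb in mu_b.items()}

# Hypothetical discretisation of rule 1: "If pressure is high, then volume is small."
pressure_high = {"low": 0.0, "medium": 0.4, "high": 1.0}
volume_small = {"small": 1.0, "large": 0.2}

R = mamdani_relation(pressure_high, volume_small)
# R[("high", "small")] is the degree to which the pair (high, small) satisfies the rule.
```

Other interpretations of A → B exist; swapping `min` for a different operator in the comprehension would give a different relation over the same product space.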

A fuzzy set expresses the degree to which an element belongs to a set.

The characteristic function of a fuzzy set is allowed to take values between 0 and 1, which
denote the degree of membership of an element in the given set.

If X is a collection of objects denoted generically by x, then a fuzzy set A in X is defined as a set
of ordered pairs:

$A = \{(x, \mu_A(x)) \mid x \in X\}$, where $\mu_A$ is called the membership function (MF).

Example 1: let X = {San Francisco, Boston, Los Angeles}. A fuzzy set A in X may be described as
A = {(San Francisco, 0.9), (Boston, 0.8), (Los Angeles, 0.6)}.

Example 2: let X = {0, 1, 2, 3, 4, 5, 6} be the set of numbers of children a family may choose to have.
Then the fuzzy set A = "sensible number of children in a family" may be described as follows:

A = {(0, 0.1), (1, 0.3), (2, 0.7), (3, 1), (4, 0.7), (5, 0.3), (6, 0.1)}

Or, in the formal notation: A = 0.1/0 + 0.3/1 + 0.7/2 + 1.0/3 + 0.7/4 + 0.3/5 + 0.1/6
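
A discrete fuzzy set such as the one in Example 2 maps naturally onto a Python dictionary from elements to membership degrees. The sketch below is illustrative only: the helper names are hypothetical, the second set `B` is invented for the demonstration, and the max/min forms of union and intersection are the standard operators, assumed here because the notes do not define them.

```python
# Example 2: "sensible number of children in a family".
A = {0: 0.1, 1: 0.3, 2: 0.7, 3: 1.0, 4: 0.7, 5: 0.3, 6: 0.1}

def membership(fuzzy_set, x):
    """Degree to which x belongs to the set (0 if x is not listed)."""
    return fuzzy_set.get(x, 0.0)

def sum_notation(fuzzy_set):
    """Format the set as mu/x + mu/x + ... (the '+' is a separator, not addition)."""
    return " + ".join(f"{mu}/{x}" for x, mu in fuzzy_set.items())

def fuzzy_union(a, b):
    """Standard union: the larger of the two membership degrees."""
    return {x: max(membership(a, x), membership(b, x)) for x in set(a) | set(b)}

def fuzzy_intersection(a, b):
    """Standard intersection: the smaller of the two membership degrees."""
    return {x: min(membership(a, x), membership(b, x)) for x in set(a) | set(b)}

B = {2: 0.5, 3: 0.4}   # hypothetical second fuzzy set on the same universe
print(sum_notation(A))
```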

In real-world applications of fuzzy sets, the shape of MFs is usually restricted to a certain class
of functions that can be specified by only a few parameters. The most well-known are triangular,
trapezoidal, and Gaussian.

[Figure: shapes of the triangular MF (points A, B, C), the trapezoidal MF (points A, B, C, D), and the Gaussian MF.]

Fuzzy logic is a theory which relates to classes of objects with un-sharp boundaries in which
membership is a matter of degree.
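
The triangular, trapezoidal, and Gaussian membership functions named above are commonly written with a handful of parameters. The Python sketch below uses one widely used parameterization; the parameter names (`a`, `b`, `c`, `d`, `center`, `sigma`) are assumptions for illustration and are not taken from the notes.

```python
import math

def triangular(x, a, b, c):
    """Rises from a to the peak at b, then falls to c (assumes a < b < c)."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapezoidal(x, a, b, c, d):
    """Rises on a..b, stays at 1 on b..c, falls on c..d (assumes a < b <= c < d)."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)

def gaussian(x, center, sigma):
    """Bell curve with peak 1 at `center` and width controlled by `sigma`."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

degree = triangular(2.5, 0, 5, 10)   # halfway up the rising edge
```

Each function returns a degree of membership between 0 and 1, which is what makes a boundary "unsharp": membership falls off gradually instead of switching from 1 to 0.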

DATA MINING CONCEPTS

Data mining is an iterative process within which progress is defined by discovery, through either
automatic or manual methods. Data mining is most useful in an exploratory analysis scenario in
which there are no predetermined notions about what will constitute an "interesting" outcome.
Data mining is the search for new, valuable, and nontrivial information in large volumes of data.
It is a cooperative effort of humans and computers. Best results are achieved by balancing the
knowledge of human experts in describing problems and goals with the search capabilities of
computers.

In practice, the two primary goals of data mining tend to be prediction and description.
Prediction involves using some variables or fields in the data set to predict unknown or future
values of other variables of interest. Description, on the other hand, focuses on finding patterns
describing the data that can be interpreted by humans. Therefore, it is possible to put data-mining
activities into one of two categories:

1. Predictive data mining, which produces the model of the system described by the given
data set, or
2. Descriptive data mining, which produces new, nontrivial information based on the
available data set.

On the predictive end of the spectrum, the goal of data mining is to produce a model, expressed
as an executable code, which can be used to perform classification, prediction, estimation, or
other similar tasks. On the other, descriptive, end of the spectrum, the goal is to gain an
understanding of the analyzed system by uncovering patterns and relationships in large data sets.
The relative importance of prediction and description for particular data-mining applications can
vary considerably. The goals of prediction and description are achieved by using data-mining
techniques, for the following primary data-mining tasks:

1. Classification – discovery of a predictive learning function that classifies a data item into
one of several predefined classes.
2. Regression – discovery of a predictive learning function, which maps a data item to a
real-value prediction variable.

3. Clustering – a common descriptive task in which one seeks to identify a finite set of
categories or clusters to describe the data.

4. Summarization – an additional descriptive task that involves methods for finding a
compact description for a set (or subset) of data.

5. Dependency Modeling – finding a local model that describes significant dependencies
between variables or between the values of a feature in a data set or in a part of a data set.

6. Change and Deviation Detection – discovering the most significant changes in the data
set.
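
To make the first two tasks in the list above concrete, the sketch below shows a toy classifier and a toy regression in plain Python. Both techniques (1-nearest-neighbour and least-squares line fitting) and all the data are illustrative assumptions; the notes do not prescribe any particular method.

```python
def classify_1nn(training, item):
    """Classification: return the class label of the closest training example."""
    nearest = min(training, key=lambda pair: abs(pair[0] - item))
    return nearest[1]

def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical labelled data: (value, predefined class).
training = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
label = classify_1nn(training, 7.2)

# Hypothetical numeric data lying on the line y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

The classifier maps an item to one of the predefined classes, while the regression maps an item to a real-valued prediction, which is the distinction drawn between tasks 1 and 2.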

The more formal approach, with graphical interpretation of data-mining tasks for complex and
large data sets and illustrative examples, is given in Chapter 4. Current introductory
classifications and definitions are given here only to give the reader a feeling of the wide
spectrum of problems and tasks that may be solved using data-mining technology.

The success of a data-mining engagement depends largely on the amount of energy, knowledge,
and creativity that the designer puts into it. In essence, data mining is like solving a puzzle. The
individual pieces of the puzzle are not complex structures in and of themselves. Taken as a
collective whole, however, they can constitute very elaborate systems. As you try to unravel
these systems, you will probably get frustrated, start forcing parts together, and generally become
annoyed at the entire process; but once you know how to work with the pieces, you realize that it
was not really that hard in the first place. The same analogy can be applied to data mining. In the
beginning, the designers of the data-mining process probably do not know much about the data
sources; if they did, they would most likely not be interested in performing data mining.
Individually, the data seem simple, complete, and explainable. But collectively, they take on a
whole new appearance that is intimidating and difficult to comprehend, like the puzzle.

Therefore, being an analyst and designer in a data-mining process requires, besides thorough
professional knowledge, creative thinking and a willingness to see problems in a different light.

Data mining is one of the fastest growing fields in the computer industry. Once a small interest
area within computer science and statistics, it has quickly expanded into a field of its own. One
of the greatest strengths of data mining is reflected in its wide range of methodologies and
techniques that can be applied to a host of problem sets. One of the largest target markets for
data mining is the entire data-warehousing, data-mart, and decision-support community,
encompassing professionals from
such industries as retail, manufacturing, telecommunications, healthcare, insurance, and
transportation. In the business community, data mining can be used to discover new purchasing
trends, plan investment strategies, and detect unauthorized expenditures in the accounting
system. It can improve marketing campaigns and the outcomes can be used to provide customers
with more focused support and attention. Data-mining techniques can be applied to problems of
business process reengineering, in which the goal is to understand interactions and relationships
among business practices and organizations.

Many law enforcement and special investigative units, whose mission is to identify fraudulent
activities and discover crime trends, have also used data mining successfully. For example, these
methodologies can aid analysts in the identification of critical behavior patterns in the
communication interactions of narcotics organizations, the monetary transactions of money
laundering and insider trading operations, the movements of serial killers, and the targeting of
smugglers at border crossings. Data-mining techniques have also been employed by people in the
intelligence community who maintain many large data sources as a part of the activities relating
to matters of national security. Appendix B of the book gives a brief overview of typical
commercial applications of data-mining technology today.

VISUALIZATION METHODS

Visualization is defined in the dictionary as "a mental image". In the field of computer graphics,
the term has a much more specific meaning. Technically, visualization concerns itself with the
display of behavior and, particularly, with making complex states of behavior comprehensible to
the human eye. Computer visualization, in particular, is about using computer graphics and other
techniques to think about more cases, more variables, and more relations. The goal is to think
clearly, appropriately, with insight, and to act with conviction. Unlike presentations,
visualizations are typically interactive and very often animated.

Because of the high rate of technological progress, the amount of data stored in databases
increases rapidly. This proves true for traditional relational databases and complex 2D and 3D
multimedia databases that store images, CAD (Computer-aided design) drawings, geographic
information, and molecular biology structure. Many of the applications mentioned rely on very
large databases consisting of millions of data objects with several tens to a few hundred
dimensions. When confronted with the complexity of data, users face tough problems: Where do
I start? What looks interesting here? Have I missed anything? What are the other ways to derive
the answer? Is there other data available? People think iteratively and ask ad hoc questions of
complex data while looking for insights.

Computation, based on these large data sets and databases, creates content. Visualization makes
computation and its content accessible to humans. Therefore, visual data mining uses
visualization to augment the data-mining process. Some data-mining techniques and algorithms
are difficult for decision-makers to understand and use. Visualization can make the data and the
mining results more accessible, allowing comparison and verification of results. Visualization
can also be used to steer the data-mining algorithm.

It is useful to develop a taxonomy for data visualization, not only because it brings order to
disjointed techniques, but also because it clarifies and interprets ideas and purposes behind these
techniques. Taxonomy may trigger the imagination to combine existing techniques or discover a
totally new technique.

Visualization techniques can be classified in a number of ways. They can be classified as to
whether their focus is geometric or symbolic, whether the stimulus is 2D, 3D, or n-D, or whether
the display is static or dynamic. Many visualization tasks involve detection of differences in data
rather than a measurement of absolute values. It is the well-known Weber's Law that states that
the likelihood of detection is proportional to the relative change, not the absolute change, of a
graphical attribute. In general, visualizations can be used to explore data, to confirm a
hypothesis, or to manipulate a view.
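
The statement of Weber's Law above can be written compactly. In this sketch the symbols are assumptions introduced for illustration: $I$ is the current magnitude of a graphical attribute (for example, the length of a bar) and $\Delta I$ is the change applied to it.

```latex
\[
  P(\text{detection}) \;\propto\; \frac{\Delta I}{I}
\]
```

For example, lengthening a bar from 10 to 11 units is a relative change of 10%, while lengthening it from 100 to 101 units is a relative change of only 1%, so the first difference is far more likely to be noticed even though the absolute change is the same.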

In exploratory visualizations, the user does not necessarily know what s/he is looking for. This
creates a dynamic scenario in which interaction is critical. The user is searching for structures or
trends and is attempting to arrive at some hypothesis. In confirmatory visualizations, the user has
a hypothesis that needs only to be tested. This scenario is more stable and predictable. System
parameters are often predetermined and visualization tools are necessary for the user to confirm
or refute the hypothesis. In manipulative (production) visualizations, the user has a validated
hypothesis and so knows exactly what is to be presented. Therefore, the user focuses on refining the
visualization to optimize the presentation. This type is the most stable and predictable of all
visualizations.

The accepted taxonomy in this book is primarily based on different approaches in visualization
caused by different types of source data. Visualization techniques are divided roughly into two
classes, depending on whether physical data is involved. These two classes are scientific
visualization and information visualization.

Scientific visualization focuses primarily on physical data such as the human body, the earth,
molecules, and so on. Scientific visualization also deals with multidimensional data, but most of
the data sets used in this field use the spatial attributes of the data for visualization purposes; e.g.,
Computer-Aided Tomography (CAT) and Computer-Aided Design (CAD). Also, many of the
Geographical Information Systems (GIS) use either the Cartesian coordinate system or some
modified geographical coordinates to achieve a reasonable visualization of the data.

Information visualization focuses on abstract, nonphysical data such as text, hierarchies, and
statistical data. Data-mining techniques are primarily oriented toward information visualization.
The challenge for nonphysical data is in designing a visual representation of multidimensional
samples (where the number of dimensions is greater than three). Multidimensional-information
visualizations present data that is not primarily planar or spatial. One-, two-, and
three-dimensional, but also temporal information-visualization schemes can be viewed as a subset of
multidimensional information visualization. One approach is to map the nonphysical data to a
virtual object such as a cone tree, which can be manipulated as if it were a physical object.
Another approach is to map the nonphysical data to the graphical properties of points, lines, and
areas.

Using historical developments as criteria, we can divide information-visualization techniques
(IVT) into two broad categories: traditional IVT and novel IVT. Traditional methods of 2D and
3D graphics offer an opportunity for information visualization, even though these techniques are
more often used for presentation of physical data in scientific visualization. Traditional visual
metaphors are used for a single or a small number of dimensions, and they include:

1. Bar charts that show aggregations and frequencies.

2. Histograms that show the distribution of variable values.

3. Line charts for understanding trends in order.

4. Pie charts for visualizing fractions of a total.

5. Scatter plots for bivariate analysis.

Color-coding is one of the most common traditional IVT methods for displaying a one-
dimensional set of values where each value is represented by a different color. This
representation becomes a continuous tonal variation of color when real numbers are the values of
a dimension. Normally, a color spectrum from blue to red is chosen, representing a natural
variation from "cool" to "hot", in other words from the smallest to the highest values.
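
The blue-to-red coding described above can be sketched as a linear blend between two RGB colors. This is a simplification made for illustration (a straight blend passes through purple and is not a full color spectrum), and the function name `value_to_rgb` is hypothetical.

```python
def value_to_rgb(value, lowest, highest):
    """Map a value to an (R, G, B) tuple: blue for the lowest, red for the highest."""
    if highest == lowest:
        t = 0.0                      # degenerate range: treat everything as "cool"
    else:
        t = (value - lowest) / (highest - lowest)
    t = min(max(t, 0.0), 1.0)        # clamp values outside the range
    return (round(255 * t), 0, round(255 * (1 - t)))

cool = value_to_rgb(0, 0, 10)    # (0, 0, 255), pure blue
hot = value_to_rgb(10, 0, 10)    # (255, 0, 0), pure red
```

Because the mapping is continuous in the value, real-valued dimensions produce the continuous tonal variation mentioned in the text.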

With the development of large data warehouses, data cubes became very popular information-
visualization techniques. A data cube, the raw-data structure in a multidimensional database,
organizes information along a sequence of categories. The categorizing variables are called
dimensions. The data, called measures, are stored in cells along given dimensions. The cube
dimensions are organized into hierarchies and usually include a dimension representing time.
The hierarchical levels for the dimension time may be year, quarter, month, day, and hour.
Similar hierarchies could be defined for other dimensions given in a data warehouse.
Multidimensional databases in modern data warehouses automatically aggregate measures across
hierarchical dimensions; they support hierarchical navigation, expand and collapse dimensions,
enable drill-down, drill-up, or drill-across, and facilitate comparisons through time. For
transaction information in a database, the cube dimensions might be product, store, department,
customer number, region, month, year. The dimensions are predefined indices in a cube cell and
the measures in a cell are roll-ups or aggregations over the transactions. They are usually sums
but may include functions such as average, standard deviation, percentage, etc.

For example, the values for the dimensions in a database may be

1. Region: north, south, east, west

2. Product: shoes, shirts

3. Month: January, February, March, …, December

Then, the cell corresponding to [north, shirts, February] is the total sales of shirts for the northern
region for the month of February.
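
A data cube of this kind can be imitated with a dictionary keyed by the dimension values. The sketch below is only an illustration: the transaction records and sales amounts are invented, and `build_cube` and `roll_up` are hypothetical helper names, not the API of any real multidimensional database.

```python
from collections import defaultdict

# Hypothetical transactions: (region, product, month, sales amount).
transactions = [
    ("north", "shirts", "February", 120.0),
    ("north", "shirts", "February", 80.0),
    ("north", "shoes", "February", 200.0),
    ("south", "shirts", "January", 50.0),
]

def build_cube(rows):
    """Each cell holds the sum of the measure over matching transactions."""
    cube = defaultdict(float)
    for region, product, month, amount in rows:
        cube[(region, product, month)] += amount
    return cube

def roll_up(cube, keep):
    """Aggregate away every dimension whose index is not listed in `keep`
    (0 = region, 1 = product, 2 = month)."""
    out = defaultdict(float)
    for key, total in cube.items():
        out[tuple(key[i] for i in keep)] += total
    return out

cube = build_cube(transactions)
north_shirts_feb = cube[("north", "shirts", "February")]   # 200.0
by_region = roll_up(cube, keep=(0,))                        # totals per region
```

Here the measure is a sum; replacing the accumulation step would give other roll-ups such as averages or counts.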

Novel information-visualization techniques can simultaneously represent large data sets with
many dimensions on one screen. Some possible classes of these new techniques are:

1. Geometric projection techniques

2. Icon-based techniques

3. Pixel-oriented techniques

4. Hierarchical techniques

Geometric projection techniques aim to find interesting projections of multidimensional data
sets. We will present some illustrative examples of these techniques.

The Scatter-Plot Matrix Technique is an approach that is very often available in new data-mining
software tools. A grid of 2D scatter plots is the standard means of extending a standard 2D scatter
plot to higher dimensions. If you have 10-dimensional data, a 10 × 10 array of scatter plots is
used to provide a visualization of each dimension versus every other dimension. This is useful
for looking at all possible two-way interactions or correlations between dimensions. Positive and
negative correlations, but only between two dimensions, can be seen easily. The standard display
quickly becomes inadequate for extremely large numbers of dimensions, and user-interactions of
zooming and panning are needed to interpret the scatter plots effectively.
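
The layout of a scatter-plot matrix can be sketched without any plotting library. The code below only enumerates the panels and computes the pairwise correlation that each off-diagonal panel would let a viewer see; the dimension names and the helper names are hypothetical.

```python
import math
from itertools import combinations

def scatter_matrix_panels(dimension_names):
    """One panel per (row, column) pair: d dimensions give a d x d grid."""
    return [(row, col) for row in dimension_names for col in dimension_names]

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

names = [f"d{i}" for i in range(1, 11)]          # 10-dimensional data
panels = scatter_matrix_panels(names)            # 100 panels in a 10 x 10 grid
unique_pairs = list(combinations(names, 2))      # 45 distinct two-way interactions
```

The panel count grows with the square of the number of dimensions, which is why the display becomes crowded so quickly and interaction such as zooming is needed.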
