
One central element of intelligent behavior is the ability to learn from experience.

Understand the principles of clustering, classification and prediction

Explain the relationship between learning and data mining

Give examples of supervised learning (back propagation neural network and a decision tree), unsupervised learning (Kohonen map), and learning in rule-based systems using genetic algorithms

Examine the concept of information theory in decision trees

1. Classification: Definition
* Given a collection of records (training set). Each record contains a set of attributes; one of the attributes is the class.
* Find a model for the class attribute as a function of the values of the other attributes.
* Goal: previously unseen records should be assigned a class as accurately as possible.
** A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
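
The division into training and test sets can be illustrated with a minimal sketch. The records, the 30% test fraction, and the 1-nearest-neighbour model are all assumptions made for illustration; the slides do not prescribe any of them.

```python
import math
import random

def nearest_neighbor(train, x):
    """Predict the class of x as the class of its closest training record."""
    closest = min(train, key=lambda record: math.dist(record[0], x))
    return closest[1]

def holdout_accuracy(records, test_fraction=0.3, seed=0):
    """Split records into training and test sets; return test-set accuracy."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    # The "model" is built from the training set only and validated on the test set.
    correct = sum(nearest_neighbor(train, x) == label for x, label in test)
    return correct / len(test)

# Hypothetical records: (attributes, class), two well-separated classes.
records = [
    ((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((1.1, 0.9), "A"),
    ((0.9, 1.1), "A"), ((1.2, 1.0), "A"),
    ((5.0, 5.1), "B"), ((5.2, 4.9), "B"), ((4.8, 5.0), "B"),
    ((5.1, 5.2), "B"), ((4.9, 4.8), "B"),
]
accuracy = holdout_accuracy(records)
print(accuracy)
```

Because the test records are never shown to the model while it is built, the accuracy measured on them estimates how well previously unseen records would be classified.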

2. Illustrating Classification Task

Clustering is the process of grouping the data into classes or clusters, so that objects within a cluster have high similarity in comparison to one another but are very dissimilar to objects in other clusters. Dissimilarities are assessed based on the attribute values describing the objects. Often, distance measures are used.
As a result, clustering means finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups.
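
As a rough sketch of distance-based grouping, the code below uses k-means with Euclidean distance. The slides do not name k-means; it is swapped in here only as one simple way to form clusters, and the points and starting centroids are hypothetical.

```python
import math

def assign(points, centroids):
    """Label every point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            new_centroids.append(old)  # an empty cluster keeps its centroid
    return new_centroids

def k_means(points, centroids, iterations=10):
    """Alternate assignment and update for a fixed number of iterations."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return labels, centroids

# Hypothetical data: two visibly separate groups of objects.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = k_means(points, [(0, 0), (10, 10)])
print(labels)
```

Objects that end up with the same label are close to one another and far from the objects carrying the other label.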

Supervised learning is the most common form of learning and is sometimes called programming by example. The learning agent is trained by showing it examples of the problem state or attributes along with the desired output or action. The learning agent makes a prediction based on the inputs, and if the output differs from the desired output, the agent is adjusted or adapted to produce the correct output. This process is repeated over and over until the agent learns to make accurate classifications or predictions. Examples of this type of learning are the back propagation neural network and the decision tree.
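
The predict, compare, adjust cycle described above can be sketched with a single threshold unit (a perceptron). The perceptron, the logical AND data set, and the learning rate of 1 are assumptions chosen for illustration; they are not taken from the slides.

```python
def predict(weights, bias, x):
    """Output 1 if the weighted sum of the inputs exceeds zero, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train(examples, rate=1.0, max_epochs=100):
    """Show the examples repeatedly and adjust only when the output is wrong."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, desired in examples:
            error = desired - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + rate * error * xi for w, xi in zip(weights, x)]
                bias += rate * error
        if mistakes == 0:  # every example is now classified correctly
            break
    return weights, bias

# Hypothetical training set: inputs with the desired output (logical AND).
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(examples)
print([predict(weights, bias, x) for x, _ in examples])
```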

Reinforcement learning is a special case of supervised learning where the exact desired output is unknown. It is based only on the information of whether or not the actual output is correct. Reinforcement learning can therefore be considered a middle stage between supervised learning and unsupervised learning.

Unsupervised learning, as in a self-organizing map, is used when the learning agent needs to recognize similarities between inputs or to identify features in the input data. The data is presented to the agent, and it adapts so that it partitions the data into groups.

On-line learning means that the agent is sent out to perform its tasks and that it can learn or adapt after each transaction is processed. On-line learning is like on-the-job training and places severe requirements on the learning algorithms. It must be very fast and very stable.

Off-line learning is more like a business seminar. You take your sales people off the floor and place them in an environment where they can focus on improving their skills without distractions. After a suitable training period, they are sent out to apply their newfound knowledge and skills.

1. Data mining is the process of extracting hidden knowledge from large volumes of raw data.

2. Data mining automates the process of finding relationships and patterns in raw data and delivers results that can be either utilized in an automated decision support system or assessed by a human analyst.

3. Data mining is a process for discovering data relationships hidden in large databases.

So learning, as applied to data mining, can be thought of as a way for intelligent agents to automatically discover knowledge rather than having it predefined using predicate logic, rules, or some other representation.

Artificial neural network: Definition

An artificial neural network is an information-processing system that is able to acquire, store, and utilize experiential knowledge; this ability has been related to the network's capabilities and performance.

Neural networks provide an easy way to add learning ability to agents.

Decision trees are one of the fundamental techniques used in data mining. They are tree-like structures used for classification, clustering, feature selection, and prediction. They have the following features:

1. Decision trees are easily interpretable and understandable for humans.

2. They are well suited for high-dimensional applications.

3. Decision trees are fast and usually produce high-quality solutions.

4. Decision tree objectives are consistent with the goals of data mining and knowledge discovery.

A decision tree consists of a root and internal nodes. The root and the internal nodes are labeled with questions in order to find a solution to the problem under consideration.

The root node is the first state of a decision tree (DT). This node is assigned all of the examples from the training data. If all examples belong to the same group, no further decisions need to be made to split the data set. If the examples in this node belong to two or more groups, a test is made at the node that results in a split. A DT is binary if each node is split into two parts, and it is non-binary (multi-branch) if each node is split into three or more parts.

A decision tree model consists of two parts: creating the tree and applying the tree to the database. To achieve this, decision trees use several different algorithms. The algorithms most widely used by computer scientists are ID3, C4.5, and C5.0.
In general, decision trees are based on information theory, and their leaf nodes represent a final classification of the record.

Given: Examples (S); Target attribute (C); Attributes (R)
Initialize Root
Function ID3(S, C, R)
  Create a Root node for the tree
  IF S is empty, return a single node with value Failure
  IF all records in S have the same value of C, return a single node with that value
  IF R is empty, return a single node with the most frequent value of the target attribute (C)
  ELSE
  BEGIN
    Let D be the attribute with the largest Gain(D, S) among the attributes in R
    Let {dj | j = 1, 2, ..., n} be the values of attribute D
    Let {Sj | j = 1, 2, ..., n} be the subsets of S consisting respectively of records with value dj for attribute D
    Return a tree with root labeled D and arcs labeled d1, d2, ..., dn going respectively to the trees
    For each branch in the tree
      IF Sj is empty, add a new branch with the most frequent C
      ELSE
        ID3(S1, C, R-{D}), ID3(S2, C, R-{D}), ..., ID3(Sn, C, R-{D})
  END ID3
  Return Root
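
The pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the tree is represented as nested dictionaries, plain information gain is used to choose the attribute, and the attribute names and records in the toy data set are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(records, attr, target):
    """Information gain obtained by splitting the records on attr."""
    remainder = 0.0
    for value in {r[attr] for r in records}:
        subset = [r[target] for r in records if r[attr] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return entropy([r[target] for r in records]) - remainder

def id3(records, target, attrs):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    if not records:
        return "Failure"
    labels = [r[target] for r in records]
    if len(set(labels)) == 1:              # all records share one class
        return labels[0]
    if not attrs:                          # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(records, a, target))
    rest = [a for a in attrs if a != best]
    tree = {best: {}}
    for value in sorted({r[best] for r in records}):
        subset = [r for r in records if r[best] == value]
        tree[best][value] = id3(subset, target, rest)
    return tree

def classify(tree, record):
    """Apply the tree to a record by following the matching branches."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][record[attr]]
    return tree

# Hypothetical training set with target attribute "play".
data = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = id3(data, "play", ["outlook", "windy"])
print(tree)
print(classify(tree, {"outlook": "rainy", "windy": "yes"}))
```

Because branches are created only for attribute values that occur in the records, the empty-subset case of the pseudocode never arises inside the recursion in this sketch.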

Example:
Suppose we have a supermarket whose monthly revenue is to be calculated, given that it was visited by 1000 people. The examples are divided into positive examples (P) and negative examples (N).

Case 1: 500 men and 500 women

Women = 500
  Positive examples (P): 70%, 500 * 70/100 = 350
  Negative examples (N): 30%, 500 * 30/100 = 150

Men = 500
  Positive examples (P): 90%, 500 * 90/100 = 450
  Negative examples (N): 10%, 500 * 10/100 = 50

$$I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n} \qquad (1)$$

$$\text{Remainder} = \sum_i \frac{p_i + n_i}{p + n}\, I\left(\frac{p_i}{p_i + n_i}, \frac{n_i}{p_i + n_i}\right) \qquad (2)$$

$$\text{Gain} = I - \text{Remainder} \qquad (3)$$

I = total information = 1
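
Formulas (1) to (3) can be checked numerically with a short sketch. The function names are hypothetical, the total information is taken as 1 as assumed in the text, and 0 log2 0 is treated as 0 so that a group with only positive or only negative examples carries no information.

```python
import math

def information(p, n):
    """Formula (1): I(p/(p+n), n/(p+n)) in bits, with 0*log2(0) taken as 0."""
    total = p + n
    return -sum((c / total) * math.log2(c / total) for c in (p, n) if c > 0)

def remainder(groups):
    """Formula (2): information of each group weighted by its share of the examples."""
    total = sum(p + n for p, n in groups)
    return sum((p + n) / total * information(p, n) for p, n in groups)

def gain(groups, total_information=1.0):
    """Formula (3); the total information defaults to 1, as assumed in the text."""
    return total_information - remainder(groups)

# Case 1: (positive, negative) counts for the 500 men and the 500 women.
case_1 = [(450, 50), (350, 150)]
print(information(450, 50))   # I(0.9, 0.1)
print(information(350, 150))  # I(0.7, 0.3)
print(gain(case_1))
```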

Gain for Case 1:

$$\text{Gain} = 1 - \left[\frac{500}{1000}\, I\left(\frac{450}{500}, \frac{50}{500}\right) + \frac{500}{1000}\, I\left(\frac{350}{500}, \frac{150}{500}\right)\right] = 1 - \left[0.5\, I(0.9, 0.1) + 0.5\, I(0.7, 0.3)\right]$$

$$I(0.9, 0.1) = -\left[0.9 \log_2 0.9 + 0.1 \log_2 0.1\right] \approx 0.46900$$

$$I(0.7, 0.3) = -\left[0.7 \log_2 0.7 + 0.3 \log_2 0.3\right] \approx 0.88129$$

$$\text{Gain} = 1 - \left[0.5(0.46900) + 0.5(0.88129)\right] \approx 0.32486$$

Case 2: the customers are divided into three groups according to the time they spend in the supermarket.
* Group 1 (G1) spends less than 4 hours per month; it has 333 customers.
* Group 2 (G2) spends from 4 to 10 hours per month; it has 333 customers.
* Group 3 (G3) spends more than 10 hours per month; it has 333 customers.

G1 = 333
  Positive examples (P): 50%, 333 * 50/100 = 166
  Negative examples (N): 50%, 333 * 50/100 = 166

G2 = 333
  Positive examples (P): 90%, 333 * 90/100 = 300
  Negative examples (N): 10%, 333 * 10/100 = 33

G3 = 333
  Positive examples (P): 100%, 333 * 100/100 = 333
  Negative examples (N): 0%, 333 * 0 = 0

$$\text{Gain} = 1 - \left[0.333\, I(0.5, 0.5) + 0.333\, I(0.9, 0.1) + 0.333\, I(1, 0)\right]$$

Here I(0.5, 0.5) = 1, I(300/333, 33/333) ≈ 0.466133 when computed from the group counts, and I(1, 0) = 0, because a group that contains only positive examples carries no information (0 log2 0 is taken as 0).

$$\text{Gain} = 1 - \left[0.333(1) + 0.333(0.466133) + 0.333(0)\right] \approx 0.511778$$

1. Back propagation is a systematic method for training multilayer artificial neural networks.
2. Its learning rule is generalized from the Widrow-Hoff rule for multilayer networks:
$$w' = w + c\,(d - f(net))\,x_j$$
3. It is a very popular supervised model in neural networks.
4. It does not have feedback connections, but errors are back-propagated during training. The least mean squared error is used.

Step 1: Set the initial values of the learning rate (η0), the maximum acceptable network error (Emax), the maximum number of training epochs (Epochmax), and the momentum rate (α).

Step 2: Set the network error value (MSE) to zero and the index of the current training pattern (p) to one.

Step 3: Compute the activity of the hidden neurons using the unipolar sigmoid function with λ = 1, $f(net) = 1 / (1 + e^{-\lambda\, net})$.

Step 4: The hidden neuron outputs become the inputs to the output neurons, which apply the same sigmoid function to the hidden activity.

Step 5: Compute the error signal value of the output neurons for pattern p.

Step 6: Compute the error signal value of the hidden neurons, which depends on the error of the output neurons.

Step 7: Adjust the weights between the hidden layer and the output layer.

Step 8: Adjust the weights between the input layer and the hidden layer.

Step 9: Increase the value of p by one to input the next pattern in the learning process. If p has not exceeded the number of training patterns, return to step 3 to train the network on that pattern; otherwise go to step 10.

Step 10: After all training patterns have been presented to the network, compute the value of the cost function.

Step 11: In this step, the termination criterion is tested. The condition holds if the total error of the network becomes less than the maximum acceptable error (Emax), or if the current epoch number (t) is bigger than the maximum number of learning epochs (Epochmax). Otherwise, return to step 2.
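
The eleven steps can be sketched as a small program. Since the equations of the original slides are not reproduced in the text, the error signals and weight updates below are the standard ones for sigmoid units and should be read as assumptions; the network size, parameter values, and the logical OR training set are likewise hypothetical.

```python
import math
import random

def sigmoid(net, lam=1.0):
    """Unipolar sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-lam * net))

def forward(x, w_hidden, w_output):
    """Steps 3-4: hidden activity, then output activity (last weight of a row is a bias)."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0]))) for row in w_output]
    return hidden, output

def train(patterns, n_hidden=3, eta=0.5, alpha=0.5, e_max=0.01, epoch_max=5000, seed=1):
    rng = random.Random(seed)                                  # Step 1: parameters, weights
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_output = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    prev_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]     # previous changes (momentum)
    prev_o = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
    history = []
    for _ in range(epoch_max):
        error = 0.0                                            # Step 2
        for x, d in patterns:                                  # Step 9: next pattern
            hidden, output = forward(x, w_hidden, w_output)
            delta_o = [(dk - ok) * ok * (1 - ok) for dk, ok in zip(d, output)]   # Step 5
            delta_h = [h * (1 - h) * sum(delta_o[k] * w_output[k][j] for k in range(n_out))
                       for j, h in enumerate(hidden)]                            # Step 6
            for k in range(n_out):                             # Step 7
                for j, v in enumerate(hidden + [1.0]):
                    prev_o[k][j] = eta * delta_o[k] * v + alpha * prev_o[k][j]
                    w_output[k][j] += prev_o[k][j]
            for j in range(n_hidden):                          # Step 8
                for i, v in enumerate(x + [1.0]):
                    prev_h[j][i] = eta * delta_h[j] * v + alpha * prev_h[j][i]
                    w_hidden[j][i] += prev_h[j][i]
            error += sum((dk - ok) ** 2 for dk, ok in zip(d, output))
        history.append(error / len(patterns))                  # Step 10: cost (MSE)
        if history[-1] < e_max:                                # Step 11: termination test
            break
    return w_hidden, w_output, history

# Hypothetical training set: logical OR.
patterns = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]), ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
w_hidden, w_output, history = train(patterns)
print(len(history), history[0], history[-1])
```

The error recorded for each epoch is accumulated while the weights are still changing, so it is an approximation of the cost at the end of that epoch.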

* A Kohonen map is a single-layer neural network, comprised of an input layer and an output layer.
* Unlike back propagation, which is a supervised learning paradigm, feature maps perform unsupervised learning.
* Each time an input vector is presented to the network, its distance to each unit in the output layer is computed. Various distance measures have been used. The most common, and the one used here, is the Euclidean distance.
* The output unit with the smallest distance to the input vector is declared the "winner."

* Step 1: The inputs are presented to the input layer.
* Step 2: The distance of the input pattern to the weights of each output unit is computed using the Euclidean distance.
* Step 3: The weights of only one node in the network, called the winner node, are updated. The winner node is the node whose weight vector is most similar to the current input. The weight update is
$$w' = w + \alpha\,(x - w)$$
where w' is the altered weight vector, w is the weight vector of the winner node, x is the current input vector, and α is a small number, always between 0 and 1. The updating of the weights continues until the MSE (mean square error) reaches an accepted threshold.
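
A single presentation of an input vector, following the three steps above, might look like the sketch below. The starting weights, the input vector, and α = 0.5 are hypothetical values chosen so the arithmetic is easy to follow.

```python
import math

def winner(weights, x):
    """Step 2: index of the output unit whose weights are closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def train_step(weights, x, alpha=0.5):
    """Step 3: move only the winner's weights toward x, w' = w + alpha * (x - w)."""
    j = winner(weights, x)
    weights[j] = [w + alpha * (xi - w) for w, xi in zip(weights[j], x)]
    return j

# Hypothetical map with two output units and two inputs.
weights = [[0.0, 0.0], [1.0, 1.0]]
j = train_step(weights, [0.8, 0.6])
print(j, weights)
```

Only the weight vector of the winning unit changes; the other unit keeps its weights, which is what lets different units specialize on different groups of inputs.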

Classifier systems are rule systems which use genetic algorithms to modify the rule base.
Genetic algorithms are particularly suited to optimization problems because they are, in essence, performing a parallel search in the state space.

A. The crossover operator injects large amounts of noise into the process to make sure that the entire search space is covered.

B. The mutation operator allows fine-tuning of fit individuals in a manner similar to hill-climbing search techniques.

C. Control parameters are used to determine how large the population is, how individuals are selected from the population, and how often any mutation and crossover is performed.

1. Population: represents the number of individuals in the environment. Usually 50-100 individuals are used, each one consisting of L genes.
   Chromosome_i = gene_1 gene_2 ... gene_L, 1 <= i <= popsize
2. Evaluation: a fitness value is allocated to each chromosome; this value is connected with the objective function.
3. The selection of the fittest individuals ensures that only the best ones will be allowed to have offspring, driving the search towards good solutions.
4. By recombining the genetic material of these selected individuals, the possibility of obtaining an offspring where at least one child is better than any of its parents is high.
5. Mutation is meant to introduce new traits, not present in any of the parents. It is usually performed on freshly obtained individuals by slightly altering some of their genetic material.
6. The replacement criterion is the last operation; it basically says which elements, among those in the current gene pool and their newly generated offspring, are to be given a chance of survival on to the next generation.

Initialization(population);
Evaluation(population);
Gen = 0;
Do
  Selection(population, selected parents);
  Crossover(selected parents, created offspring, Pc);
  Mutation(created offspring, Pm);
  Evaluation(created offspring);
  Gen = Gen + 1;
While (not stop_criteria);
End SSGA
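
The loop above can be sketched in Python as a steady-state genetic algorithm. Several choices here are assumptions made for illustration and are not stated in the slides: the OneMax objective (count of 1 bits), tournament selection of size 2, one-point crossover, bit-flip mutation, and a replacement rule in which a child replaces the worst individual only when it is fitter.

```python
import random

def fitness(chromosome):
    """Hypothetical objective (OneMax): the number of genes equal to 1."""
    return sum(chromosome)

def select(population, rng, k=2):
    """Tournament selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)

def crossover(a, b, rng, pc):
    """One-point crossover applied with probability pc."""
    if rng.random() < pc:
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(chromosome, rng, pm):
    """Flip each gene independently with probability pm."""
    return [1 - g if rng.random() < pm else g for g in chromosome]

def ssga(pop_size=50, length=20, pc=0.8, pm=0.02, max_gen=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(map(fitness, population))
    for _ in range(max_gen):
        if max(map(fitness, population)) == length:   # stop criterion: optimum found
            break
        parents = select(population, rng), select(population, rng)
        for child in crossover(parents[0], parents[1], rng, pc):
            child = mutate(child, rng, pm)
            worst = min(range(pop_size), key=lambda i: fitness(population[i]))
            if fitness(child) > fitness(population[worst]):   # replacement criterion
                population[worst] = child
    return initial_best, max(population, key=fitness)

initial_best, best = ssga()
print(initial_best, fitness(best))
```

Because a child only ever replaces a weaker individual, the best fitness in the population cannot decrease from one generation to the next.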

END
