Mini Project Report
Submitted By:
Candidate’s Declaration
I have not submitted the matter embodied in this project for the award of
any other degree.
Place: Allahabad
Date: 17-5-2004
------------------------------------------------------------------------------------------------------------
Certificate
This is to certify that the above declaration made by the candidate is correct
to the best of my knowledge and belief.
Acknowledgement
Table of Contents
Abstract
Declaration
Certificate
Acknowledgements
List of Figures
CHAPTER I
Introduction and Statement of Problem
1.1 Introduction
1.2 Problem Statement
CHAPTER II
Challenges in This Field
2.1 Feature extraction
2.2 Selection of an algorithm
CHAPTER III
Approaches in This Direction
3.1 Statistical approaches
3.1.1 Clustering algorithm K-means
3.1.2 k-nearest neighbour
3.2 Soft computing
3.2.1 Genetic algorithm
3.2.2 Adaptive resonance theory (ART)
3.2.3 Fuzzy c-means
3.2.4 Gustafson-Kessel algorithm
3.2.5 Gath-Geva algorithm
3.2.6 Kohonen SOM
CHAPTER IV
System Architecture
4.1 Data Source Name and login
4.2 Algorithm and table selection
CHAPTER V
Results and Conclusions
5.1 Results
5.2 Conclusion
5.3 Future Extensions
5.3.1 Improvement in the genetic algorithm
5.3.2 Distributed computing environment
5.3.3 Dealing with various platforms and formats
References
- Books
- Research Papers
List of Figures
Chapter 1
1.1 Introduction
“If we already know about the upcoming hazards; it is very easy to find
the way to abolish it.”
Here, this sentence is applied to the context of landmine
detection and decontamination. My objective is to predict, with
some confidence parameter, whether a particular point of the
working area is occupied by mines. A robot is designed to move
toward these predicted areas to decontaminate the mines. The
mine-occupied areas can be known before the robot starts moving
or can be predicted dynamically. Designing an obstacle-free path
for the robot is another aspect, beyond the domain of this
module.
To tackle this problem, a classification toolkit has been designed using
statistical and soft computing approaches to cluster the data, to
predict the possible class of incoming data, and to generate
rules expressed in terms of a confidence parameter. The data may
be given in image form or in tabular form with all-numeric or
categorized attributes.
It is impossible to design a classifier with 100% correct classification
because it is not easy to differentiate the data of metallic
debris and PVC tubes from actual mine data.
On the basis of this prediction, path designers develop the obstacle-free
path to decontaminate these mines.
1.2 Statement of Problem
Anti-personnel landmines are a significant barrier to economic and social
development in a number of countries, so we need a classification
system that can differentiate a mine from metallic debris on the
basis of given data. This data is generated by highly accurate
sensors.
Chapter 2
Chapter 3
3.1.2 K-nearest Neighbour
The k-nearest neighbour technique is used to predict the class of
incoming data on the basis of given training data, with a k-NN
density estimator giving the confidence of the incoming sample
for each class. Finally, the class with the highest estimate is
predicted.
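The prediction step described above can be sketched as follows. This is a minimal Python illustration (the project itself used Java), not the toolkit's code. It assumes Euclidean distance and takes the confidence for a class to be the fraction of the k nearest neighbours that carry that class. The function name `knn_predict` and the sample values are made up for the example.

```python
from collections import Counter
import math

def knn_predict(train, labels, sample, k=3):
    """Return (predicted_class, confidence per class) for one sample."""
    # Rank training points by Euclidean distance to the incoming sample.
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], sample))
    votes = Counter(labels[i] for i in order[:k])
    # Assumed k-NN density estimate: share of the k neighbours in each class.
    confidence = {cls: count / k for cls, count in votes.items()}
    # Predict the class with the highest estimate.
    best = max(confidence, key=confidence.get)
    return best, confidence

# Hypothetical two-feature training data.
train = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1), (0.85, 0.7)]
labels = ["mine", "mine", "non-mine", "non-mine", "mine"]
predicted, conf = knn_predict(train, labels, (0.7, 0.75), k=3)
```

For the sample (0.7, 0.75) all three nearest neighbours are mines, so the predicted class is "mine" with confidence 1.0.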
3.2.1 Genetic algorithm to establish rules
To establish the rules between the attributes of the data, association
rule mining is used, but association rule mining cannot predict the
complete set of rules, i.e. rules that have negation in the
attributes cannot be discovered. To overcome that disadvantage,
Genetic Algorithms (GAs) have been used.
First of all, association rule mining is applied with support and
confidence values entered by the user to generate some base rules,
and these rules are sent to the genetic algorithm as input, which
helps to evolve new rules having negation in the attributes.
The three basic parts of the genetic algorithm are as follows:
(a) Selection: The roulette wheel technique is used to select the two
parents [R1].
(b) Crossover: A random point (crossover point) is generated, and the
segments to the left of this point in the first parent and in the
second parent are interchanged.
(c) Mutation: A mutation point is generated randomly and the bit
value at this point is toggled.
After some iterations we find rules that follow the above
properties and have a high fitness value, which can be calculated
either from the confidence value or from a confusion matrix.
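The three operators can be sketched in Python as below. This is only an illustration under stated assumptions: rules are encoded as fixed-length bit strings, fitness values are given as plain numbers (for instance rule confidence), and the function names and sample population are invented for the example.

```python
import random

def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0, sum(fitness))
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point round-off

def crossover(parent_a, parent_b, rng):
    """Interchange the segments to the left of a random crossover point."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_b[:point] + parent_a[point:], parent_a[:point] + parent_b[point:]

def mutate(bits, rng):
    """Toggle the bit at one randomly chosen mutation point."""
    point = rng.randrange(len(bits))
    flipped = "1" if bits[point] == "0" else "0"
    return bits[:point] + flipped + bits[point + 1:]

rng = random.Random(0)                      # fixed seed for repeatability
population = ["110010", "001101", "111000", "000111"]
fitness = [0.9, 0.4, 0.7, 0.2]              # made-up fitness values
parent_a = roulette_select(population, fitness, rng)
parent_b = roulette_select(population, fitness, rng)
child_a, child_b = crossover(parent_a, parent_b, rng)
mutant = mutate(child_a, rng)
```

Repeating selection, crossover and mutation over several generations, and keeping the rules with the highest fitness, gives the iteration described above.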
3.2.2 Adaptive resonance theory (ART)
As we know, a backpropagation network is very powerful in the sense that
it can approximate any continuous function given a certain number
of hidden neurons and certain forms of activation function. But
once a backpropagation network is trained, the number of hidden
neurons and the weights are fixed. The network cannot learn from
new patterns unless it is retrained from scratch, so there is no
plasticity. [R2]
ART is a neural network technique designed to solve this problem.
Our ultimate objective is to cluster the data into several chunks.
The samples are presented one at a time to the input neurons. For each
sample, the activation value is calculated for each of the
existing output neurons and the highest value is chosen. If this
value is higher than the threshold, the weights of that connection
are updated; otherwise a new output neuron is added. After a
certain number of iterations it was found that the proper clusters
of the data in our application do not have more than two classes
(mine and non-mine). Another fact is that predicting non-mine data
as mine is acceptable, but the reverse is not, because it may be
dangerous. So among all the clusters, the cluster whose center is
farthest from the mine data center is classified as non-mine, and
the rest of the clusters are classified as mine.
Here the activation function is calculated as the city block distance
between the incoming normalized data and the connection weights.
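The clustering loop described in this section can be sketched in Python as follows. It is a simplified illustration, not the implemented network. So that "highest activation" means "closest neuron", it assumes the activation is 1 minus the mean city block distance, with all features normalized to [0, 1]. The threshold, the learning rate and the weight-update rule are also assumptions.

```python
def art_cluster(samples, threshold=0.8, rate=0.5):
    """Incrementally cluster samples whose features lie in [0, 1]."""
    weights = []      # one weight vector per output neuron (cluster)
    assignment = []   # index of the output neuron chosen for each sample
    for x in samples:
        # Assumed activation: 1 minus the mean city block distance.
        acts = [1 - sum(abs(a - w) for a, w in zip(x, wv)) / len(x)
                for wv in weights]
        best = max(range(len(acts)), key=acts.__getitem__, default=None)
        if best is not None and acts[best] > threshold:
            # Activation above threshold: update the winning weights.
            weights[best] = [w + rate * (a - w)
                             for a, w in zip(x, weights[best])]
        else:
            # No existing neuron is close enough: add a new output neuron.
            weights.append(list(x))
            best = len(weights) - 1
        assignment.append(best)
    return weights, assignment

# Hypothetical normalized samples.
samples = [(0.9, 0.9), (0.1, 0.1), (0.85, 0.95), (0.15, 0.05)]
weights, assignment = art_cluster(samples)
```

With these samples two output neurons are created, and the samples alternate between them. Labelling the cluster farthest from the mine data center as non-mine would be a separate step on the returned weights.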
3.2.3 Fuzzy c-means:
In a classical clustering algorithm we have crisp membership of a
class (either one or zero). But while classifying the mine data it
is not very easy to differentiate between mine and non-mine, so we
need a method that can tell the membership of the data in each
class. If this membership is average, then we treat the data as
special and classify it in the class of mine (as mines are
dangerous!). [R3]
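Each iteration computes three quantities, where X is the feature vector:
the membership values of every sample in every class, the
Euclidean distance between X and each cluster center, and the
mean center (prototype) Ci of each cluster.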
If the difference between the membership values and the previous
membership values is less than a threshold, then the algorithm
terminates with the membership value for each class.
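A minimal Python sketch of the iteration is given below, using the standard textbook fuzzy c-means updates rather than the toolkit's own code. The fuzzifier m = 2, the tolerance, the starting centers and the sample data are all assumptions made for illustration.

```python
import math

def fuzzy_c_means(data, centers, m=2.0, tol=1e-4, max_iter=100):
    """Return (centers, u); u[k][i] is the membership of sample k in cluster i."""
    u = None
    for _ in range(max_iter):
        new_u = []
        for x in data:
            # Euclidean distance to every center (floored to avoid dividing by 0).
            d = [max(math.dist(x, c), 1e-12) for c in centers]
            # Standard update: u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1)).
            new_u.append([1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in d)
                          for di in d])
        # Mean center (prototype): data mean weighted by membership^m.
        centers = []
        for i in range(len(new_u[0])):
            w = [row[i] ** m for row in new_u]
            centers.append(tuple(sum(wk * x[j] for wk, x in zip(w, data)) / sum(w)
                                 for j in range(len(data[0]))))
        # Terminate when no membership value changed by more than the threshold.
        done = u is not None and max(abs(a - b) for r1, r0 in zip(new_u, u)
                                     for a, b in zip(r1, r0)) < tol
        u = new_u
        if done:
            break
    return centers, u

# Hypothetical data: two tight groups plus one ambiguous point in the middle.
data = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.5, 0.5)]
centers, u = fuzzy_c_means(data, centers=[(0.0, 0.0), (1.0, 1.0)])
```

The middle point ends up with a membership of about 0.5 in each cluster. Under the rule stated above, such an "average" sample would be treated as special and classified as mine.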
3.2.4 Gustafson-Kessel Algorithm
It is an improvement of the fuzzy c-means clustering algorithm. The
correlation between the data is not considered in c-means. In this
algorithm we redefine our distance formula as: [R3]
Mahalanobis distance:
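For reference, the distance in the Gustafson-Kessel algorithm is usually written in the following textbook form. The symbols are assumptions, not taken from this report: x_k is a sample, c_i the center of cluster i, u_ik the membership of sample k in cluster i, m the fuzzifier, n the number of features, N the number of samples, and rho_i a fixed cluster volume (commonly 1).

```latex
d^2(x_k, c_i) = (x_k - c_i)^{T} A_i \,(x_k - c_i), \qquad
A_i = \left[\rho_i \det(F_i)\right]^{1/n} F_i^{-1}, \qquad
F_i = \frac{\sum_{k=1}^{N} u_{ik}^{m}\,(x_k - c_i)(x_k - c_i)^{T}}
           {\sum_{k=1}^{N} u_{ik}^{m}}
```

Because F_i is the fuzzy covariance matrix of cluster i, the distance adapts to the correlation between attributes, which plain c-means with Euclidean distance ignores.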
3.2.5 Gath-Geva Algorithm:
This algorithm assumes that data is normally distributed. [R3]
Distance: defined in terms of the a-priori probability of data belonging
to cluster i.
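A commonly quoted textbook form of the Gath-Geva distance is shown below for reference. The notation is an assumption and may differ from the original formula, including in constant factors: x_k is a sample, c_i the center of cluster i, F_i the fuzzy covariance matrix of cluster i, u_ik the membership of sample k in cluster i, N the number of samples, and P_i the a-priori probability of cluster i.

```latex
d^2(x_k, c_i) = \frac{\sqrt{\det(F_i)}}{P_i}
  \exp\!\left(\tfrac{1}{2}\,(x_k - c_i)^{T} F_i^{-1} (x_k - c_i)\right),
\qquad
P_i = \frac{1}{N} \sum_{k=1}^{N} u_{ik}
```

The exponential term comes from the Gaussian density, which is where the assumption of normally distributed data enters.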
Chapter 4
System Architecture
As already discussed, the input can be in image form or tabular form.
Matlab has been used to extract the features from the input
images. We have a table of numerical attributes with an entry
stating whether the data belongs to mine or non-mine, but the
genetic algorithm requires a categorized table, so the data is
categorized into three categories (Low, Medium and High) with the
class value simply mine or non-mine.
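The categorization step can be illustrated with a short Python sketch. How the three categories were actually cut is not stated here, so the sketch assumes three equal-width bins over the observed range of each attribute. The function name and readings are made up.

```python
def categorize(values):
    """Map numeric values to Low/Medium/High using three equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / 3 or 1.0   # fall back to 1.0 when all values are equal
    labels = ["Low", "Medium", "High"]
    # min(..., 2) keeps the maximum value inside the last bin.
    return [labels[min(int((v - lo) / width), 2)] for v in values]

readings = [0.0, 1.0, 4.0, 5.0, 8.0, 9.0]   # hypothetical attribute column
categories = categorize(readings)
```

Applying the function to each numeric column yields the categorized table that the genetic algorithm needs, with the mine or non-mine class column left unchanged.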
algorithm). Now the algorithm-specific results will be
displayed.
Different algorithms can ask for some input parameters; for example, a
clustering algorithm can ask for the number of clusters.
The interface is self-explanatory with proper help. Java has been used
for the front end, Microsoft Access XP for the database at the
back end, and a JDBC bridge to communicate between the algorithms
and the databases.
Fig 5: Result of Fuzzy c-means and Gustafson-Kessel algorithm
K-means Algorithm
Kohonen SOM
This algorithm is also used for clustering, and it is quite a fast
algorithm based on a ‘winner take all’ strategy. It differentiates
mine from non-mine with up to 80% accuracy.
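The 'winner take all' update can be sketched in Python as below. This is a simplified illustration: a full Kohonen SOM also updates the winner's neighbours and decays the learning rate, and both are omitted here. The starting weights, learning rate and samples are assumptions.

```python
import math

def winner_of(x, weights):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))

def train_wta(samples, weights, rate=0.5, epochs=10):
    """Winner-take-all training: only the closest unit moves toward each sample."""
    weights = [list(w) for w in weights]
    for _ in range(epochs):
        for x in samples:
            win = winner_of(x, weights)
            weights[win] = [w + rate * (a - w) for a, w in zip(x, weights[win])]
    return weights

# Hypothetical normalized samples and two starting units.
samples = [(0.9, 0.8), (0.1, 0.2), (0.8, 0.9), (0.2, 0.1)]
weights = train_wta(samples, weights=[(0.3, 0.3), (0.7, 0.7)])
clusters = [winner_of(x, weights) for x in samples]
```

Each sample is then assigned to the cluster of its winning unit. Since only one unit is updated per sample, training is cheap, which fits the speed noted above.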
Fig 8 : Result of Kohonen SOM algorithm
5.2 Conclusion
All eight different algorithms have been implemented to compare the
results. This classifier gives results with 80% accuracy. The
best results are given by ART and the genetic algorithm. Fuzzy
c-means and Gustafson-Kessel are also good because of the
membership values for each class. This module can differentiate
between the PVC tube, wood piece, brass tube and copper cylinder
(non-mine data) and the mine data obtained from JRC Israel
(http://apl-database.jrc.it).
5.3 Future Extensions
We contemplate the following future features, which can be incorporated
into this project:
5.3.1 Improvement in the genetic algorithm: The genetic algorithm
implemented in this module incorporates only point mutation, so
other types of mutation, such as deletion, insertion and segment
mutation, can also be tried, and the crossover and mutation
probabilities can be modified to get better results.
5.3.2 Distributed computing environment: Generally we have to deal
with large databases, because on the basis of a 100-tuple database
it is very hard to predict the exact class of data. In practical,
real-life applications we have several GB of data. To operate on
this much data we need distributed databases and computing.
5.3.3 Dealing with various platforms and formats: The data may be in
various formats and database systems, so the system should be
flexible enough to handle various formats and DBMSs (Oracle,
MySQL, etc.).
References
Books
B.1 Earl Gose, Steve Jost, Richard Johnsonbaugh. Pattern Recognition
and Image Analysis. Prentice Hall, June 1996. ISBN 0132364158.
B.2 Richard O. Duda, Peter E. Hart, David G. Stork. Pattern
Classification (2nd edition). Wiley, New York, 2001. ISBN 0471056693.
B.3 Valluru B. Rao. C++ Neural Networks and Fuzzy Logic (2nd
edition).
Research Papers
R.1 J. A. Vasconcelos, J. A. Ramírez, R. H. C. Takahashi, and R. R.
Saldanha. Improvements in Genetic Algorithms. IEEE Transactions on
Magnetics, vol. 37, no. 5, September 2001.