Seminar Report
On
Soft Computing
Submitted By
Jishin K C
In
Software Engineering
COCHIN – 682022
2010
Certificate
This is to certify that the Seminar report entitled "Soft Computing", submitted by Jishin K C, Semester I, in partial fulfillment of the requirement for the award of the M.Tech. Degree in Software Engineering, is a bonafide record of the Seminar presented by her in the academic year 2010.
ACKNOWLEDGEMENT
My heartfelt thanks to my guide Dr. Sumam Mary Idicula for taking time
and helping me through my seminar. She has been a constant source of encouragement without which
the seminar might not have been completed on time. I am very grateful for her guidance.
ABSTRACT
Soft Computing differs from conventional (hard) computing in that, unlike hard
computing, it is tolerant of imprecision, uncertainty, partial truth, and approximation. In
effect, the role model for soft computing is the human mind. Principal constituents of Soft
Computing are Neural Networks, Fuzzy Logic, Evolutionary Computation, Swarm
Intelligence and Bayesian Networks. The successful applications of soft computing suggest
that the impact of soft computing will be felt increasingly in coming years. Soft computing is
likely to play an important role in science and engineering, but eventually its influence may
extend much farther.
CONTENTS
1. Introduction
2. Neural Networks
   2.4 Learning
3. Evolutionary Computation
   General Framework
   Genetic Algorithms
   Case Study
4. Fuzzy Systems
   Fuzzy Sets
5. Bayesian Network
6. Swarm Intelligence
   6.2 Applications
7. Conclusion
8. References
1. INTRODUCTION
Soft Computing became a formal Computer Science area of study in the early
1990's. Earlier computational approaches could model and precisely analyze only relatively
simple systems. More complex systems arising in biology, medicine, the humanities,
management sciences, and similar fields often remained intractable to conventional
mathematical and analytical methods. That said, it should be pointed out that simplicity and
complexity of systems are relative, and many conventional mathematical models have been
both challenging and very productive. Soft computing deals with imprecision, uncertainty,
partial truth, and approximation to achieve tractability, robustness and low solution cost.
The idea of soft computing was initiated in 1981 by Lotfi A. Zadeh. Generally
speaking, soft computing techniques resemble biological processes more closely than
traditional techniques, which are largely based on formal logical systems, such as sentential
logic and predicate logic, or rely heavily on computer-aided numerical analysis (as in finite
element analysis). Soft computing techniques are intended to complement each other.
Unlike hard computing schemes, which strive for exactness and full truth, soft
computing techniques exploit the tolerance for imprecision, partial truth, and uncertainty in a
particular problem. Another common contrast comes from the observation that inductive
reasoning plays a larger role in soft computing than in hard computing. Components of soft
computing include: Neural Networks, Perceptrons, Fuzzy Systems, Bayesian Networks,
Swarm Intelligence and Evolutionary Computation.
The highly parallel processing and layered neuronal morphology with learning
abilities of the human cognitive faculty (the brain) provide us with a new tool for designing
a cognitive machine that can learn and recognize complicated patterns like human faces and
Japanese characters. The theory of fuzzy logic, the basis for soft computing, provides
mathematical power for the emulation of the higher-order cognitive functions: the thought
and perception processes. A marriage between these evolving disciplines, such as neural
computing, genetic algorithms and fuzzy logic, may provide a new class of computing
systems (neural-fuzzy systems) for the emulation of higher-order cognitive power.
2. NEURAL NETWORKS
Neural Networks, which are simplified models of the biological neuron system, are
massively parallel distributed processing systems made up of highly interconnected neural
computing elements that have the ability to learn and thereby acquire knowledge and make
it available for use. A neural network resembles the brain in two respects:
- Knowledge is acquired by the network through a learning process.
- Interconnection strengths known as synaptic weights are used to store the knowledge.
A neuron is composed of a cell body, known as the soma, which contains the nucleus.
Attached to the soma are long, irregularly shaped filaments called dendrites. The dendrites
behave as input channels: all inputs from other neurons arrive through the dendrites. Another
link to the soma, called the axon, is electrically active and serves as an output channel. If the
cumulative inputs received by the soma raise the internal electric potential of the cell, known
as the membrane potential, above a threshold, the neuron fires by propagating an action
potential down the axon to excite or inhibit other neurons. The axon terminates in a
specialized contact called a synapse that connects the axon with the dendrite links of another
neuron.
Here x1, x2, x3, …, xn are the n inputs to the artificial neuron and w1, w2, w3, …, wn are the
weights attached to the input links. The total input I received by the soma of the artificial
neuron is the weighted sum

I = x1·w1 + x2·w2 + … + xn·wn = Σ xi·wi

Among their other properties, NNs can process information in parallel, at high speed, and in
a distributed manner.
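To make this computation concrete, here is a minimal sketch of the artificial neuron just described, assuming a simple step (threshold) activation; the threshold value and the choice of activation are illustrative assumptions, since the report does not specify them:

```python
# A minimal artificial neuron: weighted sum of inputs followed by a
# threshold (step) activation. The activation and threshold are assumed.

def neuron_output(x, w, threshold=0.5):
    """Fire (return 1) if the total input I = sum(x_i * w_i) exceeds threshold."""
    total_input = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if total_input > threshold else 0

# Example: three inputs with their synaptic weights.
x = [1.0, 0.0, 1.0]   # inputs x1..x3
w = [0.4, 0.9, 0.2]   # weights w1..w3
print(neuron_output(x, w))  # I = 0.6 > 0.5, so the neuron fires: 1
```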
2.4 LEARNING
Basically, learning is a process by which the free parameters (i.e., synaptic weights
and bias levels) of a neural network are adapted through a continuing process of stimulation
by the environment in which the network is embedded. The type of learning is determined by
the manner in which the parameter changes take place. Specifically, learning machines may
be classified as follows:
- Learning with a teacher, also referred to as supervised learning
- Learning without a teacher
This second class of learning machines may also be subdivided into
- Reinforcement learning
- Unsupervised learning or self-organizing learning
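The report does not give a concrete algorithm here; as an illustrative sketch of learning with a teacher, the classic perceptron rule below adapts the synaptic weights from the error between the teacher's desired output and the actual output:

```python
# Sketch of "learning with a teacher": the classic perceptron rule. An
# external teacher supplies the desired output and the free parameters
# (weights and bias) are adapted from the error. The rule itself is an
# illustrative choice; the report only names the learning categories.

def train_perceptron(samples, lr=0.1, epochs=20):
    """samples: list of (inputs, desired_output) pairs with binary outputs."""
    n = len(samples[0][0])
    w = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        for x, desired in samples:
            actual = 1 if sum(xi * wi for xi, wi in zip(x, w)) + bias > 0 else 0
            error = desired - actual            # teacher signal
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            bias += lr * error
    return w, bias

# Learn the logical AND function from labelled examples.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
print(train_perceptron(data))
```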
A. Minutiae Extraction
The input image is divided into equal sized blocks. Each block is processed
independently. The ridges in each block are thinned and the resulting skeleton image is then
used for feature extraction.
The winning node in the F2 layer reads out the top-down expectation to F1, where the winner is compared with the input
vector. The vigilance parameter determines the mismatch that is to be tolerated when
assigning each host to a cluster. If the match between the winner and the input vector is
within the tolerance, the top-down weights corresponding to the winner are modified. If a
mismatch occurs, F1 layer sends a reset burst to F2, which shuts off the current node, and
chooses another uncommitted node. Once the network stabilizes, the top-down weights
corresponding to each node in the F2 layer represent the prototype vector for that node. Our
architecture of ART1 based network for clustering Fingerprints (illustrated in Figure 2)
consists of 2 nodes in the F1 layer, with each node presented with the binary value 0 or 1.
The pattern vector PH, which represents the X, Y locations of a minutiae point, is presented
at the F1 layer. The F2 layer consists of a variable number of nodes corresponding to the
number of clusters.
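A rough sketch of the ART1-style matching, vigilance test, and reset behaviour described above might look as follows; the choice function and tie-breaking are simplified assumptions rather than the exact network equations:

```python
# Simplified ART1-style clustering of binary pattern vectors: a winner is
# chosen, compared against the input under the vigilance parameter, and
# either its top-down weights are updated or a new (uncommitted) node is
# committed after a reset.

def art1_cluster(patterns, vigilance=0.7):
    prototypes = []                      # top-down weights, one per F2 node
    labels = []
    for x in patterns:
        assigned = False
        # Rank committed nodes by overlap with the input (choice function).
        order = sorted(range(len(prototypes)),
                       key=lambda j: -sum(a & b for a, b in zip(x, prototypes[j])))
        for j in order:
            match = sum(a & b for a, b in zip(x, prototypes[j])) / max(sum(x), 1)
            if match >= vigilance:       # vigilance test passed: update weights
                prototypes[j] = [a & b for a, b in zip(x, prototypes[j])]
                labels.append(j)
                assigned = True
                break
        if not assigned:                 # reset: commit a new F2 node
            prototypes.append(list(x))
            labels.append(len(prototypes) - 1)
    return labels, prototypes

binary_patterns = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
print(art1_cluster(binary_patterns))
```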
We measure the accuracy of our classification approach by plotting each minutiae
point on an XY plane in a color determined by its cluster, i.e., a specific color for each
specific cluster. When we connect points of the same color with a curve to represent a
cluster, we find that the clusters formed by ART1 are displayed in isolation from one
another, so we conclude that ART1 produces high-quality clustering. The graphical
representation is given below:
Performance of ART1 Clustering Technique
The ART1 algorithm classifies the minutiae points of a fingerprint into a number of clusters.
To measure the quality of the clusters obtained, we graphically represent minutiae points
belonging to different clusters in different colors.
Figure 3: Three clusters created by ART1
In recent years, there has been considerable research exploring novel methods and
techniques for the classification of fingerprints. We use the ART1 clustering algorithm to
classify the minutiae points of a fingerprint. The cluster information generated by the ART1
Neural Network can be used in multilevel classification of fingerprints.
3. EVOLUTIONARY COMPUTATION
The most popular evolutionary algorithm is the genetic algorithm of J. Holland. A GA is
an iterative procedure whose starting condition is a large set of random strings called genes
or the genome. The genes are linear arrays of numbers. Each number represents an aspect of
the system to which the algorithm is being applied; for example, if we are dealing with neural
network topology, then one of the numbers could represent the number of layers in a
particular network.
When the algorithm is run, each of the networks represented by the population of
genes is tested and graded according to its performance. The genes are then copied with a
probability that is larger if their performance was greater. That is, the genes that produce a
network with poor performance are less likely to get copied; those with good performance are
more likely to get copied. The copying process therefore produces a population that has a
large number of better performing genes.
Having completed the copying process, the genes are then "bred" by crossing over
some of the numbers in the arrays at random points. A small random change called a
"mutation" is introduced to add some diversity. The whole process is then repeated. Each
genome or gene encodes a possible solution in a given problem space-referred to as the
search space. This space comprises all possible solutions to the problem at hand. The symbol
alphabet used is often binary but this has been extended in recent years to include character
based encodings, real-valued encodings, and tree representations.
The following steps are required to define a basic GA:
1. Create a population of random individuals (e.g., a mapping from the set of parameter
values into the set {0, 1}) such that each individual represents a possible solution to the
problem at hand.
2. Compute each individual’s fitness, i.e., its ability to solve a given problem. This involves
finding a mapping from bit strings into the reals, the so-called fitness function.
3. Select individual population members to be parents. One of the simplest selection
procedures is the fitness-proportionate selection, where individuals are selected with a
probability proportional to their relative fitness. This ensures that the expected number of
times an individual is chosen is approximately proportional to its relative performance in the
population. Thus, high-fitness individuals stand a better chance of "reproducing," while low-
fitness ones are more likely to disappear.
4. Produce children by recombining parent material via crossover and mutation and add them
to the population.
5. Evaluate the children's fitness.
6. Repeat steps (3) to (5) until a solution meeting the required fitness criteria is obtained.
Selection alone cannot introduce any new individuals into the population. Genetically
inspired operators such as crossover and mutation are used to find new points in the search
space. The most important genetic operator is the crossover operator. As in biological
systems, the crossover process yields recombination of alleles via exchange of segments
between pairs of genotypes. Genetic algorithms are stochastic iterative processes that are not
guaranteed to converge. The termination condition may be specified either as some fixed,
maximal number of generations, or on reaching a predefined acceptable fitness level.
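A minimal sketch of the basic GA steps listed above, with fitness-proportionate selection, one-point crossover, and bit-flip mutation; the bit-counting fitness function is a placeholder assumption purely for illustration:

```python
import random

# Basic GA over bit strings: random population, fitness evaluation,
# fitness-proportionate selection, one-point crossover, bit-flip mutation,
# repeated until the required fitness is reached or generations run out.

def run_ga(n_bits=20, pop_size=50, pc=0.7, pm=0.01, generations=100):
    fitness = lambda ind: sum(ind)                     # placeholder fitness
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        if max(scores) == n_bits:                      # required fitness reached
            break
        total = sum(scores)
        def select():                                  # fitness-proportionate
            r = random.uniform(0, total)
            acc = 0
            for ind, s in zip(pop, scores):
                acc += s
                if acc >= r:
                    return ind
            return pop[-1]
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if random.random() < pc:                   # one-point crossover
                cut = random.randint(1, n_bits - 1)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            children.extend([bit ^ (random.random() < pm) for bit in child]
                            for child in (p1, p2))     # bit-flip mutation
        pop = children[:pop_size]
    return max(pop, key=fitness)

print(run_ga())
```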
The New Student Allocation Problem (NSAP) is a clustering problem: allocate
students to classes so that the intelligence gap within each class is minimal and the number
of students in each class does not exceed its maximum capacity. This topic is essential,
because it is very difficult to provide a good educational service to a large number of
students with a high diversity of achievements or skills. With the students allocated into
groups, policies differentiated by group can be implemented easily.
It is difficult to measure students' intelligence. A university or school has only very
limited time to determine their real intelligence level, and the educational process must
begin shortly after the new-student registration period. It is fairly acceptable for their
intelligence to be measured by their admission test score, but then they will usually be
allocated into classes by a sorting method: a group of new students with similar rankings is
assigned to the same class. Although this method has no appropriate concept of student
similarity, many educational institutions still use it. For example, there is a case in
California, USA, where this method was applied: research across hundreds of schools in the
area showed that supposedly similar students obtained different results. Since students with
the same ranking are merely assumed to be similar, it is entirely plausible that they get
different results.
The same total score does not always show that students are similar, because the same
total can be assembled from various score combinations. From the clustering point of view,
among the various score combinations, the combination whose components are themselves
similar shows a high degree of student similarity. For example, if we apply only two
subjects as grouping criteria, student A with 100 in Biology and 50 in Mathematics must be
similar to student B whose Biology score is approximately 100 and whose Mathematics
score is approximately 50. If one of those combinations is put in reverse, so that student A
with 100 in Biology and 50 in Mathematics is considered similar to student C with 50 in
Biology and 100 in Mathematics, it makes no sense, even though their total scores are the
same. Student C should be placed in a different class from students A and B. If there is a
student D with 90 in Biology and 45 in Mathematics, then he or she can be grouped in the
same class as student A or student B. This is acceptable, since student D is more similar to
student A or student B than to student C. By contrast, under the traditional sorting method,
student C is not only similar to students A and B but exactly the same as them, because their
total scores are equal. This is an inappropriate concept of similarity. Hence, NSAP should
be solved by a clustering method with a correct concept of object similarity.
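A quick worked check of this argument, assuming plain Euclidean distance on (Biology, Mathematics) score pairs; the report argues only informally, and student B's exact scores (98, 52) are assumed here:

```python
# Euclidean distance between score vectors makes the similarity argument
# concrete: D is close to A, while C is far from A despite the equal total.
import math

students = {"A": (100, 50), "B": (98, 52), "C": (50, 100), "D": (90, 45)}

def distance(p, q):
    return math.dist(p, q)  # Euclidean distance between score vectors

print(distance(students["A"], students["D"]))  # ~11.2: D is close to A
print(distance(students["A"], students["C"]))  # ~70.7: C is far from A,
                                               # despite the equal total of 150
```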
Many clustering methods have been developed in various fields, but none of them
can be applied directly to NSAP because of the classroom-capacity constraint: no existing
clustering method predefines the size of each cluster. In NSAP, the number of objects
(students) in each cluster (class) depends on the classroom size; it cannot simply follow
from the result of the clustering process. The clustering methods must therefore be modified
to solve NSAP. This is common in the clustering area, because clustering is a subjective
process. This subjectivity makes clustering difficult: a single algorithm or approach is not
adequate for every clustering problem.
Bhuyan et al. applied GAs to solve the general clustering problem. They clustered
objects so that each cluster had high dissimilarity with the other clusters while the objects
within each cluster were similar. They suggested a chromosomal representation called the
order-based representation. Their idea of chromosomal representation is very good, but it is
not enough to represent classroom capacity. However, it inspired us to modify their work as
one of our proposed approaches to solve NSAP.
As with the general clustering problem, finding a solution of NSAP is a hard
process because of the large number of possible allocations. To cluster n objects into nc
clusters, the number of ways is given by the Stirling number of the second kind,

S(n, nc) = (1/nc!) Σ_{i=0..nc} (-1)^i C(nc, i) (nc - i)^n
Let us consider a simple illustration. There are 15 new students that should be allocated to
three classes with the same capacity. They will be clustered based on two variables:
Mathematics and Biology scores. Assume that the distribution of their scores in a Cartesian
coordinate system is as shown in the figure. Although the number of ways equals
210,766,920, the solution is established easily by examining the distribution: class A
consists of the students with index numbers 3, 5, 8, 13 and 15, class B of the students with
index numbers 4, 7, 9, 11 and 14, and class C of the students with index numbers 1, 2, 6,
10 and 12.
GAs have been applied to many optimization problems, including NP-hard problems. Using
conventional methods to solve an NP-hard problem requires an exceptionally long time, but
GAs can solve it in a feasible time. GAs have proven powerful for solving optimization
problems in many empirical works and theories. They are particularly well suited to hard,
multidimensional problems in which optima are difficult to find. They search for the
solution based on the mechanism of natural selection and natural genetics.
Centre Based Approach (CBA)
The chromosomal representation of CBA is a binary representation. To cluster n new
students into nc classes, the chromosomal representation is designed as follows:
(1) A chromosome consists of nc sub-chromosomes. The ith sub-chromosome represents a
student as the centre of the ith class and consists of m genes or bits.
(2) A student must be a member of a class, but he or she may become the centre of two
or more classes.
The chromosomal representation is shown in the figure.
(3) If there is no integer m such that the number of new students equals 2^m, then the width
of a sub-chromosome is the smallest m for which 2^m > n. The index of the student at the
centre of the ith class is the decimal value of the ith sub-chromosome.
The centres of the classes must then be converted into a distribution of the new students
algorithmically. After converting the centres to the distribution, the value of Equation (3)
can be calculated for evaluation.
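A sketch of how this decoding might be implemented; the wrap-around of decoded indices and the nearest-centre-with-remaining-capacity assignment are assumptions, since the report does not spell out the conversion algorithm:

```python
# CBA decoding sketch: each m-bit sub-chromosome is decoded from binary to
# a decimal student index; the decoded students become class centres, and
# every student is then assigned to the nearest centre whose class still has
# capacity. Equal class capacities (n divisible by n_classes) are assumed.
import math

def decode_and_assign(chromosome, scores, n_classes):
    """chromosome: bit list of length n_classes*m; scores: list of score tuples."""
    n = len(scores)
    m = math.ceil(math.log2(n))          # sub-chromosome width: smallest m, 2^m >= n
    centres = []
    for i in range(n_classes):
        bits = chromosome[i * m:(i + 1) * m]
        idx = int("".join(map(str, bits)), 2) % n   # wrap-around is an assumption
        centres.append(idx)
    capacity = n // n_classes
    classes = [[] for _ in range(n_classes)]
    for s in range(n):
        # Try classes in order of distance to their centre's score vector.
        order = sorted(range(n_classes),
                       key=lambda k: math.dist(scores[s], scores[centres[k]]))
        for k in order:
            if len(classes[k]) < capacity:
                classes[k].append(s)
                break
    return classes

scores = [(70, 80), (95, 40), (50, 55), (90, 45), (60, 85), (45, 50)]
chromosome = [0, 0, 1, 0, 1, 0]          # two 3-bit centres: students 1 and 2
print(decode_and_assign(chromosome, scores, n_classes=2))
```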
Experimentation parameters
To evaluate the proposed approaches, we implemented each of them as software
capable of allocating n students into nc classes based on m subjects. We then experimented
with random score data for 200 students in four subjects. The scores are integers between 0
and 100. The students are to be allocated into five classes with the same capacity. We use
the Islamic University of Indonesia as the model for the number of students, the number of
classes, and the capacity of each class. In this case, the largest intelligence gap in the
random data equals 75.56. Among the students, there are 9 pairs who have the same scores.
Experiments have been performed for each combination of the following parameters:
(1) Crossover probability: 0.25 and 0.50.
(2) Mutation probability: 0.01, 0.05 and 0.1.
(3) Population size: 50, 100 and 200.
Here, we follow the commonly suggested parameter values for GAs. Experiments with PBA
have been performed for all combinations of the crossover and mutation methods;
experiments with CBA have been performed for one combination of crossover and mutation.
The algorithms were run on a 1.73 GHz Pentium Dual-Core notebook. We stop the
algorithms after exactly 1000 generations for PBA and 250 generations for CBA. PBA takes
less than 5 minutes of CPU time and CBA less than 7 minutes.
We also ran the PBA algorithm for 2000 generations to give this approach more
chance. With these additional generations, the CPU time is about 9 minutes, but there is no
improvement over the previous experiments.
The largest intelligence gap across all classes shows that both approaches can split the
students with the largest intelligence gap into different classes: none of the resulting classes
has an intelligence gap equal to the largest intelligence gap of the input data. However, PBA
cannot group every pair of students with the same scores into the same class. Only four such
pairs were allocated to the same classes, even with the additional generations. On the other
hand, every such pair is allocated to the same class by CBA. Although CBA needs more
computation time than PBA, it needs fewer generations to reach the best known solution:
CBA needs only 236 generations to find it. The experiment with additional generations
shows that the number of generations PBA needs to reach it is indeterminate.
The experiments showed that CBA needed more time than PBA. This result is
understandable, because CBA has to decode the chromosomes from binary into decimal
numbers and then convert them into solutions, which PBA does not. In addition, CBA's cost
depends on the number of classes, while PBA's is independent of it. This indicates that the
time complexity of PBA is simpler than that of CBA. However, the experimental results
showed that the best known result of PBA was scarcely smaller than the objective function
value in the first generation, indicating that PBA was trapped in local optima. The
additional generations also gave no improvement to PBA's result, and neither the
combinations of parameters nor the variations of crossover and mutation methods had any
effect. Searching with PBA appears to resemble blind search. Increasing the population size
also has no effect, although logically a larger population should give a higher probability of
a better initial population (1st generation), similar to the running result of CBA.
We should analyze the search space of PBA to understand why this approach is
trapped in local optima. The search space of this approach, with its order-based chromosome
representation, is much greater than that of the real problem. The search space of PBA
depends only on the number of students and equals the factorial of that number. For 200
students, it equals 200!, but the total number of ways to cluster 200 students into five classes
with the same capacity equals only 200!/(40!)^5. A huge number of gene combinations
within one partition represent the same class, so very many different chromosomes represent
the same solution. Manipulation by the GA operators merely produces different
chromosomes that represent the same solution, or an old solution created previously. In
other words, the GA operators cannot improve the fitness in each generation, which is the
working principle of GAs. Since this cannot be attained, the approach clearly cannot give a
good solution.
The comparison of PBA and CBA shows that CBA is better than PBA. With the
binary chromosomal representation in CBA, the GA operators can manipulate chromosomes
effectively. The operators easily produce new, different solutions, giving the GA more
opportunity to explore the search space, so the potential solution can be found in a
reasonable time. CBA also reduces the search space to two raised to the width of the
chromosome. For the case of 200 students and five classes, this is merely 2^40, which is
relatively small compared with the search space of the real problem (200!/(40!)^5). From
the largest intelligence gap in each class, we conclude that CBA can produce classes with
the minimum possible intelligence gap, meaning that more students of similar intelligence
level are allocated to the same class.
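A quick computation of these search-space sizes, in orders of magnitude via the log-gamma function, since the factorials are far too large to evaluate directly:

```python
# Compare the search-space sizes quoted above using log10:
# lgamma(n + 1) = ln(n!), so log10(n!) = lgamma(n + 1) / ln(10).
import math

log10_fact = lambda n: math.lgamma(n + 1) / math.log(10)

pba_space = log10_fact(200)                        # 200! permutations
real_space = log10_fact(200) - 5 * log10_fact(40)  # 200!/(40!)^5 partitions
cba_space = 40 * math.log10(2)                     # 2^40 chromosomes

print(f"PBA : 10^{pba_space:.0f}")   # ~10^375
print(f"Real: 10^{real_space:.0f}")  # ~10^135
print(f"CBA : 10^{cba_space:.1f}")   # ~10^12
```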
We have proposed two GA approaches to solve NSAP. Using PBA to solve NSAP
makes the GA lose its power; NSAP is a hard problem for PBA. Although the time
complexity of PBA is simpler than that of CBA, the experiments showed that this approach
is unable to solve the problem. It traps the GA in local optima, because the operators do not
effectively produce a better population in the next generation; the total number of
chromosomes the GA can generate is much larger than the total number of ways to cluster n
students into nc classes. By contrast, CBA succeeds in solving NSAP. In this approach,
chromosomes represent students as the centres of the classes. It can minimize the largest
intelligence gap in each class and it also reduces the search space. Since we used only one
combination of GA operators with CBA, future research should try to improve the ability
of the GA with more than one combination of operators, or hybridize GAs with other
Artificial Intelligence approaches.
4. FUZZY SYSTEMS
Fuzzy logic is widely used in machine control. The term itself inspires a certain
skepticism, sounding equivalent to "half-baked logic" or "bogus logic", but the "fuzzy" part
does not refer to a lack of rigour in the method, rather to the fact that the logic involved can
deal with fuzzy concepts—concepts that cannot be expressed as "true" or "false" but rather as
"partially true". Although genetic algorithms and neural networks can perform just as well as
fuzzy logic in many cases, fuzzy logic has the advantage that the solution to the problem can
be cast in terms that human operators can understand, so that their experience can be used in
the design of the controller. This makes it easier to mechanize tasks that are already
successfully performed by humans.
The input variables in a fuzzy control system are in general mapped by sets of
membership functions known as "fuzzy sets". The process of converting a crisp input value
to a fuzzy value is called "fuzzification".
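As an illustration, fuzzification with overlapping triangular membership functions might look like the following sketch; the set names and breakpoints are assumed purely for illustration:

```python
# Fuzzification: a crisp input is mapped to degrees of membership in
# overlapping triangular fuzzy sets. Names and breakpoints are assumed.

def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c, peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(temp):
    return {
        "cold": triangular(temp, -10, 0, 15),
        "warm": triangular(temp, 5, 15, 25),
        "hot":  triangular(temp, 20, 30, 45),
    }

print(fuzzify(12))  # {'cold': 0.2, 'warm': 0.7, 'hot': 0.0}
```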
A control system may also have various types of switch, or "ON-OFF", inputs along
with its analog inputs, and such switch inputs of course will always have a truth value equal
to either 1 or 0, but the scheme can deal with them as simplified fuzzy functions that happen
to be either one value or another.
Given "mappings" of input variables into membership functions and truth values, the
microcontroller then makes decisions for what action to take based on a set of "rules", each of
the form:
In this example, the two input variables are "brake temperature" and "speed", which have
values defined as fuzzy sets. The output variable, "brake pressure", is also defined by a fuzzy
set that can have values like "static", "slightly increased", "slightly decreased", and so on.
This rule by itself seems puzzling, since it looks like it could be used without bothering with
fuzzy logic, but remember that the decision is based on a set of rules:
• All the rules that apply are invoked, using the membership functions and truth values
obtained from the inputs, to determine the result of the rule.
• This result in turn will be mapped into a membership function and truth value
controlling the output variable.
• These results are combined to give a specific ("crisp") answer, the actual brake
pressure, a procedure known as "defuzzification".
This combination of fuzzy operations and rule-based "inference" describes a "fuzzy expert
system".
Fuzzy controllers are very simple conceptually. They consist of an input stage, a
processing stage, and an output stage. The input stage maps sensor or other inputs, such as
switches, thumbwheels, and so on, to the appropriate membership functions and truth values.
The processing stage invokes each appropriate rule and generates a result for each, then
combines the results of the rules. Finally, the output stage converts the combined result back
into a specific control output value.
As discussed earlier, the processing stage is based on a collection of logic rules in the form of
IF-THEN statements, where the IF part is called the "antecedent" and the THEN part is called
the "consequent". Typical fuzzy control systems have dozens of rules. Consider a rule for a
thermostat:

IF (temperature IS "cold") THEN (heater IS "high")
This rule uses the truth value of the "temperature" input, which is some truth value of
"cold", to generate a result in the fuzzy set for the "heater" output, which is some value of
"high". This result is used with the results of other rules to finally generate the crisp
composite output. Obviously, the greater the truth value of "cold", the higher the truth value
of "high", though this does not necessarily mean that the output itself will be set to "high",
since this is only one rule among many. In some cases, the membership functions can be
modified by "hedges" that are equivalent to adjectives. Common hedges include "about",
"near", "close to", "approximately", "very", "slightly", "too", "extremely", and "somewhat".
These operations may have precise definitions, though the definitions can vary considerably
between different implementations. "Very", for one example, squares membership functions;
since the membership values are always less than 1, this narrows the membership function.
"Extremely" cubes the values to give greater narrowing, while "somewhat" broadens the
function by taking the square root.
In practice, the fuzzy rule sets usually have several antecedents that are combined
using fuzzy operators, such as AND, OR, and NOT, though again the definitions tend to vary:
AND, in one popular definition, simply uses the minimum weight of all the antecedents,
while OR uses the maximum value. There is also a NOT operator that subtracts a
membership function from 1 to give the "complementary" function.
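These popular definitions are easy to state in code; a minimal sketch, with example membership values assumed for illustration:

```python
# Fuzzy operators and hedges in the popular min/max definitions:
# AND = minimum, OR = maximum, NOT = complement; "very" squares a
# membership value and "somewhat" takes its square root.
import math

AND = min
OR = max
NOT = lambda mu: 1.0 - mu
very = lambda mu: mu ** 2
somewhat = lambda mu: math.sqrt(mu)

cold, fast = 0.8, 0.3   # example membership (truth) values
print(AND(cold, NOT(fast)))      # 0.7 : "cold AND NOT fast"
print(OR(very(cold), fast))      # 0.64: "very cold OR fast"
print(somewhat(cold))            # ~0.89: broadened membership
```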
There are several different ways to define the result of a rule, but one of the most
common and simplest is the "max-min" inference method, in which the output membership
function is given the truth value generated by the premise.
The "centroid" method is very popular, in which the "center of mass" of the result
provides the crisp value. Another approach is the "height" method, which takes the value of
the biggest contributor. The centroid method favors the rule with the output of greatest area,
while the height method obviously favors the rule with the greatest output value.
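A sketch of max-min inference followed by centroid defuzzification; the output fuzzy sets, their triangular shapes, and the rule truth values below are assumptions for illustration:

```python
# Max-min inference and centroid defuzzification: each output membership
# function is clipped at its rule's truth value, the clipped functions are
# combined by maximum, and the centre of mass of the combined shape gives
# the crisp output.

def triangular(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Assumed output fuzzy sets for "brake pressure" on the range [0, 100].
output_sets = {"low": (0, 25, 50), "medium": (25, 50, 75), "high": (50, 75, 100)}
rule_truths = {"low": 0.2, "medium": 0.7, "high": 0.1}   # from the antecedents

def defuzzify_centroid(sets, truths, lo=0.0, hi=100.0, steps=1000):
    num = den = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        # max-min: clip each set at its truth value, combine with max
        mu = max(min(truths[name], triangular(x, *abc))
                 for name, abc in sets.items())
        num += x * mu
        den += mu
    return num / den

print(defuzzify_centroid(output_sets, rule_truths))  # crisp pressure, roughly 50
```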
Fuzzy control system design is based on empirical methods, basically a methodical approach
to trial-and-error. The general process is as follows:
1. Document the system's operational specifications and inputs and outputs.
2. Document the fuzzy sets for the inputs.
3. Document the rule set.
4. Determine the defuzzification method.
5. Run through a test suite to validate the system; adjust details as required.
6. Complete the documentation and release to production.
Among the various metrics for quality prediction of software, the inspection
metrics, inspection rate and error density, are the most important for industrial applications.
Ebenau was the first to employ inspection metrics to identify modules that are likely to be
error prone. Software inspection is considered an essential practice for developing high
quality software. If it is possible to identify potentially error prone modules with a relatively
high degree of accuracy, at little or no extra cost, by analyzing the present
inspection data, project managers can use such findings to optimize software development
productivity. They may test potentially erroneous modules more extensively than others. Past
researchers focused their attention on empirically validating cost effectiveness of inspection
methods. Barnard and Price identified 9 key metrics used in planning, monitoring, controlling
and improving inspection processes. Sun Sup So proposed in his paper a fuzzy logic based
approach to predict error prone modules.
In order to develop a software quality assessment model, one must first identify factors
that strongly influence software quality and the number of residual errors. Unfortunately, it
is extremely difficult to accurately identify the relevant quality factors. Furthermore, the degree of
influence is imprecise in nature. That is, although exact and discrete metric data are used,
the inference rules applied to them may be fuzzy in nature. Suppose, for example, that an
inspection team reported an inspection rate of over 380 LoC/h, whereas the typical
inspection rate ranges from 150 to 200 LoC/h [22]. One can convincingly argue that such an
inspection rate significantly exceeds the reported average from industrial applications, and
experts will most likely agree unanimously with the conclusion. However, such an
assessment is fuzzy because the term
"significantly" cannot be precisely quantified. Moreover, if a team reports an inspection rate
of 275 LoC/h, experts are likely to differ in their opinions as to whether the inspection rate
exceeded the industrial norm and, if so, by how much. In other words, the decision
boundary is not well defined. Due to its natural ability to model imprecise and fuzzy aspect of
data and rules, fuzzy logic is an attractive alternative in situations where approximate
reasoning is called for. A prototype system can also be developed based solely on domain
knowledge without relying on extensive training data. Furthermore, performance of the
system can be gradually tuned as more data become available.
A fuzzy logic-based prediction model is proposed as follows:
Let
I = inspection rate,
E = error density.
The complexity and coefficient matrices for inspection rate and error density are given as:
The complexity attributes low, medium and high of the two metrics, inspection rate
and error density, are taken as triangular fuzzy numbers for I ≥ 100 and 0 ≤ E ≤ 35. Ri for
I < 100 is obtained directly from complexity table no. 1, while Ri for I ≥ 100 is obtained by
the process of fuzzification and defuzzification. Re for E > 35 is obtained directly from
complexity table no. 2, while Re for 0 ≤ E ≤ 35 is obtained by the process of fuzzification
and defuzzification. The pictorial representation is given in the figures.
Fuzzy Pictorial representation of I
Defuzzification:
We define:
The study proposes a precise, fuzzy-logic-based approach to quantify the quality
of software. Software can be given quality grades on the basis of two metrics: inspection
rate per hour and error density. The prediction of software quality is very easy with this
approach, and precise quality grades can be given to any software. Software is graded into
10 quality grades. Modules with grade 1 are considered the most error prone, while those
with quality grade 10 are considered satisfactory. Triangular fuzzy numbers have been used
for the inspection rate and errors/kLOC. The fuzzy logic methodology used in this study is
sufficiently general and can be applied to other areas of quantitative software engineering.
5. BAYESIAN NETWORK
Formally, Bayesian networks are directed acyclic graphs whose nodes represent
random variables in the Bayesian sense: they may be observable quantities, latent variables,
unknown parameters or hypotheses. Edges represent conditional dependencies; nodes which
are not connected represent variables which are conditionally independent of each other.
Each node is associated with a probability function that takes as input a particular set of
values for the node's parent variables and gives the probability of the variable represented by
the node. For example, if the parents are m Boolean variables then the probability function
could be represented by a table of 2^m entries, one entry for each of the 2^m possible
combinations of its parents being true or false.
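For instance, here is a minimal sketch of such a table for m = 2 Boolean parents, using the usual textbook rain/sprinkler/grass-wet example; the variables and probabilities are illustrative, not taken from the report:

```python
# A conditional probability table (CPT) for a node with m = 2 Boolean
# parents: 2^2 = 4 entries, one per combination of parent values.

p_grass_wet = {  # P(GrassWet=True | Rain, Sprinkler)
    (False, False): 0.01,
    (False, True):  0.90,
    (True,  False): 0.80,
    (True,  True):  0.99,
}

def prob_wet(rain: bool, sprinkler: bool) -> float:
    return p_grass_wet[(rain, sprinkler)]

print(prob_wet(rain=True, sprinkler=False))  # 0.8
```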
Efficient algorithms exist that perform inference and learning in Bayesian networks.
Bayesian networks that model sequences of variables (e.g. speech signals or protein
sequences) are called dynamic Bayesian networks. Generalizations of Bayesian networks that
can represent and solve decision problems under uncertainty are called influence diagrams.
Bayesian networks are used for modelling knowledge in computational biology and
bioinformatics (gene regulatory networks, protein structure and gene expression analysis),
medicine, document classification, information retrieval, image processing, data fusion,
decision support systems, engineering, gaming and law.
6. SWARM INTELLIGENCE
The application of swarm principles to robots is called swarm robotics, while 'swarm
intelligence' refers to the more general set of algorithms. 'Swarm prediction' has been used in
the context of forecasting problems.
Particle swarm optimization (PSO) is a global optimization algorithm for dealing with
problems in which a best solution can be represented as a point or surface in an n-
dimensional space. Hypotheses are plotted in this space and seeded with an initial velocity, as
well as a communication channel between the particles. Particles then move through the
solution space, and are evaluated according to some fitness criterion after each time step.
Over time, particles are accelerated towards those particles within their communication
grouping which have better fitness values. The main advantage of such an approach over
other global minimization strategies such as simulated annealing is that the large numbers of
members that make up the particle swarm make the technique impressively resilient to the
problem of local minima.
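A minimal PSO sketch along these lines; the inertia and acceleration coefficients and the sphere test function are conventional illustrative choices, not prescribed by the report:

```python
import random

# Particle swarm optimization: particles carry a position and velocity,
# remember their personal best, and are accelerated towards the best
# position found by the swarm after each time step.

def pso(dim=2, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    fitness = lambda p: sum(x * x for x in p)          # minimise the sphere
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=fitness)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive pull
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # social pull
                pos[i][d] += vel[i][d]
            if fitness(pos[i]) < fitness(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=fitness)
    return gbest

print(pso())  # should be close to the optimum at the origin
```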
The gravitational search algorithm (GSA) is constructed based on the law of gravity and
the notion of mass interactions. The GSA algorithm uses the theory of Newtonian physics,
and its searcher agents are a collection of masses. In GSA, we have an isolated system of
masses. Using the gravitational force, every mass in the system can see the situation of other
masses. The gravitational force is therefore a way of transferring information between
different masses.
6.2 APPLICATIONS
7. CONCLUSION
The complementarity of FL, NC, GC, and PR (fuzzy logic, neurocomputing, genetic
computing, and probabilistic reasoning) has an important consequence: in many cases a
problem can be solved most effectively by using them in combination
rather than exclusively. A striking example of a particularly effective combination is what has
come to be known as "neurofuzzy systems." Such systems are becoming increasingly visible
as consumer products ranging from air conditioners and washing machines to photocopiers
and camcorders. Less visible but perhaps even more important are neurofuzzy systems in
industrial applications. What is particularly significant is that in both consumer products and
industrial systems, the employment of soft computing techniques leads to systems which
have high MIQ (Machine Intelligence Quotient). In large measure, it is the high MIQ of SC-
based systems that accounts for the rapid growth in the number and variety of applications of
soft computing.
The successful applications of soft computing suggest that the impact of soft
computing will be felt increasingly in coming years. Soft computing is likely to play an
especially important role in science and engineering, but eventually its influence may extend
much farther. In many ways, soft computing represents a significant paradigm shift in the
aims of computing - a shift which reflects the fact that the human mind, unlike present day
computers, possesses a remarkable ability to store and process information which is
pervasively imprecise, uncertain and lacking in categoricity.
8. REFERENCES
1. Zadeh, Lotfi A., "Fuzzy Logic, Neural Networks, and Soft Computing," Communications of the ACM, Vol. 37, No. 3, March 1994, pp. 77-84.
2. Harish Mittal, Pradeep Bhatia and Puneet Goswami, "Software Quality Assessment Based on Fuzzy Logic Technique," International Journal of Soft Computing Applications, ISSN 1453-2277, Issue 3 (2008), pp. 105-112.
3. Zainudin Zukhri and Khairuddin Omar, "Solving New Student Allocation Problem with Genetic Algorithms," International Journal of Soft Computing Applications, ISSN 1453-2277, Issue 3 (2008), pp. 6-15.
5. Naresh K. Sinha and Madan M. Gupta, "Soft Computing and Intelligent Systems: Theory and Applications."