
National Institute of Business Management

Chennai - 020
EMBA/ MBA

Elective: Artificial Intelligence (Part II)

1.Explain the concept of Heuristic search techniques.

Ans:- A heuristic is a technique for solving a problem faster than classic methods, or for finding an
approximate solution when classic methods cannot find an exact one. It is a kind of shortcut, as we
often trade one of optimality, completeness, accuracy, or precision for speed. A heuristic (or
heuristic function) guides search algorithms: at each branching step, it evaluates the available
information and decides which branch to follow by ranking the alternatives.
A heuristic is any device that is often effective but is not guaranteed to work in every case.

Heuristic Search Techniques in Artificial Intelligence

Briefly, we can divide heuristic search techniques into two categories:


a. Direct Heuristic Search Techniques in AI

Other names for these are Blind Search, Uninformed Search, and Blind Control Strategy. These
are not always practical, since they can demand a great deal of time or memory. They search the
entire state space for a solution and use an arbitrary ordering of operations. Examples are
Breadth-First Search (BFS) and Depth-First Search (DFS).

b. Weak Heuristic Search Techniques in AI

Other names for these are Informed Search, Heuristic Search, and Heuristic Control Strategy.
These are effective if applied correctly to the right types of tasks, and they usually demand
domain-specific information. We need this extra information to compute a preference among child
nodes to explore and expand. Each node has a heuristic function associated with it. Examples are
Best-First Search and A*.

Before we move on to describe certain techniques, let’s first take a look at the ones we generally
observe. Below, we name a few.
 Best-First Search
 A* Search
 Bidirectional Search
 Tabu Search
 Beam Search
 Simulated Annealing
 Hill Climbing
 Constraint Satisfaction Problems

All of the search methods in the preceding section are uninformed in that they do not take the
goal into account. They use no information about where they are trying to get to unless they
happen to stumble on a goal. One form of heuristic information about which nodes seem the
most promising is a heuristic function h(n), which takes a node n and returns a non-negative real
number that is an estimate of the path cost from node n to a goal node. The function h(n) is
an underestimate if h(n) is less than or equal to the actual cost of a lowest-cost path from
node n to a goal.

The heuristic function is a way to inform the search about the direction to a goal. It provides an
informed way to guess which neighbor of a node will lead to a goal.
There is nothing magical about a heuristic function. It must use only information that can be
readily obtained about a node. Typically a trade-off exists between the amount of work it takes to
derive a heuristic value for a node and how accurately the heuristic value of a node measures the
actual path cost from the node to a goal.
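As a sketch of an easily computed underestimate, consider the Manhattan distance on a 4-connected grid (the grid setting and coordinates below are illustrative, not from the text): any actual path between two cells must cover at least that many horizontal plus vertical steps, so the estimate never exceeds the true path cost.

```python
# Manhattan distance: a classic admissible heuristic (underestimate)
# for 4-connected grid movement. Cheap to compute from a node alone.
def manhattan(node, goal):
    (x1, y1), (x2, y2) = node, goal
    return abs(x1 - x2) + abs(y1 - y2)

# From (0, 0) to (3, 4), any 4-connected path needs at least 7 steps.
print(manhattan((0, 0), (3, 4)))  # -> 7
```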

A standard way to derive a heuristic function is to solve a simpler problem and to use the actual
cost in the simplified problem as the heuristic function of the original problem. The h function
can be extended to be applicable to (non-empty) paths. The heuristic value of a path is the
heuristic value of the node at the end of the path. That is:
h(⟨n0, ..., nk⟩) = h(nk)

A simple use of a heuristic function is to order the neighbors that are added to the stack
representing the frontier in depth-first search. The neighbors can be added to the frontier so that
the best neighbor is selected first. This is known as heuristic depth-first search. This search
chooses the locally best path, but it explores all paths from the selected path before it selects
another path. Although it is often used, it suffers from the problems of depth-first search.
Another way to use a heuristic function is to always select a path on the frontier with the lowest
heuristic value. This is called best-first search. It usually does not work very well; it can follow
paths that look promising because they are close to the goal, but the costs of the paths may keep
increasing.
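The best-first strategy described above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the graph, heuristic values, and goal below are invented for the example, and a priority queue always pops the frontier path whose end node has the lowest h(n).

```python
import heapq

# Greedy best-first search: always expand the frontier path whose
# end node has the lowest heuristic value h(n).
def best_first_search(graph, h, start, goal):
    frontier = [(h[start], start, [start])]  # (h value, node, path so far)
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(frontier, (h[neighbor], neighbor, path + [neighbor]))
    return None

# Illustrative graph: S's neighbors are A (h=1) and B (h=2), so the
# search follows the lower-h branch through A to the goal G.
graph = {"S": ["A", "B"], "A": ["G"], "B": ["G"]}
h = {"S": 3, "A": 1, "B": 2, "G": 0}
print(best_first_search(graph, h, "S", "G"))  # -> ['S', 'A', 'G']
```

Note that the queue is ordered by h alone, ignoring path cost, which is exactly why best-first search can keep following promising-looking paths whose actual costs keep increasing.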
A heuristic procedure, or heuristic, is defined as having the following properties.

1. It will usually find good, although not necessarily optimum, solutions.


2. It is faster and easier to implement than any known exact algorithm (one which guarantees an
optimum solution).
In general, heuristic search improves the quality of the paths that are explored. Using good
heuristics we can hope to get good solutions to hard problems, such as the travelling salesman
problem, in less than exponential time. There are some good general-purpose heuristics that are
useful in a wide variety of problems. It is also possible to construct special-purpose heuristics to
solve particular problems; the travelling salesman problem is a classic example.

2.Explain the major drawbacks of the alpha-beta procedure.


Ans:- Alpha-beta pruning is all about reducing the size (pruning) of our search tree. While a
brute-force approach is easier to implement, it is not necessarily the most efficient. Often one
does not need to visit every possible branch to arrive at the best possible solution.

Thus we need to give the minimax algorithm a stopping criterion, so that it stops searching a
region of the tree once it finds the guaranteed minimum or maximum at that level. This prevents
the algorithm from wasting computational time, making it much more responsive and fast.

The original minimax algorithm traverses the tree left to right while going to the deepest
possible depth; this is essentially a depth-first approach. It then derives the values that must be
assigned to the nodes directly above, before moving on to the other branches of the tree.

Thus, the addition of the stopping condition lets minimax make the same decisions as before
while optimizing the performance of the algorithm.

Consider a tree with various scores assigned to each node, where some nodes (typically shaded
red in illustrations) need never be reviewed.

At the bottom left of the tree, minimax goes through the values 5 and 6 at the bottom level and
determines that 5 must be assigned to the min-level node right above them.

But after looking at 7 and 4 in the right branch, it realizes that the min-level node above them
can be assigned at most 4. Since the max level right above the min level will take the maximum
of 5 and at most 4, it is clear that it will choose 5, so the remaining leaves of the right branch
can be skipped. Following this, it continues traversing the tree to perform the same set of
operations within the tree's other branches.
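The walk-through above can be sketched as code. This is a minimal illustration under assumptions not in the text: leaves are plain numbers, internal nodes are lists of children, and the tree values mirror the example (left min node over 5 and 6; right min node over 7, 4, and a further leaf that gets pruned).

```python
# Minimax with alpha-beta pruning over a nested-list game tree.
# Leaves are static scores; lists are internal nodes.
def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, (int, float)):  # leaf: return its static score
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:           # beta cutoff: prune remaining children
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:           # alpha cutoff: prune remaining children
                break
        return value

# Left min node yields min(5, 6) = 5; in the right min node, once 4 is
# seen (<= alpha of 5) the last leaf is pruned. The max root picks 5.
tree = [[5, 6], [7, 4, 99]]
print(alphabeta(tree, float("-inf"), float("inf"), True))  # -> 5
```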

Move Ordering in Alpha-Beta pruning:

The effectiveness of alpha-beta pruning is highly dependent on the order in which each node is
examined. Move order is an important aspect of alpha-beta pruning.

It can be of two types:

o Worst ordering: In some cases the alpha-beta pruning algorithm prunes none of the
leaves of the tree and works exactly like the minimax algorithm, while also consuming
extra time maintaining the alpha and beta values; such an ordering is called worst
ordering. In this case, the best move lies on the right side of the tree. The time
complexity for such an ordering is O(b^m).
o Ideal ordering: The ideal ordering for alpha-beta pruning occurs when a great deal of
pruning happens in the tree because the best moves lie on the left side. Since we apply
DFS, the algorithm searches the left of the tree first and can go twice as deep as the
minimax algorithm in the same amount of time. The complexity under ideal ordering is
O(b^(m/2)).

Advantages:

1. It reduces the effective branching factor from b to roughly the square root of b.

2. As there are fewer function calls than in the minimax algorithm, it has reduced space
complexity.

3. This approach does not search and evaluate unnecessary nodes of the game search tree.

Disadvantages: Alpha-beta pruning offers clear space/time complexity gains over the original
minimax algorithm, so one might wonder whether it is the best that can be done. In fact, it may
not be, and it does not solve all of the problems associated with the original minimax algorithm.
Below is a list of some disadvantages, with suggestions for better ways to achieve the goal of
choosing the best move.

1. Evaluations of the utility of a node are usually not exact but crude estimates of the value of a
position, and as a result large errors can be associated with them.

2. In most cases it is not feasible to search the entire game tree, so a depth limit needs to be set.
A notable example is Go, which has a branching factor of around 360; even with alpha-beta
pruning one cannot look ahead more than a few moves in the game tree.

3. Alpha-beta is designed to select a good move, but it also calculates the values of all legal
moves. A better method may be to use what is called the utility of a node expansion: a search
algorithm would then select a node with a high expansion utility (which will hopefully lead to
better moves), reaching a decision faster by searching a smaller decision space. An extension of
these abilities is a technique called goal-directed reasoning, which focuses on having a certain
goal in mind, like capturing the queen in chess. So far no one has successfully combined these
techniques into a fully functional system.

4. This approach does not, by itself, maintain a dictionary or lookup table of opening moves.
5. It is slower than systems such as Deep Blue, which also draw on databases of strategically
proven moves from grandmasters' games.

3.What are the fundamental characteristics of an expert system? Explain.

Ans:- Expert System


An Expert System is defined as an interactive and reliable computer-based decision-making
system which uses both facts and heuristics to solve complex decision-making problems. It
operates at the level of human intelligence and expertise in its field. It is a computer application
which solves the most complex issues in a specific domain.

The expert system can resolve many issues which generally would require a human expert. It is
based on knowledge acquired from an expert. It is also capable of expressing and reasoning
about some domain of knowledge. Expert systems were the predecessor of the current day
artificial intelligence, deep learning and machine learning systems.

Components of the expert system

The expert System consists of the following given components:


User Interface

The user interface is the most crucial part of the expert system. This component takes the user's
query in a readable form and passes it to the inference engine. After that, it displays the results to
the user. In other words, it's an interface that helps the user communicate with the expert system.

Inference Engine

The inference engine is the brain of the expert system. The inference engine contains rules to
solve a specific problem. It draws on the knowledge stored in the knowledge base, selecting the
facts and rules to apply when trying to answer the user's query. It provides reasoning about the
information in the knowledge base, helps in deducing a solution to the problem, and is also
responsible for formulating conclusions.

Knowledge Base

The knowledge base is a repository of facts. It stores all the knowledge about the problem
domain. It is like a large container of knowledge which is obtained from different experts of a
specific field.

Thus we can say that the success of the Expert System mainly depends on the highly accurate
and precise knowledge.
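The interplay of these components can be sketched as a toy program. This is an illustrative assumption, not any real expert-system shell: the facts and if-then rules below stand in for the knowledge base, and the loop plays the role of a simple forward-chaining inference engine.

```python
# Knowledge base: known facts plus if-then rules of the form
# (set of conditions, conclusion). Domain is invented for illustration.
facts = {"has_fur", "says_meow"}
rules = [
    ({"has_fur"}, "is_mammal"),
    ({"is_mammal", "says_meow"}, "is_cat"),
]

# Inference engine: forward chaining. Keep applying any rule whose
# conditions are all satisfied until no new facts can be derived.
def infer(facts, rules):
    facts = set(facts)                  # work on a copy of the fact base
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer(facts, rules))  # derives 'is_mammal', then 'is_cat'
```

Because each derived fact can be traced back to the rules that fired, such a system can also justify its answers, which is one of the characteristics discussed below.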

CHARACTERISTICS OF AN EXPERT SYSTEM

The growth of expert systems is expected to continue for several years. With this continuing
growth, many new and exciting applications will emerge. An expert system operates as an
interactive system that responds to questions, asks for clarification, makes recommendations and
generally aids the decision-making process. Expert systems provide expert advice and guidance
in a wide variety of activities, from computer diagnosis to delicate medical surgery.

An expert system is usually designed to have the following general characteristics.

1. High level Performance: The system must be capable of responding at a level of


competency equal to or better than a human expert in the field. The quality of the advice given
by the system should have high integrity, and its performance should be consistently high.

2. Domain Specificity: Expert systems are typically very domain specific. For example, a
diagnostic expert system for troubleshooting computers must actually perform all the necessary
data manipulation as a human expert would. The developer of such a system must limit the
scope of the system to just what is needed to solve the target problem. Special tools or
programming languages are often needed to accomplish the specific objectives of the system.

3. Good Reliability: The expert system must be as reliable as a human expert.

4. Understandable: The system should be understandable i.e. be able to explain the steps of
reasoning while executing. The expert system should have an explanation capability similar
to the reasoning ability of human experts.

5. Adequate Response time: The system should be designed in such a way that it is able to
perform within a small amount of time, comparable to or better than the time taken by a
human expert to reach a decision. An expert system that takes a year to reach a decision,
compared to a human expert's time of one hour, would not be useful.

6. Use symbolic representations: Expert systems use symbolic representations for knowledge
(rules, networks or frames) and perform their inference through symbolic computations that
closely resemble manipulations of natural language.

7. Linked with Metaknowledge: Expert systems often reason with metaknowledge, i.e. they
reason with knowledge about themselves and their own knowledge limits and capabilities.
Metaknowledge lets the system reason flexibly across its various data representations.

8. Expertise knowledge: Real experts not only produce good solutions but also find them
quickly. So, an expert system must be skillful in applying its knowledge to produce solutions
both efficiently and effectively, matching the intelligence of human experts.

9. Justified Reasoning: This allows the users to ask the expert system to justify the solution or
advice provided by it. Normally, expert systems justify their answers or advice by explaining
their reasoning. If a system is a rule based system, it provides to the user all the rules and
facts it has used to achieve its answer.
10. Explaining capability: Expert systems are capable of explaining how a particular conclusion
was reached and why requested information is needed during a consultation. This is very
important as it gives the user a chance to access and understand the system’s reasoning
ability, thereby improving the user’s confidence in the system.

11. Special Programming Languages: Expert systems are typically written in special
programming languages. The use of languages like LISP and PROLOG in the development
of an expert system simplifies the coding process. The major advantage of these languages,
as compared to conventional programming languages is the simplicity of the addition,
elimination or substitution of new rules and memory management capabilities. Some of the
distinguishing characteristics of programming languages needed for expert system work are
as follows:

1. Efficient mix of integer and real variables.
2. Good memory management procedures.
3. Extensive data manipulation routines.
4. Incremental compilation.
5. Tagged memory architecture.
6. Efficient search procedures.
7. Optimization of the system's environment.

4. Explain the various roles of activation functions. Do all activation functions give the same
output?

Ans:- The use of neural networks to understand deeper relations in data is what we now call deep
learning, and deep learning has found success across many varieties of neural networks. Today,
with their adaptive learning characteristics, neural networks are among the foremost models in the
medical field. The rise and fall of stocks is even analysed by neural networks. Neural cryptography
shows another side of neural nets, developing data-security algorithms; the neural key-exchange
protocol is widely used today.

An NN is a network of nodes, also called neurons. The NN is designed with various


layers: the input layer, the output layer, and the hidden layers between them. The neurons of the
I/O layers are connected to the neurons of the hidden layers. There is no rule of thumb on how
many hidden layers an NN can have; the literature shows that general NNs use two or three
hidden layers. Research has shown that hidden layers increase the predictive accuracy of the
NN, and the use of more hidden layers can be seen in deep neural networks, which extract
deeper data relationships. The layers are connected through the neurons, or nodes, of the NN.
The input-layer neurons take the input values, and the output neurons give the predicted output.
The hidden neurons, which sit in the hidden layers, play a vital role in the prediction
functionality. Each hidden neuron is defined with an activation function: a transfer function
that transforms the input toward the prediction. The predictive capability of the neural network
depends largely on the type of activation function defined in the hidden neurons.

The hidden-layer nodes are called hidden neurons, and each holds a membership function called
an activation function. The activation function of a hidden neuron plays a major role in network
performance. It is a transformation function, transforming the NN input to a value that will
converge to the target output with a low error rate. The human brain receives both relevant and
irrelevant information at the same time and has the capability of segregating the two, where the
irrelevant can be regarded as noise. Just like the human brain, neurons use the activation function
to separate the noise from the input and reduce the error. Without a non-linear activation
function, the output of the NN is just a linear function of the input. Research has explored many
activation functions, both linear and non-linear. Most NNs are defined using non-linear
activation functions, since these introduce the non-linearity the network needs; with linear
activation functions the network remains a linear map of its input, and the error on non-linear
problems cannot be minimized. Many non-linear activation functions are in use. The most
popular is the sigmoid, or logistic, function, which is used for predicting yes/no cases. Tanh and
ReLU are other frequently used non-linear activation functions.

Activation Function

An activation function is simply the function you use to get the output of a node. It is also known
as a transfer function.

It is used to determine the output of a neural network, such as yes or no. It maps the resulting
values into a range such as 0 to 1 or -1 to 1 (depending on the function).

The Activation Functions can be basically divided into 2 types-

1. Linear Activation Function

2. Non-linear Activation Functions


Linear or Identity Activation Function

As the name suggests, the function is a line, so the output of the function is not confined
to any range.

Fig: Linear Activation Function

Equation : f(x) = x

Range : (-infinity to infinity)

It doesn’t help with the complexity or various parameters of usual data that is fed to the neural
networks.
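This limitation can be checked numerically: stacking layers that use the identity (linear) activation collapses into a single linear map, so extra layers add no representational power. The weights below are arbitrary illustrative values, not from the text.

```python
# With f(x) = x, two "layers" reduce to one multiplication by w1 * w2.
w1, w2 = 2.0, 3.0

def linear(x):
    return x                      # identity activation, range (-inf, inf)

hidden = linear(w1 * 0.5)         # first layer applied to input 0.5
out = linear(w2 * hidden)         # second layer
print(out)                        # -> 3.0, same as (w1 * w2) * 0.5
assert out == (w1 * w2) * 0.5     # composition is still a single linear map
```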
Non-linear Activation Function

The non-linear activation functions are the most widely used. Nonlinearity helps to make the
graph look something like this

Fig: Non-linear Activation Function

It makes it easy for the model to generalize or adapt with variety of data and to differentiate
between the output.

The main terminologies needed to understand for nonlinear functions are:

Derivative or differential: the change in the y-axis with respect to the change in the x-axis; also known as the slope.

Monotonic function: A function which is either entirely non-increasing or non-decreasing.


The Nonlinear Activation Functions are mainly divided on the basis of their range or curves-

1. Sigmoid or Logistic Activation Function

The Sigmoid Function curve looks like a S-shape.

Fig: Sigmoid Function

The main reason why we use the sigmoid function is that its output lies between 0 and 1. It is
therefore especially used for models where we have to predict a probability as the output. Since
the probability of anything exists only between 0 and 1, sigmoid is the right choice.

The function is differentiable. That means we can find the slope of the sigmoid curve at any
point.

The function is monotonic but function’s derivative is not.

The logistic sigmoid function can cause a neural network to get stuck during training, because its gradient is very small at the saturated ends of the curve.

The softmax function is a more generalized logistic activation function which is used for
multiclass classification.
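Both functions can be sketched directly from their definitions (a minimal illustration; the input values are arbitrary). Sigmoid squashes a single score into (0, 1), while softmax generalizes it to a probability distribution over several classes.

```python
import math

def sigmoid(x):
    # Logistic function: maps any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    # Generalized logistic for multiclass outputs; subtracting the max
    # is a standard trick for numerical stability.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(round(sigmoid(0.0), 3))                      # -> 0.5
print([round(p, 3) for p in softmax([2.0, 1.0, 0.1])])  # sums to 1.0
```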
2. Tanh or hyperbolic tangent Activation Function

Tanh is similar to the logistic sigmoid, but often works better. The range of the tanh function is
(-1 to 1), and tanh is also sigmoidal (S-shaped).

Fig: tanh v/s Logistic Sigmoid

The advantage is that the negative inputs will be mapped strongly negative and the zero inputs
will be mapped near zero in the tanh graph.

The function is differentiable.

The function is monotonic while its derivative is not monotonic.

The tanh function is mainly used for classification between two classes.

Both tanh and logistic sigmoid activation functions are used in feed-forward nets.
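The zero-centred range claimed above is easy to verify with a few sample inputs (illustrative values only): tanh maps negative inputs strongly negative and maps 0 to exactly 0, whereas sigmoid maps 0 to 0.5.

```python
import math

# tanh: zero-centred, range (-1, 1); negatives map strongly negative.
for x in (-2.0, 0.0, 2.0):
    print(x, round(math.tanh(x), 3))
# tanh(-2) is about -0.964, tanh(0) = 0.0, tanh(2) is about 0.964
```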
3. ReLU (Rectified Linear Unit) Activation Function

The ReLU is currently the most used activation function in the world, since it appears in almost
all convolutional neural networks and deep learning models.

Fig: ReLU v/s Logistic Sigmoid

As you can see, the ReLU is half rectified (from bottom). f(z) is zero when z is less than zero and
f(z) is equal to z when z is above or equal to zero.

Range: ( 0 to infinity)

The function and its derivative both are monotonic.

But the issue is that all the negative values become zero immediately, which decreases the
model's ability to fit or train from the data properly. That means any negative input given to the
ReLU activation function turns into zero immediately, which in turn affects the resulting graph
by not mapping the negative values appropriately.

4. Leaky ReLU

It is an attempt to solve the dying ReLU problem.


Fig : ReLU v/s Leaky ReLU

The leak helps to increase the range of the ReLU function: for negative inputs the function
returns a*z instead of 0, where the slope a is usually a small value such as 0.01. When a is made
a learnable parameter it is called Parametric ReLU, and when a is chosen randomly it is called
Randomized ReLU.

Therefore the range of the Leaky ReLU is (-infinity to infinity).

Both Leaky and Randomized ReLU functions are monotonic in nature, and their derivatives are
also monotonic.
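The contrast between the two variants can be sketched side by side (using the typical slope a = 0.01 mentioned above; the sample inputs are illustrative).

```python
def relu(z):
    # Standard ReLU: zeroes every negative input.
    return max(0.0, z)

def leaky_relu(z, a=0.01):
    # Leaky ReLU: a small slope keeps negative inputs (and their
    # gradients) alive instead of zeroing them out.
    return z if z > 0 else a * z

for z in (-3.0, 0.0, 3.0):
    print(z, relu(z), leaky_relu(z))
# ReLU maps -3.0 to 0.0 (the "dying ReLU" issue); Leaky ReLU keeps
# a small negative signal, mapping -3.0 to -0.03.
```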

*******
