
Ch-1

Definition of AI

According to the father of Artificial Intelligence, John McCarthy, AI is

"The science and engineering of making intelligent machines, especially intelligent computer programs."

- AI is the technology with which we can create intelligent systems that simulate human behaviour.

Turing test in AI
“A test to check whether a machine can think like a human or not”
- Player A is a computer, Player B is a human, and Player C is an interrogator. The interrogator knows that one of
them is a machine, but must identify which one on the basis of questions and their responses.
- The test result does not depend on each answer being correct, but only on how closely the responses resemble
human answers; the conversation between all players takes place via keyboard and screen.
- In this game, if an interrogator would not be able to identify the computer, then the computer passes the test
successfully, and it is said to be intelligent enough to think like a human.
- The questions and answers can be like:
Interrogator: Are you a computer?
Player A (Computer): No
Interrogator: Multiply two large numbers such as (256896489*456725896)
Player A: (Long pause, then gives a wrong answer.)
- Capabilities a computer needs to pass the Turing test:
o Natural language processing: to communicate successfully in English
o Knowledge representation: to store knowledge about the real world
o Automated reasoning: to use the stored information to answer questions and draw conclusions from it
o Machine learning: to make new predictions by finding patterns

AI Underlying Assumptions

- Newell and Simon presented the Physical Symbol System Hypothesis, which lies at the heart of AI research.
- A physical symbol system consists of a set of entities called symbols, which can occur as parts of another kind of
entity called a symbol structure or expression.
- A symbol structure is a collection of symbols connected in some physical way.
- A physical symbol system is a machine that produces an evolving collection of symbol structures.
- The Physical Symbol System Hypothesis states that it is possible to build programs that can perform intelligent
tasks currently performed by people.
- Examples of physical symbol systems:
1. A digital computer :
o Symbols are 0’s and 1’s
o Processes are operations of the CPU
2. Chess:
o Symbols are Pieces
o Processes are legal chess moves
o Expressions are positions of all pieces on the chess board
3. Formal Logic
o Symbols are Logical operators
o Processes are rules of logical deduction
o Expressions are statements in formal logic that can be true or false
4. Algebra
o Symbols are +, -, x, y, etc.
o Processes are the rules of algebra
o Expressions are equations
AI Techniques
There are three important AI techniques:
1. Search
- Provides a way of solving problems for which no direct approach is available.
- It also provides a framework into which any direct techniques that are available can be embedded.
2. Use of knowledge
- Provides a way of solving complex problems by exploiting the structure of the objects that are involved
3. Abstraction
- Provides a way of separating important features and variations from many unimportant ones that would
otherwise overwhelm any process

AI Problems
- Humans learn mundane tasks from birth; they learn formal tasks and expert tasks later.
- Much of the early work in AI therefore focused on formal task domains, such as game playing and
theorem proving, and less on the mundane task domain.
- As AI research progressed, techniques for handling large amounts of world knowledge were
developed, and tasks shifted towards perception, natural language understanding, and problem solving in
specialized domains.

AI Task Domains

Problems addressed by AI search algorithms fall into three classes:


1. Two-player games: tic-tac-toe, chess
2. Single-agent pathfinding problems: 8-puzzle problem
3. Constraint satisfaction problems: N-queens, sudoku, map coloring, cryptarithmetic

Classification of AI
1. Weak AI: The study and design of machines that perform intelligent tasks.
- Not concerned with how tasks are performed, mostly concerned with performance and efficiency.
E.g., to make a flying machine, use logic and physics; don't mimic a bird.
2. Strong AI: The study and design of machines that simulate the human mind to perform intelligent tasks.
- Borrow many ideas from psychology, neuroscience.
- Goal is to perform tasks the way human might do them.
- Assumes that the physical symbol hypothesis holds true.
3. Evolutionary AI: The study and design of machines that simulate simple creatures
- For example, ants, bees, etc.
ARTIFICIAL INTELLIGENCE vs MACHINE LEARNING
- AI leads to intelligence or wisdom; ML leads to knowledge.
- AI aims to increase the chance of success, not accuracy; ML aims to increase accuracy, but does not care
about success.
- AI's goal is to simulate natural intelligence to solve complex problems; ML's goal is to learn from data to
maximize performance on a task.
- AI has a very broad variety of applications; the scope of ML is constrained.
- AI is the broader family, consisting of ML and DL as its components; ML is a subset of AI.
- AI involves developing a system that mimics humans to solve problems; ML involves creating self-learning
algorithms.
- AI will go for finding the optimal solution; ML will go for a solution whether it is optimal or not.
- AI can work with structured, semi-structured, and unstructured data; ML can work with only structured and
semi-structured data.
- AI's key uses include Siri, customer service via chatbots, expert systems, machine translation such as Google
Translate, and intelligent humanoid robots such as Sophia.
- ML's key uses include Facebook's automatic friend suggestions, Google's search algorithms, banking fraud
analysis, and stock price forecasting.
- Three broad categories of AI: Artificial Narrow Intelligence (ANI), Artificial General Intelligence (AGI),
Artificial Super Intelligence (ASI).
- Three broad categories of ML: supervised learning, unsupervised learning, reinforcement learning.

Applications of AI

1. Gaming
- AI plays a vital role in strategic games such as chess, poker, and tic-tac-toe, where the machine can consider a
large number of possible positions based on heuristic knowledge.
2. Natural Language Processing
- It is possible to interact with the computer that understands natural language spoken by humans.
3. Expert Systems
- There are some applications which integrate machine, software, and special information to provide
explanation and advice to the users.
4. Computer Vision Systems
- These systems understand, interpret, and comprehend visual input on the computer.
5. Speech Recognition
- Some intelligent systems are capable of hearing and comprehending language in terms of sentences and their
meanings while a human talks to them. They can handle different accents, slang words, background noise,
changes in the human's voice, etc.
6. Handwriting Recognition
- Handwriting recognition software reads text written on paper with a pen or on a screen with a stylus. It
recognizes the shapes of the letters and converts them into editable text.
7. Intelligent Robots
- Robots are able to perform the tasks given by a human. They have sensors to detect physical data from the
real world such as light, heat, temperature. They have efficient processors and huge memory, to exhibit
intelligence. In addition, they are capable of learning from their mistakes and they can adapt to the new
environment.
Ch-2

Problem Solving
For solving any type of problem in the real world, one needs a formal description of the problem:
1. What is the explicit goal of the problem?
2. What are the implicit criteria for success?
3. What is the initial situation?
4. What is the ability to perform (the actions available)?
- Problem solving is a process of generating solutions from the observed data.
- Problem Solving means Searching for a goal state.

State and State space Representation


- A state is a representation of problem elements at a given moment.
- A state space is the set of all possible states reachable from the initial state.
- A state space forms a graph in which the nodes are states and the arcs between nodes are actions and a
path is a sequence of states connected by a sequence of actions.

Define the problems as a state space search


- To provide a formal description of a problem, we need to do the following:
1. Define a state space that contains all the possible configurations of the relevant objects.
2. Specify one or more states that describe possible situations from which the problem-solving process
may start. These states are called initial states.
3. Specify one or more states that would be an acceptable solution to the problem. These states are
called goal states.
4. Specify a set of rules that describe the actions available.
- The problem can then be solved by using the rules, in combination with an appropriate control strategy,
to move through the problem space until a path from an initial state to a goal state is found. This process is
known as 'search'.
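As a concrete sketch (not from the notes), the classic two-jug problem can be cast as a state space; the jug capacities of 4 and 3 litres and the search routine are assumptions made for illustration:

```python
# A sketch of a state space for the classic two-jug problem;
# the capacities 4 and 3 are assumed for illustration.
from collections import deque

CAP = (4, 3)  # capacities of jug 1 and jug 2

def successors(state):
    """All states reachable from `state` by one fill/empty/pour action."""
    x, y = state
    out = set()
    out.add((CAP[0], y))            # fill jug 1
    out.add((x, CAP[1]))            # fill jug 2
    out.add((0, y))                 # empty jug 1
    out.add((x, 0))                 # empty jug 2
    p = min(x, CAP[1] - y)          # pour jug 1 into jug 2
    out.add((x - p, y + p))
    p = min(y, CAP[0] - x)          # pour jug 2 into jug 1
    out.add((x + p, y - p))
    out.discard(state)              # drop actions that change nothing
    return out

def find_path(start, is_goal):
    """Search the state space for a path (a sequence of states
    connected by actions) from the start to a goal state."""
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        if is_goal(path[-1]):
            return path
        for s in successors(path[-1]):
            if s not in seen:
                seen.add(s)
                frontier.append(path + [s])
    return None
```

For example, `find_path((0, 0), lambda s: s[0] == 2)` returns a path from both jugs empty to a state with exactly 2 litres in the 4-litre jug, matching the graph view above: nodes are states, arcs are actions.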

Control Strategies
Control strategies help us decide which rule to apply next during the process of searching for a solution to a
problem.
1. Forward search :Search proceeds forward from the initial state towards a solution (goal).
2. Backward search :Search proceeds backward from a goal state toward either a solvable subproblem or the
initial state.
3. Both forward and backward search :Mixture of both forward and backward search.
4. Systematic search (blind search OR uninformed search): Has no information about states beyond the problem
definition; the entire search space is examined for the solution. Blind searches are inefficient in most cases.
5. Heuristic search (informed search OR directed search): Some information about the problem space is used to
compute a preference among the various possibilities for expansion. It can decide whether one non-goal state is
more promising than another. A heuristic search might not always find the best solution, but it is guaranteed to
find a good solution in reasonable time. Informed search methods use problem-specific knowledge, so they may
be more efficient.
Production System
- Production systems provide appropriate structures for performing and describing search processes.

Components of Production system

1. Global Database (Working memory)


It contains a description of the current state of the world in the problem solving process.

2. Production rules
The productions are rules of the form C -> A, where the LHS is known as the condition and the RHS is known
as the action. These rules are interpreted as: given condition C, take action A.

3. Control system
The control system checks the applicability of a rule. It helps decide which rule should be applied and
terminates the process when the system gives the correct output.
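The three components can be sketched in a toy forward-chaining loop; the animal-classification rules below are invented for the example and are not from the notes:

```python
# Minimal production-system sketch: a working memory (global database),
# condition -> action rules, and a control loop that fires applicable
# rules until none applies. The rules are invented for illustration.

rules = [
    # (condition C over working memory, fact added by action A)
    (lambda wm: "has_fur" in wm, "mammal"),
    (lambda wm: "mammal" in wm and "eats_meat" in wm, "carnivore"),
    (lambda wm: "carnivore" in wm and "has_stripes" in wm, "tiger"),
]

def run(working_memory):
    """Control system: apply rules until no new fact can be derived."""
    wm = set(working_memory)            # global database (current state)
    fired = True
    while fired:
        fired = False
        for condition, action in rules:
            if condition(wm) and action not in wm:
                wm.add(action)          # given condition C, take action A
                fired = True
    return wm
```

For instance, `run({"has_fur", "eats_meat", "has_stripes"})` derives "mammal", "carnivore", and finally "tiger" into the working memory.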

Production System Characteristics


- Simplicity : The production rule in AI is in the form of an ‘IF-THEN’ statement so it helps to represent knowledge
and reasoning in the simplest way possible to solve real-world problems.
- Modularity: Helps in adjusting the parameters of the rules.
- Modifiability : Helps alter the rules as per requirements.
- Knowledge-intensive : It contains knowledge in the form of a human spoken language, i.e., English. It is not built
using any programming language.

Classes of Production System


1. A monotonic production system is a production system in which the application of a rule never prevents the
later application of another rule that could also have been applied at the time the first rule was selected.
2. A non-monotonic production system is one in which the application of a rule can prevent the later application
of another rule that was applicable at the time the first rule was selected.
3. A partially commutative production system is one in which, if a sequence of rules transforms state x into state
y, then any allowable permutation of those rules also transforms x into y.
4. A commutative production system is a production system that is both monotonic and partially commutative.

Issues in the design of search programs


- The direction in which to conduct the search, that is, either forward or backward reasoning.
- How to select applicable rules for matching that would depend on the data set.
- How to represent each node of the search process (Knowledge representation problem.)

7 problem Characteristics
1. Is the problem decomposable?
2. Can solution steps be ignored or undone?
3. Is the problem universe predictable?
4. Is a good solution absolute or relative?
5. Is the solution a state or a path?
6. What is the role of knowledge?
7. Does the task require human interaction?
Generate and Test (British Museum Search Algorithm)
- Generate and Test Search is a heuristic search technique based on depth-first search with backtracking. Done
systematically, it guarantees to find a solution if one exists, since every candidate solution is eventually
generated and checked.
Algorithm :
1. Generate a possible solution.
2. Test to see if this is the expected solution.
3. If the solution has been found, quit; else go to step 1.
Limitations :
- Inefficient for problems with a large search space.
- Acceptable only for simple problems, such as the 4-cube problem.
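The three-step algorithm above can be sketched directly; the seating puzzle and its constraints are invented here purely to give the generator something to test:

```python
# Generate-and-test sketch: systematically enumerate candidates and
# test each one. The seating puzzle below is invented for the demo.
from itertools import permutations

def generate_and_test(candidates, is_solution):
    for candidate in candidates:      # 1. generate a possible solution
        if is_solution(candidate):    # 2. test it
            return candidate          # 3. quit if found, else loop
    return None                       # search space exhausted

# Toy use: seat A, B, C, D in a row so that A sits left of B
# and C is not next to D.
def ok(seating):
    return (seating.index("A") < seating.index("B")
            and abs(seating.index("C") - seating.index("D")) > 1)

solution = generate_and_test(permutations("ABCD"), ok)
```

This also shows the stated limitation: the generator enumerates all 4! = 24 orderings in the worst case, which becomes hopeless for large search spaces.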

Uninformed Search techniques


I. Breadth First Search (BFS)
- In BFS, the newly generated nodes are put at the back of the fringe or the OPEN list.
- Nodes will be expanded in a FIFO (First In First Out) order.
- Fringe : It is a collection of nodes that have been generated but not yet expanded.
o Characteristics of BFS :
1. Complete – if the shallowest goal node is at a finite depth.
2. Optimal – finds a solution with the shortest path length.
3. Time Complexity – O(b^d), where d = depth of the shallowest goal, b = branching factor
4. Space Complexity – O(b^d)

o Advantages: Finds the shortest path to the goal.


o Disadvantages: Requires the generation and storage of a tree whose size is exponential.
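A minimal BFS over an explicit graph can be sketched as follows; the example graph is made up for the demo:

```python
# BFS sketch: newly generated nodes go to the back of the fringe,
# so nodes are expanded in FIFO order.
from collections import deque

def bfs(graph, start, goal):
    """Returns the shortest path (by number of edges) from start to goal."""
    fringe = deque([[start]])        # generated but not yet expanded
    visited = {start}
    while fringe:
        path = fringe.popleft()      # FIFO: oldest generated node first
        node = path[-1]
        if node == goal:
            return path
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                fringe.append(path + [child])   # back of the fringe
    return None

# Example graph (invented for the demo).
G = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"]}
```

Here `bfs(G, "A", "F")` returns a four-node path, which is the shortest by edge count, illustrating why BFS is optimal when all steps cost the same.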

II. Depth First search (DFS)


- In DFS, the newly generated nodes are put at the front of the fringe or the OPEN list.
- Nodes will be expanded in a LIFO(Last In First Out) order.
- Fringe : It is a collection of nodes that have been generated but not yet expanded.
o Characteristics of DFS :
1. Complete – only if the search tree is finite.
2. Not optimal – the number of steps in reaching the solution may be larger.
3. Time Complexity – O(b^m), where m = maximum tree depth, b = branching factor
4. Space Complexity – O(bm)

o Advantages: Less space complexity compared to BFS.


o Disadvantages: Time complexity is more. Does not guarantee optimal Solution. Unbounded Tree
Problem.
III. Depth limited search (DFS-L OR DLS)
- DFS-L is the same as DFS, but the tree is not explored below some depth limit L.
- The unbounded-tree problem that occurs in the DFS algorithm is fixed in DFS-L by imposing a limit on
the depth of the search domain.
- DFS-L terminates under two conditions:
1. When the goal node is found
2. When there is no solution within the given depth limit

o Characteristics of DFS-L :
1. Complete – if the goal node lies within the depth limit
2. Not optimal in general – the first solution found within the limit need not be the shallowest
3. Time Complexity – O(b^l), where l = depth limit, b = branching factor
4. Space Complexity – O(bl)

o Advantages : Depth-limited search is Memory efficient.


o Disadvantages : Not guaranteed to find a solution even if one exists (when the goal lies below the limit).

IV. Iterative Deepening Search (IDS) OR IDDFS (Iterative Deepening Depth-First Search)
- It is a search algorithm that runs multiple DFS searches with increasing depth limits.
- It is iterative in nature.
- This algorithm performs depth-first search up to a certain "depth limit", and it keeps increasing the
depth limit after each iteration until the goal node is found.
- Useful when the search space is large, and depth of the goal node is unknown.
o Characteristics of IDS :
1. Complete – this algorithm is complete if the branching factor is finite.
2. Optimal – if path cost is a non-decreasing function of the depth of the node.
3. Time Complexity – O(b^d), where d = depth of the shallowest goal, b = branching factor
4. Space Complexity – O(bd)

o Advantages: It combines the benefits of Breadth-first search's fast search and depth-first search's
memory efficiency
o Disadvantages : It repeats all the work of the previous phase.
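Depth-limited search and iterative deepening fit in a few lines; the example graph below is invented for the demo:

```python
# DLS + IDS sketch: depth-limited DFS, repeated with limits 0, 1, 2, ...
# until the goal is found (example graph invented for the demo).

def dls(graph, node, goal, limit, path=None):
    """Depth-limited DFS: do not explore below depth `limit`."""
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None                   # depth limit reached: cut off
    for child in graph.get(node, []):
        found = dls(graph, child, goal, limit - 1, path)
        if found:
            return found
    return None

def ids(graph, start, goal, max_depth=20):
    """Iterative deepening: repeat DLS with increasing depth limits."""
    for limit in range(max_depth + 1):
        result = dls(graph, start, goal, limit)
        if result:
            return result
    return None

# Example graph (invented for the demo).
G = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "E": ["G"]}
```

Note how each iteration of `ids` repeats the work of the previous one, which is exactly the disadvantage listed above; the trade-off is DFS-like memory with BFS-like completeness.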

V. Bi-directional search
- The bidirectional search algorithm runs two simultaneous searches, one forward from the initial state and
one backward from the goal node, to find the goal.
- The search stops when the two search frontiers intersect.
- Bidirectional search can use search techniques such as BFS, DFS, DLS, etc.
- It is useful for problems that have a single start state and a single goal state.
o Characteristics of Bi-directional search:
1. Complete – bidirectional search is complete if BFS is used in both searches.
2. Optimal – if BFS is used in both searches and step costs are uniform.
3. Time Complexity – O(b^(d/2)), where d = depth, b = branching factor
4. Space Complexity – O(b^(d/2))
o Advantages: Bidirectional search is fast and it requires less memory
o Disadvantages: Implementation of the bidirectional search tree is difficult. One should know the goal state
in advance.
Heuristic Function
- Heuristic is a function which is used in Informed Search to find most promising path.
- The heuristic function might not always give the best solution but it guarantees to find a good solution in
reasonable time.
- A heuristic function estimates how close a state is to the goal, i.e., the cost of an optimal path
from that state to a goal state.
- It is represented by h(n), and its value is always non-negative.
- Admissibility of the heuristic function: h(n) <= h*(n)
- Here h(n) is the estimated (heuristic) cost, and h*(n) is the actual cost of an optimal path to the goal. Hence
the heuristic estimate should be less than or equal to the actual cost.

Informed Search techniques


a. Beam search
- It explores a graph by expanding the most promising nodes in a limited set.
- Beam search is an optimization of best-first search that reduces its memory requirements.
- Beam search uses breadth-first search to build its search tree.
- At each level of the tree, it generates all successors of the current states and sorts them in increasing order of
heuristic cost. However, it only stores a predetermined number of best states at each level (called the
beam width, ß).
o Characteristics of Beam Search :
- Not complete in some cases.
- Not optimal in some cases.
- Time complexity – O(ß·m), where ß = beam width, m = maximum depth of the tree
- Space complexity – O(ß·m)
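The level-by-level pruning described above can be sketched as follows; the graph and heuristic values are invented for the demo:

```python
# Beam search sketch: expand level by level, but keep only the
# `beam_width` states with the lowest heuristic value at each level.

def beam_search(graph, start, goal, h, beam_width=2):
    level = [[start]]
    while level:
        candidates = []
        for path in level:                    # generate all successors
            for child in graph.get(path[-1], []):
                candidates.append(path + [child])
        for path in candidates:               # goal test on this level
            if path[-1] == goal:
                return path
        candidates.sort(key=lambda p: h[p[-1]])
        level = candidates[:beam_width]       # prune to the beam
    return None

# Example graph and heuristic values (both invented for the demo);
# note C looks most promising (h = 0) but is a dead end.
G = {"S": ["A", "B", "C"], "A": ["G"], "B": ["G"], "C": []}
H = {"S": 3, "A": 1, "B": 2, "C": 0, "G": 0}
```

With `beam_width=1` this example prunes away every path that reaches the goal and returns no solution, illustrating why beam search is not complete in some cases.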

b. Hill Climbing
- Hill climbing algorithm is a local search algorithm which continuously moves in the direction of increasing
elevation to find the peak of the mountain or best solution to the problem.
- It is also called greedy local search as it only examines its immediate neighboring node that improves
the current state and does not look beyond.
- A node of hill climbing algorithm has two components which are state and value.

State space Diagram for Hill Climbing


◼ Simple hill Climbing:
- It is the simplest way to implement a hill climbing algorithm.
- It examines only one neighboring node at a time; if that node improves the current cost, it is set as the current state.
o Advantage : Less time-consuming
o Disadvantage : The solution is not guaranteed to be optimal
ALGORITHM
o Step 1: Evaluate the initial state, if it is goal state then return success and Stop.
o Step 2: Loop Until a solution is found or there is no new operator left to apply.
o Step 3: Select and apply an operator to the current state.
o Step 4: Check new state:
1. If it is goal state, then return success and quit.
2. Else if it is better than the current state then assign new state as a current state.
3. Else, if it is not better than the current state, then return to step 2.

◼ Steepest-Ascent hill-climbing:
- The steepest-Ascent algorithm is a variation of simple hill climbing algorithm.
- This algorithm examines all the neighboring nodes of the current state and selects one neighbor node which is closest to
the goal state.
o Advantage : Provides optimal Solution.
o Disadvantage : More time consuming as it examines multiple neighbors
ALGORITHM
o Step 1: Evaluate the initial state, if it is goal state then return success and stop, else make current state as initial state.
o Step 2: Loop until a solution is found or the current state does not change.
1. Let SUCC be a state such that any successor of the current state will be better than it.
2. For each operator that applies to the current state:
I. Apply the new operator and generate a new state.
II. Evaluate the new state.
III. If it is goal state, then return it and quit, else compare it to the SUCC.
IV. If it is better than SUCC, then set new state as SUCC.
V. If the SUCC is better than the current state, then set current state to SUCC.
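The steepest-ascent loop above can be sketched on a toy landscape; the objective function and neighbor set are invented for the example:

```python
# Steepest-ascent hill climbing sketch: examine all neighbors of the
# current state and move to the best one, stopping when no neighbor
# improves the current value (possibly a local maximum).

def steepest_ascent(start, neighbors, value):
    current = start
    while True:
        # SUCC: the best of all neighbors of the current state
        succ = max(neighbors(current), key=value, default=current)
        if value(succ) <= value(current):
            return current            # no improving neighbor: stop
        current = succ

# Toy landscape (invented): maximize f(x) = -(x - 3)^2 over the
# integers, where the neighbors of x are x - 1 and x + 1.
f = lambda x: -(x - 3) ** 2
result = steepest_ascent(0, lambda x: [x - 1, x + 1], f)
```

On this single-peak landscape the climb always ends at x = 3; on a landscape with several peaks the same loop would stop at whichever local maximum is nearest, which is the weakness discussed below.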
◼ Stochastic hill Climbing:
- Stochastic hill climbing does not examine all its neighboring nodes instead this search algorithm selects one neighbor node
at random and decides whether to choose it as a current state or examine another neighbor node.

Problems in Hill climbing algorithm


- Local Maxima: a local maximum is a state that is better than all its neighbors but is not better than some other states
further away.
To overcome the local maximum problem: utilize the backtracking technique; maintain a list of visited states and explore a new path.

- Plateau: a plateau is a flat area of the search space in which a whole set of neighboring states have the same value.
To overcome plateaus: randomly select a state far away from the current state, i.e., make a big jump.

- Ridge: a ridge is a special kind of local maximum. It is an area of the search space that is higher than the
surrounding areas and that itself has a slope, and it cannot be climbed by single moves.
To overcome ridges: apply two or more rules before doing the test, which implies moving in several directions at once.
◼ Simulated Annealing
- Annealing is a thermal process for obtaining low energy states of a solid in a heat bath.
- The process contains 2 steps :
1. Increase the temperature of the heat bath to a maximum value at which the solid melts.
2. Carefully decrease the temperature of the heat bath until the particles arrange themselves in the ground
state of the solid. The ground state is obtained only if the maximum temperature is high enough and the
cooling is done slowly.
- The rate at which the system is cooled is called the annealing schedule.
1. If cooling occurs too rapidly, a local minimum is obtained.
2. If the schedule is slower, the global minimum is reached.

- At temperature t, the probability of an increase in energy of magnitude δE is: P(δE) = exp(−δE / kt)


Where k is a constant known as Boltzmann's constant; folding k into the temperature gives the revised
probability: P'(δE) = exp(−δE / T)

o The probability of accepting a worse state is a function of both the temperature of the system and the change in the cost
function. As the temperature decreases, the probability of accepting worse moves decreases. If t=0, no worse moves are
accepted (i.e. hill climbing).

- The same process is used in simulated annealing in which the algorithm picks a random move, instead of picking the best
move. If the random move improves the state, then it follows the same path. Otherwise, the algorithm follows the path
which has a probability of less than 1 or it moves downhill and chooses another path.
- In this algorithm we have valley descending rather than hill climbing.
- Simulated annealing avoids climbing false foothills and it avoids the danger of being caught on plateau or ridge.
ALGORITHM
o Step 1: Evaluate the initial state, if it is goal state then return success and stop, else make current state as initial state.
o Step 2: Initialize BEST-SO-FAR to the current state and initialize T according to the annealing schedule
o Step 3: Loop until a solution is found or no new operator left to apply.
1. For each operator that applies to the current state:
I. Apply the new operator and generate a new state.
II. Evaluate the new state. 𝛿𝐸 = (value of current) – (value of new state)
III. If it is goal state, then return it and quit
IV. If it is better than current state, then assign it as current state and set BEST-SO-FAR to the new state.
V. If it is not better than the current state, then assign it as the current state with probability 𝑃’.
VI. Revise T as necessary according to annealing schedule
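The acceptance rule and cooling schedule above can be sketched on a toy minimisation problem; the starting temperature, cooling factor, and objective are all illustrative assumptions:

```python
# Simulated annealing sketch: always accept improvements, accept a
# worse move with probability P' = exp(-dE / T), and cool T according
# to a geometric annealing schedule (parameter values illustrative).
import math
import random

def simulated_annealing(start, neighbor, cost, t0=10.0, cooling=0.95,
                        t_min=1e-3, seed=0):
    rng = random.Random(seed)           # seeded for reproducibility
    current = best = start
    t = t0
    while t > t_min:
        candidate = neighbor(current, rng)     # pick a random move
        dE = cost(candidate) - cost(current)
        if dE < 0 or rng.random() < math.exp(-dE / t):
            current = candidate                # move, possibly downhill
            if cost(current) < cost(best):
                best = current                 # track BEST-SO-FAR
        t *= cooling                           # revise T per the schedule
    return best

# Toy use: minimise (x - 3)^2 over the integers with random +/-1 moves.
cost = lambda x: (x - 3) ** 2
step = lambda x, rng: x + rng.choice([-1, 1])
result = simulated_annealing(20, step, cost)
```

Early on, when T is large, bad moves are often accepted (escaping local minima); as T shrinks toward zero the loop behaves like plain hill climbing, exactly as described above.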

c. Best first Search


- Best-first search explores a graph by expanding the most promising node.
- It allows us to switch between paths thus gaining the benefits of both BFS and DFS.
1. Search will start at the root node.
2. The node to be expanded next is selected on the basis of an evaluation function f(n).
3. The node with lowest value for f(n) is selected first.
01. Uniform cost search
- Uniform-Cost Search is similar to Dijkstra's algorithm.
1. Search will start at the root node.
2. The node to be expanded next is selected on the basis of an evaluation function f(n) = g(n)
Where g(n) is the cost to reach node ‘n’
3. The node with lowest value for f(n) is selected first.
o Characteristics of Uniform cost search :
- Complete – if a solution exists, it will be found
- Optimal – it always expands the path with the lowest cost first
- Time & space complexity – O(b^d), where b = branching factor and d = depth
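Expanding by f(n) = g(n) can be sketched with a priority queue; the weighted graph is invented so the cheap-looking direct edge loses to a detour:

```python
# Uniform-cost search sketch: expand the node with the lowest path
# cost g(n) first, using a priority queue as the frontier.
import heapq

def uniform_cost_search(graph, start, goal):
    """`graph` maps node -> list of (neighbor, step_cost) pairs."""
    frontier = [(0, start, [start])]        # priority queue keyed on g(n)
    best_g = {start: 0}
    while frontier:
        g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path                  # lowest-cost path to goal
        for child, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(child, float("inf")):
                best_g[child] = new_g       # cheaper route to child found
                heapq.heappush(frontier, (new_g, child, path + [child]))
    return None

# Example weighted graph (invented): the direct S -> G edge costs 10,
# more than the detour through A and B (1 + 2 + 3 = 6).
G = {"S": [("A", 1), ("G", 10)], "A": [("B", 2)], "B": [("G", 3)]}
```

`uniform_cost_search(G, "S", "G")` returns the detour of total cost 6, not the direct edge, showing why expanding by g(n) yields the lowest-cost path.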
02. Greedy Best first search (pure heuristic search)
- The greedy best first algorithm is implemented by the priority queue.
- Greedy best-first search algorithm always selects the path which appears best at that moment.
- It is the combination of depth-first search and breadth-first search algorithms.
1. Search will start at the root node.
2. The node to be expanded next is selected on the basis of an evaluation function f(n) = h(n)
where h(n) is a heuristic function that estimates cost to reach the goal node from node ‘n’
3. The node with lowest value for f(n) is selected first.
o Characteristics of Greedy Best first search :
- Not complete – it can get stuck in loops
- Not optimal – it can initially select a good path but then a bad one
- Time & space complexity – O(b^m), where b = branching factor and m = maximum depth of the tree
o Advantages
- It can switch between BFS and DFS by gaining the advantages of both the algorithms.
- This algorithm is more efficient than BFS and DFS algorithms.
o Disadvantages
- It can behave as an unguided depth-first search in the worst case scenario.
- It can get stuck in a loop as DFS.

03. A* (Non-greedy best first search)


- The A* algorithm is the most popular best-first search.
- It combines features of UCS and greedy best-first search, by which it solves the problem efficiently.
1. Search will start at the root node.
2. The node to be expanded next is selected on the basis of an evaluation function f(n) = g(n)+h(n)
h(n) is a heuristic function that estimates cost to reach the goal node from node ‘n’
g(n) is the cost to reach node ‘n’
3. The node with lowest value for f(n) is selected first.
- The implementation of A* Algorithm involves maintaining two lists- OPEN and CLOSED.
o OPEN contains those nodes that have been evaluated by the heuristic function but have not been expanded into
successors yet.
o CLOSED contains those nodes that have already been visited.

Step1: Place the starting node in the OPEN list.


Step 2: Check if the OPEN list is empty; if it is, return failure and stop.
Step 3: Select the node from the OPEN list which has the smallest value of the evaluation function (g+h).
If node n is the goal node, return success and stop;
otherwise expand node n, generate all of its successors, and put n into the CLOSED list.
Step 4: For each successor n', check whether n' is already in the OPEN or CLOSED list;
if not, compute the evaluation function for n' and place it into the OPEN list.
Else, update the back pointer of n' to the parent that gives the lowest g(n') value.
Step 5: Return to Step 2.
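The OPEN/CLOSED procedure above can be sketched as follows; the graph and the (admissible) heuristic values are invented for the demo:

```python
# A* sketch: expand the OPEN node with the smallest f(n) = g(n) + h(n).
# `graph` maps node -> [(neighbor, step_cost)]; `h` is a heuristic dict
# that is assumed never to overestimate the true cost (admissible).
import heapq

def a_star(graph, h, start, goal):
    open_list = [(h[start], 0, start, [start])]   # (f, g, node, path)
    closed = set()
    while open_list:                     # Step 2: fail when OPEN is empty
        f, g, node, path = heapq.heappop(open_list)  # smallest g + h
        if node == goal:
            return g, path
        if node in closed:
            continue                     # stale queue entry: skip
        closed.add(node)                 # Step 3: move n to CLOSED
        for child, cost in graph.get(node, []):      # expand successors
            if child not in closed:
                new_g = g + cost
                heapq.heappush(open_list,
                               (new_g + h[child], new_g,
                                child, path + [child]))
    return None

# Example (values invented; h never overestimates the true cost).
G = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 12)], "B": [("G", 3)]}
H = {"S": 6, "A": 5, "B": 2, "G": 0}
```

On this graph A* returns the optimal path S-A-B-G of cost 6, rejecting both the direct but expensive A-G edge and the costlier S-B start.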

o Characteristics of A* :
- Complete: if the branching factor is finite and the cost of every action is fixed.
- Optimal: if the graph-search heuristic is consistent, or if h(n) is an admissible heuristic (h(n) never
overestimates the cost to reach the goal).
- Time Complexity and Space Complexity: O(b^d), where b is the branching factor.
o Advantages of A*
- It can solve very complex problems. It is optimal and complete.
o Disadvantages of A*
- It does not always produce the shortest path, as it is mostly based on heuristics and approximation.
- It has some complexity issues.
- Its main drawback is the memory requirement, as it keeps all generated nodes in memory.
o Admissibility of A* (Effectivity of A*)

1. Finite branching factor (and a finite graph).

2. The distance between any two nodes is > 0.
3. The heuristic must be underestimated (it must never overestimate the true cost).
- Even in the worst case, A* is still better than uninformed search algorithms.

Overestimated vs. underestimated heuristic
- With an overestimated heuristic, it is not necessary that A* is optimal.
- With an underestimated heuristic, A* is optimal.

Greedy vs. A*
- Greedy best-first search expands nodes with minimal f(n)=h(n). It is not optimal, but is efficient.
- A* search expands nodes with minimal f(n)=g(n)+h(n). A* is complete and optimal.

Problem Reduction (Problem Reduction Using AO*)


- The AO* algorithm uses the concept of AND-OR graphs to decompose a given complex problem into a
smaller set of problems which are then solved.
- AND-OR graphs are specialized graphs where the AND arcs represent sets of tasks that all need to be done
to achieve the main goal, whereas the OR arcs represent the different alternative ways of performing a task
to achieve the same goal.
- One AND arc may point to any number of successor nodes.
- AO* will always find a minimum-cost solution.
The AO* Algorithm features
- Rather than the two lists, OPEN and CLOSED, that were used in the A* algorithm, the AO* algorithm will use a
single structure GRAPH.
- Each node in the graph will point both down to its immediate successors and up to its immediate predecessors.
- Each node in the graph will also have associated with it an h' value, an estimate of the cost of a path from itself
to a set of solution nodes.
- We will not store g (the cost of getting from the start node to the current node) as we did in the A* algorithm;
such a value is unnecessary because the top-down traversal of the graph guarantees that only nodes on the
best path will ever be considered for expansion.

A* vs AO*
- A* is an OR-graph algorithm; AO* is an AND-OR-graph algorithm.
- A* requires more memory as compared to AO*; AO* requires less memory.
- A* can go into an infinite loop; AO* doesn't go into an infinite loop.
- A* stops when it finds an optimal solution; AO* stops when it finds any solution.
- A* is less efficient; AO* is more efficient.

Constraint satisfaction Problem (CSP)


- CSPs are mathematical problems where one must find objects that satisfy a number of constraints.
- It consists of the following :
1. A Finite set of variables {x1,x2,….,xn}
2. A set of discrete values {D1,D2,….,Dn} known as the domain from which the solution is picked.
3. A set of constraints
- CSP: For each i (1 ≤ i ≤ n), find a value in Di for xi so that all constraints are satisfied.
- CSPs are commutative. That means the order of any given set of actions has no effect on the outcome
- Constraint Satisfaction is a two-step process:
1. First constraints are discovered and propagated as far as possible throughout the system.
2. Then if there is still not a solution then a guess about something is made and it is added as a new constraint.
- Examples:
N-queens problem, sudoku, cryptarithmetic problems, map coloring (we are given a map and told to color it
using k colors, so that no two neighboring countries have the same color)
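A CSP can be solved with simple backtracking search; below is a sketch on a small map-coloring instance (four Australian regions, borders as on the real map, with constraint propagation omitted for brevity):

```python
# Backtracking CSP sketch for map coloring: variables are regions,
# domains are the k colors, constraints say neighbors differ.

def consistent(var, value, assignment, neighbors):
    """A value is consistent if no assigned neighbor already has it."""
    return all(assignment.get(n) != value for n in neighbors[var])

def backtrack(variables, domains, neighbors, assignment=None):
    assignment = assignment or {}
    if len(assignment) == len(variables):
        return assignment                 # every variable has a value
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:            # try each value in the domain
        if consistent(var, value, assignment, neighbors):
            result = backtrack(variables, domains, neighbors,
                               {**assignment, var: value})
            if result is not None:
                return result
    return None                           # no value works: backtrack

# Small instance: four regions of Australia, k = 3 colors.
regions = ["WA", "NT", "SA", "Q"]
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
             "SA": ["WA", "NT", "Q"], "Q": ["NT", "SA"]}
domains = {r: ["red", "green", "blue"] for r in regions}
solution = backtrack(regions, domains, neighbors)
```

This follows the two-step view above: each consistency check propagates the "neighbors differ" constraint over what is already assigned, and each tentative value is the "guess" that gets retracted if it leads to a dead end.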

Means-Ends Analysis
- Most search strategies reason either forward or backward; often, however, a mixture of the two
directions is appropriate for solving complex and large problems.
- Such a mixed strategy makes it possible to solve the major parts of a problem first and then solve the
smaller problems that arise when combining the parts together.
- Such a technique is called Means-Ends Analysis.
- The means-ends analysis process centers around finding the difference between the current state and the
goal state.
- The means-ends analysis process can be applied recursively to a problem.
- It is a strategy to control search in problem-solving.
• Steps :
1. First, evaluate the difference between Initial State and final State.
2. Select the various operators which can be applied for each difference.
3. Apply the operator at each difference, which reduces the difference between the current state and
goal state.
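The recursive difference-reduction idea can be sketched as follows; the states, the goal, and the "pack"/"drive" operators are toy assumptions invented purely for illustration.

```python
# States are sets of facts; each operator has preconditions, an add list and
# a delete list. To apply an operator whose preconditions are unmet,
# means-ends analysis recursively treats those preconditions as a subgoal.

OPERATORS = [
    {"name": "pack",  "pre": {"at_home"}, "add": {"packed"},     "del": set()},
    {"name": "drive", "pre": {"packed"},  "add": {"at_airport"}, "del": {"at_home"}},
]

def achieve(state, goal, depth=5):
    """Return (final_state, plan) reaching `goal` from `state`, or None."""
    if goal <= state:                                   # no difference left
        return state, []
    if depth == 0:
        return None
    diff = goal - state                                 # 1. evaluate the difference
    for op in OPERATORS:                                # 2. select a relevant operator
        if op["add"] & diff:
            sub = achieve(state, op["pre"], depth - 1)  # meet its preconditions first
            if sub is None:
                continue
            mid_state, plan = sub
            new_state = (mid_state - op["del"]) | op["add"]   # 3. apply the operator
            rest = achieve(new_state, goal, depth - 1)
            if rest is not None:
                return rest[0], plan + [op["name"]] + rest[1]
    return None

print(achieve({"at_home"}, {"at_airport"})[1])  # ['pack', 'drive']
```

Note how "drive" cannot fire immediately, so satisfying its precondition ("packed") becomes a recursive subproblem, as the notes describe.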
1. Chess

Problem characteristic                     Satisfied  Reason

Is the problem decomposable?               No         One game has a single solution
Can solution steps be ignored or undone?   No         In an actual game (unlike on a PC) we cannot undo
                                                      previous moves
Is the problem universe predictable?       No         Not predictable, as we cannot be sure about the
                                                      moves of the other (second) player
Is a good solution absolute or relative?   Absolute   - Relative solution: once you find one solution, you
                                                      check other solutions to identify which is best
                                                      - Absolute solution: once you get one solution, you
                                                      do not need to bother about other possible solutions
                                                      By this criterion, chess is absolute
Is the solution a state or a path?         Path       For natural language understanding, some words have
                                                      different interpretations, so a sentence may be
                                                      ambiguous; to solve such problems we need only the
                                                      interpretation, not the workings (the path to the
                                                      solution is not necessary). In chess, by contrast,
                                                      the winning (goal) state is described by the path to
                                                      that state
What is the role of knowledge?                        A lot of knowledge helps to constrain the search for
                                                      a solution
Does the task require human interaction?   No         In chess additional assistance is not required

2. Water jug

Problem characteristic                     Satisfied  Reason
Is the problem decomposable?               No         One single solution
Can solution steps be ignored or undone?   Yes        Water can be poured out or refilled, so a step can
                                                      be undone
Is the problem universe predictable?       Yes        The problem universe is predictable because only one
                                                      person is needed to solve it; we can predict what
                                                      will happen at the next step
Is a good solution absolute or relative?   Absolute   The water jug problem may have a number of solutions,
                                                      but once we have found one there is no need to bother
                                                      about the others, because it does not affect the cost
Is the solution a state or a path?         Path       The path to the solution matters, not just the goal
                                                      state
What is the role of knowledge?                        A lot of knowledge helps to constrain the search for
                                                      a solution
Does the task require human interaction?   Yes        Additional assistance is required, e.g. to provide
                                                      the jugs or the pump

3. 8 puzzle

Problem characteristic                     Satisfied  Reason
Is the problem decomposable?               No         One game has a single solution
Can solution steps be ignored or undone?   Yes        We can undo the previous move
Is the problem universe predictable?       Yes        The problem universe is predictable because only one
                                                      person is needed to solve it; we can predict the
                                                      position of the blocks after the next move
Is a good solution absolute or relative?   Absolute   Once you get one solution you do not need to bother
                                                      about other possible solutions; by this criterion
                                                      the 8 puzzle is absolute
Is the solution a state or a path?         Path       In the 8 puzzle the winning (goal) state is described
                                                      by the path to that state
What is the role of knowledge?                        A lot of knowledge helps to constrain the search for
                                                      a solution
Does the task require human interaction?   No         In the 8 puzzle additional assistance is not required
4. Travelling salesman problem

Problem characteristic                     Satisfied  Reason
Is the problem decomposable?               No         One single solution
Can solution steps be ignored or undone?   Yes
Is the problem universe predictable?       Yes
Is a good solution absolute or relative?   Relative   TSP is a best-path problem: once one tour is found,
                                                      other tours must still be compared to identify the
                                                      shortest one
Is the solution a state or a path?         Path       In TSP the solution is the route itself, i.e. the
                                                      path to the goal state
What is the role of knowledge?                        A lot of knowledge helps to constrain the search for
                                                      a solution
Does the task require human interaction?   No         In TSP additional assistance is not required

5. Tower of Hanoi problem

Problem characteristic                     Satisfied  Reason
Is the problem decomposable?               No         One single solution
Can solution steps be ignored or undone?   Yes
Is the problem universe predictable?       Yes
Is a good solution absolute or relative?   Absolute   Once you get one solution you do not need to bother
                                                      about other possible solutions; by this criterion
                                                      the Tower of Hanoi is absolute
Is the solution a state or a path?         Path       In the Tower of Hanoi the winning (goal) state is
                                                      described by the path to that state
What is the role of knowledge?                        A lot of knowledge helps to constrain the search for
                                                      a solution
Does the task require human interaction?   No         In the Tower of Hanoi additional assistance is not
                                                      required

6. Missionaries and cannibals

Problem characteristic                     Satisfied  Reason
Is the problem decomposable?               No         One single solution
Can solution steps be ignored or undone?   Yes
Is the problem universe predictable?       Yes        The problem universe is predictable because only one
                                                      person is needed to solve it; we can predict the
                                                      result of each crossing
Is a good solution absolute or relative?   Absolute   Once you get one solution you do not need to bother
                                                      about other possible solutions; by this criterion
                                                      the problem is absolute
Is the solution a state or a path?         Path       The winning (goal) state is described by the path
                                                      (sequence of crossings) to that state
What is the role of knowledge?                        A lot of knowledge helps to constrain the search for
                                                      a solution
Does the task require human interaction?   Yes        Conversational: there is intermediate communication
                                                      between a person and the computer, either to provide
                                                      additional assistance to the computer or additional
                                                      information to the user, or both. Here, assistance is
                                                      required to move the missionaries and cannibals to
                                                      the other side of the river
1. Chess
o Initial State and Goal State
- The starting position can be described by an 8 × 8 array in which each element square(x, y) describes the
board position of an appropriate piece in the official chess opening position.
- The goal is any board position in which the opponent does not have a legal move and his or her “king” is under attack.

o Production rules
- They are legal chess moves that can be described as a set of rules consisting of two parts:
A left side that gives the current position and the right side that describes the change to be made to the board
position. Example :
Current position: White pawn at Square(5, 2), AND Square(5, 3) is empty, AND Square(5, 4) is empty.
Changing board position: Move pawn from Square(5, 2) to Square(5, 4).

2. A Water Jug Problem


- You are given two jugs, a 4-litre one and a 3-litre one, a pump which has unlimited water which you can use to fill the jug,
and the ground on which water may be poured.
- Neither jug has any measuring markings on it.
- How can you get exactly 2 litres of water in the 4-litre jug?

o Initial State and Goal State


- Here the initial state is (0, 0). The goal state is (2, n) for any value of n.

o Production Rules and One Possible Solution

Rule 1: Fill the 4L jug completely.
Rule 2: Pour water from the 4L jug into the 3L jug.
Rule 3: Empty the 3L jug completely.

Litres in 4L jug   Litres in 3L jug   Rule applied
4                  0                  1
1                  3                  2
1                  0                  3
0                  1                  2
4                  1                  1
2                  3                  2

o Explanation
1. First fill the 4-litre jug completely with water.
2. Pour the water from the 4L jug into the 3L jug, leaving 1L in the 4L jug.
3. Empty the 3L jug completely.
4. Pour the water from the 4L jug into the 3L jug, leaving 1L in the 3L jug and the 4L jug empty.
5. Fill the 4L jug with water completely again.
6. Pour water from the 4L jug into the 3L jug until it is full, leaving 2L in the 4L jug, which was our required quantity.
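Such rule applications can also be found mechanically; a minimal breadth-first search sketch over states (x, y) = (litres in the 4L jug, litres in the 3L jug):

```python
from collections import deque

def successors(x, y):
    """All states reachable in one rule application from (x, y)."""
    return {
        (4, y), (x, 3),            # fill either jug from the pump
        (0, y), (x, 0),            # empty either jug onto the ground
        # pour 4L -> 3L and 3L -> 4L, limited by the receiving jug's capacity
        (x - min(x, 3 - y), y + min(x, 3 - y)),
        (x + min(y, 4 - x), y - min(y, 4 - x)),
    }

def solve(start=(0, 0), goal_x=2):
    """Shortest sequence of states from start to any (2, n)."""
    frontier, seen = deque([(start, [start])]), {start}
    while frontier:
        (x, y), path = frontier.popleft()
        if x == goal_x:                      # goal: (2, n) for any n
            return path
        for s in successors(x, y) - seen:
            seen.add(s)
            frontier.append((s, path + [s]))

print(solve())
```

Because BFS explores states level by level, the first path found is a shortest one; it may differ from the hand-worked solution above, since several minimal-length solutions exist.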

3. 8 Puzzle problem
- Given a 3×3 board with 8 tiles (every tile has one number from 1 to 8) and one empty space.
- The program is to change the initial configuration into the goal configuration.
- A solution to the problem is an appropriate sequence of moves: any of the four tiles adjacent to the empty
space (left, right, above, or below it) may be slid into it.
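A small helper, as a sketch, that generates exactly these legal moves; the board is stored as a flat tuple of 9 numbers with 0 marking the empty space (an encoding chosen here for illustration):

```python
def moves(board):
    """board: tuple of 9 ints, 0 = empty; returns all boards one move away."""
    i = board.index(0)
    r, c = divmod(i, 3)
    out = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # above, below, left, right
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            b = list(board)
            b[i], b[j] = b[j], b[i]                      # slide the tile into the gap
            out.append(tuple(b))
    return out

# a centre blank has 4 neighbouring tiles, a corner blank only 2
print(len(moves((1, 2, 3, 4, 0, 5, 6, 7, 8))))  # 4
print(len(moves((0, 1, 2, 3, 4, 5, 6, 7, 8))))  # 2
```

Plugging this successor function into a breadth-first or A* search yields a full solver.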

4. Travelling Salesman Problem


- Given a set of cities and distances between every pair of cities, the problem is to find the shortest possible route that
visits every city exactly once and returns to the starting point.
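A brute-force sketch that tries every ordering of the cities and keeps the cheapest tour; the 4-city distance matrix is an illustrative assumption:

```python
from itertools import permutations

DIST = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]

def tsp(dist):
    """Exhaustive search: cheapest tour visiting every city once."""
    n = len(dist)
    best_cost, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):        # fix city 0 as the start
        tour = (0,) + perm + (0,)                 # ... and return to it
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour

print(tsp(DIST))  # (80, (0, 1, 3, 2, 0))
```

This examines (n-1)! tours, which is why heuristic search matters for larger instances.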

5. Tower of Hanoi Problem


- We have three stands and n discs. Initially, the discs are placed on the first stand. We have to move all discs to the
third stand; the 2nd (auxiliary) stand can be used as a helping stand.
Production rules
- We can transfer only one disc per move.
- Only the topmost disc can be picked up from a stand.
- A bigger disc may never be placed on top of a smaller disc.
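These production rules are exactly what the classic recursive solution obeys; a minimal sketch:

```python
def hanoi(n, source, target, auxiliary, moves=None):
    """Move n discs from source to target; returns the list of moves."""
    moves = [] if moves is None else moves
    if n > 0:
        hanoi(n - 1, source, auxiliary, target, moves)  # clear the way
        moves.append((source, target))                  # move the largest disc
        hanoi(n - 1, auxiliary, target, source, moves)  # rebuild on top of it
    return moves

print(len(hanoi(3, "A", "C", "B")))  # 7 moves: 2**n - 1 in general
```

Because each disc only moves when everything smaller is out of the way, the no-bigger-on-smaller rule is never violated.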

6. Missionaries and Cannibals


- Three missionaries and three cannibals must cross a river using a boat which can carry at most two people
Production rules
- For both banks, if there are missionaries present on the bank, they cannot be outnumbered by cannibals
- The boat cannot cross the river by itself with no people on board.
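The production rules translate directly into a state-space search. In this sketch a state (m, c, b) records the missionaries, cannibals, and boat still on the starting bank, and breadth-first search finds a shortest plan:

```python
from collections import deque

def safe(m, c):
    """Missionaries may not be outnumbered on either bank."""
    return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

def solve(start=(3, 3, 1), goal=(0, 0, 0)):
    frontier, seen = deque([(start, [start])]), {start}
    while frontier:
        (m, c, b), path = frontier.popleft()
        if (m, c, b) == goal:
            return path
        # boat loads: (missionaries, cannibals), at least 1 and at most 2 aboard
        for dm, dc in ((1, 0), (2, 0), (0, 1), (0, 2), (1, 1)):
            m2 = m - dm if b else m + dm     # boat crosses away or back
            c2 = c - dc if b else c + dc
            s = (m2, c2, 1 - b)
            if 0 <= m2 <= 3 and 0 <= c2 <= 3 and safe(m2, c2) and s not in seen:
                seen.add(s)
                frontier.append((s, path + [s]))

plan = solve()
print(len(plan) - 1)  # 11 river crossings
```

The `safe` check enforces the first production rule on both banks after every crossing; the loop over boat loads enforces the capacity and no-empty-boat rules.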
Ch-3
Introduction
- Knowledge Representation : representing knowledge about the world in a manner that facilitates drawing
conclusions from that knowledge.

- Intelligent agents should have capacity for:


• Perceiving: acquiring information from environment
• Knowledge Representation : representing its understanding of the world
• Reasoning: inferring the implications of what it knows.
• Acting: choosing what it wants to do and carrying it out.

- Knowledge representation languages should have precise syntax and semantics.


• Syntax: defines the sentences in the language
• Semantics: defines the “meaning” to sentences

Representation and Mappings


- Knowledge and Representation are two distinct entities.
- Knowledge is a description of the world whereas Representation is the way knowledge is encoded.
- Knowledge determines a system’s competence (what it knows), whereas Representation determines a system’s
performance in doing something.
- One way to think of structuring these entities is at two levels :
(a) the knowledge level, at which facts are described
(b) the symbol level, at which representations of objects at the knowledge level are defined in terms of
symbols that can be manipulated by programs.
- Different types of knowledge require different kinds of representation.

MAPPING BETWEEN FACTS AND REPRESENTATION

Properties of Knowledge Representation System

- The Knowledge Representation mechanisms are often based on: Logic, Rules, Frames, Semantic Net etc.
- A good knowledge representation system must possess the following properties.
o Representational Adequacy: It is the ability to represent the required knowledge.
o Inferential Adequacy: It is the ability to manipulate the knowledge represented to produce new knowledge
corresponding to that inferred from the original.
o Inferential Efficiency: It is the ability to direct the inferential knowledge mechanism into the most productive
directions by storing appropriate guides.
o Acquisitional Efficiency: It is the ability to acquire new knowledge using automatic methods wherever
possible rather than reliance on human intervention.
Approaches to Knowledge representation

There are mainly four approaches to knowledge representation

1. Simple Relational knowledge :


- The simplest way to represent declarative facts is a set of relations of the same sort used in the database
system.
- Provides a framework to compare two objects based on equivalent attributes.

2. Inheritable Knowledge
- In this approach, all data is stored in a hierarchy of classes, and all the classes are arranged
in a generalized form.
- This approach contains inheritable knowledge which shows a relation between instance and class.
- In this approach
o We apply inheritance property
o Knowledge Elements inherit values from their parents.
o objects and values are represented in Boxed nodes
o Arrows are used to point from objects to their values.
- This structure is known as a slot and filler structure, semantic network or a collection of frames where Every
individual frame can represent the collection of attributes and its value.

Algorithm: Property Inheritance


(To retrieve a value V for an attribute A of an instance object O)
1. Find the object O in the knowledge base.
2. If there is a value there for the attribute A, report it.
3. Otherwise, look for a value of the attribute instance; if there is none, fail.
4. Otherwise, move to the node corresponding to that value and look for a value
for the attribute A; if one is found, report it.
5. Otherwise, keep following isa links upward until there is no value for the isa
attribute or until an answer is found.
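The algorithm can be sketched in a few lines over a toy frame knowledge base; the baseball facts below are illustrative, in the spirit of the classic textbook example, and the lookup simply climbs instance/isa links when an attribute is missing:

```python
KB = {
    "Pee-Wee-Reese":   {"instance": "Fielder", "team": "Brooklyn-Dodgers"},
    "Fielder":         {"isa": "Baseball-Player", "batting-average": 0.262},
    "Baseball-Player": {"isa": "Adult-Male", "height": 6.1},
    "Adult-Male":      {},
}

def get_value(obj, attr):
    """Report attr on obj if present; otherwise inherit via instance/isa."""
    while obj is not None:
        frame = KB[obj]                       # step 1: find the object
        if attr in frame:
            return frame[attr]                # step 2: value found, report it
        # steps 3-5: otherwise move up via the instance link, then isa links
        obj = frame.get("instance") or frame.get("isa")
    return None                               # no value anywhere: fail

print(get_value("Pee-Wee-Reese", "batting-average"))  # 0.262, inherited
```

The local value always wins; inherited values are used only when the object's own frame has no slot for the attribute.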

3. Inferential Knowledge
- It represents knowledge in the form of formal logics.
- It can be used to derive more facts and to verify truths of new statements.
- It guarantees correctness.
4. Procedural knowledge
- In this approach knowledge is encoded in small programs and code that describe how to do specific things
and how to proceed.
- It is a representation in which the control information, to use the knowledge, is embedded in the knowledge
itself.
- For example, computer programs, directions, and recipes.

Types of Knowledge
Tacit or Implicit or Informal Explicit or formal
Exists within a human being Exists outside a human being
It is embodied. It is embedded.
Difficult to articulate formally. Can be articulated formally.
Difficult to communicate or share. Can be shared, copied, processed and stored.
Hard to steal or copy. Easy to steal or copy
Drawn from experience, action, subjective insight Drawn from artifacts of some type, such as principles,
 procedures, processes, concepts.
Playing a musical instrument, humour, emotional Encyclopedias and books are classic examples of
intelligence, or speaking a certain language are such knowledge
examples of such knowledge

Representation of Simple facts in logic

Logic
- Logic is the primary vehicle for representing and reasoning about knowledge.
- A logic is a formal language, with precisely defined syntax and semantics
- It provides a way of deriving new knowledge from old using mathematical deduction.
- Using logic we can conclude that a new statement is true by proving that it follows from the statements that are
already known.
- Specifically, we will be dealing with formal logic. The advantage of using formal logic is that it is precise and
definite.

1. Propositional Logic
- A proposition is a simple declarative sentence, eg : “the book is expensive”
- A proposition can be either true or false, but not both.
- We can use any symbol for representing a proposition, such as A, B, C, P, Q, R.
- The propositions are combined by connectives.(Logical Operators)
There are two types of Propositions:
1. Atomic Propositions : It consists of a single proposition symbol. These are the sentences which must be either true
or false.
a) 2+2 is 4, it is an atomic proposition as it is a true fact.
b) "The Sun is cold" is also an atomic proposition as it is a false fact.

2. Compound propositions : Compound propositions are constructed by combining atomic propositions.


a)Ankit is a doctor, and his clinic is in Mumbai. : P= Ankit is a doctor Q= Ankit’s clinic is in Mumbai : P^Q
b) I am alive if and only if I am breathing : P= I am breathing, Q= I am alive : P ⇔ Q.

- In Propositional logic in order to draw conclusions, facts are represented in a more convenient way as,
o Marcus is a man : man(Marcus)
o All men are mortal : mortal(men)
- But propositional logic fails to capture the relationship between an individual being a man and that individual
being a mortal.

2. Predicate Logic
- First-order Predicate logic (FOPL) is a formal language in which propositions are expressed in terms of
predicates, variables and quantifiers.
- It should be viewed as an extension to propositional logic.
- A predicate is an expression of one or more variables defined on some specific domain.

Universal quantification Existential quantification

(∀x)P(x) means that P holds for all values of x in the (∃ x)P(x) means that P holds for some value of x in the
domain associated with that variable domain associated with that variable
(∀x) dolphin(x) → mammal(x) (∃ x) mammal(x) ∧ lays-eggs(x)

A well-formed formula (wff) is a sentence containing no “free”variables i.e. all variables are “bound” by universal or
existential quantifiers.
A Well Formed Formula (wff) is defined by the following rules:
1. All propositional constants and propositional variables, the truth values True and False, atomic propositions,
and all connectives connecting wffs are wffs.
2. If x is a variable and Y is a wff, then ∀x Y and ∃x Y are also wffs.

Facts Represented as Well Formed Formula in FOPL


- Marcus was a man.                    man(Marcus)
- Marcus was a Pompeian.               Pompeian(Marcus)
- All Pompeians were Romans.           ∀x: Pompeian(x) → Roman(x)
- Caesar was a ruler.                  ruler(Caesar)
- All Romans were either loyal to Caesar or hated him.   ∀x: Roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
- Everyone is loyal to someone.        ∀x: ∃y: loyalto(x, y)
- People only try to assassinate rulers they are not loyal to.   ∀x: ∀y: person(x) ∧ ruler(y) ∧ tryassassinate(x, y) →
¬loyalto(x, y)
- Marcus tried to assassinate Caesar.  tryassassinate(Marcus, Caesar)
Representing Instance and ISA Relationships
- The attributes instance and isa play an important role in a useful form of reasoning called property inheritance.
- The predicate instance is a binary one, whose first argument is an object and whose second argument is a class
to which the object belongs.
- Predicate instance and isa explicitly captures the relationships between instance and class.
- These relationships are to express class membership and class inclusion.
- isa : Shows class inclusion and instance : shows class membership

Resolution
- Resolution produces proofs by refutation.
- To prove a statement, resolution attempts to show that the negation of the statement is unsatisfiable.
- The resolution procedure is a simple iterative process: at each step, two clauses, called the parent clauses, are
compared (resolved), resulting in a new clause that has been inferred from them.
- The new clause represents the ways in which the two parent clauses interact with each other.

Steps
1. Conversion of facts into first-order logic.
2. Convert FOL into CNF (clause form)
3. Negate the statement which needs to prove (proof by contradiction)
4. Draw resolution graph (unification).

1. All people who are graduating are happy.


2. All happy people smile.
3. Someone is graduating.
Prove : Someone is smiling

Step 1: Facts into FOL


- ∀x : graduating(x) ⇒ happy(x)
- ∀x: happy(x) ⇒ smile(x)
- ∃x :graduating(x)
Step 2:FOL into CNF

o Eliminate implication
- ∀x ¬graduating(x) ∨ happy(x)
- ∀x ¬happy(x) ∨ smile(x)
- ∃x graduating(x)
- ¬∃x smile(x) (the negated goal, carried through the conversion)

o Move ¬ inwards
- ∀x ¬graduating(x) ∨ happy(x)
- ∀x ¬happy(x) ∨ smile(x)
- ∃x graduating(x)
- ∀x ¬smile(x)

o Rename or Standardize variables

- ∀x ¬graduating(x) ∨ happy(x)
- ∀y ¬happy(y) ∨ smile(y)
- ∃z graduating(z)
- ∀w ¬smile(w)

o Skolemization: Eliminate existential quantifiers

- ∀x ¬graduating(x) ∨ happy(x)
- ∀y ¬happy(y) ∨ smile(y)
- graduating(A), where A is a Skolem constant
- ∀w ¬smile(w)

o Drop Universal Quantifiers

- ¬graduating(x) ∨ happy(x)
- ¬happy(y) ∨ smile(y)
- graduating(A)
- ¬smile(w)

o Distribute conjunction ∧ over disjunction ∨ (Here: not needed)

Step 3: Negate the statement that needs to be proved: ¬smile(w) (already included among the clauses above)

Step 4: Draw the resolution graph
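The refutation shown in the graph can also be checked mechanically. A minimal sketch, working on the ground instance of the clauses above (all variables bound to the constant A, so no unification is needed):

```python
# Literals are strings; "~" marks negation; a clause is a frozenset of literals.

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return all resolvents of two parent clauses."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out

def refute(clauses):
    """Saturate under resolution; True iff the empty clause is derived."""
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a != b:
                    for r in resolve(a, b):
                        if not r:          # empty clause: contradiction found
                            return True
                        new.add(r)
        if new <= clauses:                 # nothing new: refutation failed
            return False
        clauses |= new

kb = [
    frozenset({"~graduating(A)", "happy(A)"}),   # rule 1 (ground instance)
    frozenset({"~happy(A)", "smile(A)"}),        # rule 2 (ground instance)
    frozenset({"graduating(A)"}),                # the fact
    frozenset({"~smile(A)"}),                    # negated goal
]
print(refute(kb))  # True: "someone is smiling" is proved by refutation
```

Deriving the empty clause shows the clause set plus the negated goal is unsatisfiable, which is exactly what proof by contradiction requires.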


Procedural vs Declarative Knowledge

- Declarative knowledge states facts: knowing that something is the case.
  Example: A car has four tires.
- Procedural knowledge describes how to do things: knowing how to perform a task.
  Example: Knowing how to change the tires on a car.

Rule Based Expert System


Two inference methods are used in rule-based systems
• Forward Chaining (data-driven reasoning) : start with known data and progress to a conclusion.
Take the known facts and try to match them against the antecedents of rules from the rule base.
- If a rule fires, its consequents are added to the database of facts: new facts are deduced from existing facts
and knowledge of the problem domain.
- Inference engines that use the forward chaining strategy apply it exhaustively, until no new
deductions can be made.
Eg : A is the starting point. A->B represents a rule; applying it yields the conclusion B.
- Tom is running (A)
- If a person is running, he will sweat (A->B)
- Tom is sweating(B)
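This data-driven loop can be sketched directly; the encoding of rules as (antecedents, consequent) pairs is a toy assumption for illustration:

```python
# Forward chaining: match known facts against rule antecedents and fire
# rules until no new fact can be deduced.

RULES = [({"running"}, "sweating")]            # A -> B
facts = {"running"}                            # A: Tom is running

changed = True
while changed:                                 # apply rules exhaustively
    changed = False
    for antecedents, consequent in RULES:
        if antecedents <= facts and consequent not in facts:
            facts.add(consequent)              # rule fires: add the consequent
            changed = True

print("sweating" in facts)  # True: Tom is sweating (B)
```

Adding more rules simply extends `RULES`; the loop keeps firing until the fact base is saturated.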

• Backward Chaining (goal-driven reasoning): start with a possible conclusion and try to prove
its validity by searching for evidence. (What we are trying to prove is our goal.)
Take the goal and try to match it against the consequents of rules from the rule base.
- If a rule is fired, then the antecedents are added to the set of goals, and the consequents are removed.
- Inference engines that use the backward chaining strategy apply the strategy exhaustively, until no more
rules are fired.
Eg: B is the goal or endpoint, used as the starting point for backward tracking. A is the initial state. A->B is the
rule that must fire to arrive at the endpoint B.
- Tom is sweating (B).
- If a person is running, he will sweat (A->B).
- Tom is running (A).
Forward vs Backward Chaining
Forward Chaining Backward chaining

1. It starts from known facts and applies inference rules to It starts from the goal and works backward through
extract more data until it reaches the goal. inference rules to find the required facts that support
the goal.

2. It is a bottom-up approach It is a top-down approach

3. It is known as data-driven inference technique. It is known as goal-driven technique

4. It applies a breadth-first search strategy. It applies a depth-first search strategy.

5. It tests for all the available rules It only tests for few required rules.

6. It is suitable for planning, monitoring, control, and It is suitable for diagnostic, prescription, and debugging
interpretation applications. applications.

7. It can generate an infinite number of possible It generates a finite number of possible conclusions.
conclusions.

8. It operates in the forward direction. It operates in the backward direction.

9. It aims for any conclusion. It only aims for the required data.
Ch-4
Different Methods of Reasoning
Reasoning is the act of deriving a conclusion from certain premises using a given methodology.
1. Deductive Reasoning
- Deductive reasoning is deducing new information from logically related known information.
Example:
Premise-1: All humans eat veggies.
Premise-2: Suresh is human.
Conclusion: Suresh eats veggies.
The general process of deductive reasoning is given below:

2. Inductive Reasoning
- Inductive reasoning is a means of arriving at a conclusion from a limited set of facts by the process of generalization.
Example:
Premise: All of the pigeons we have seen in the zoo are white.
Conclusion: Therefore, we can expect all the pigeons to be white.
The general process of inductive reasoning is given below:

3. Abductive Reasoning
- Abductive reasoning starts with single or multiple observations then seeks to find the most likely explanation or
conclusion for the observation.
- Abductive reasoning is an extension of deductive reasoning, but in abductive reasoning, the premises do not
guarantee the conclusion.
Example:
Implication: The cricket ground is wet if it is raining.
Axiom: The cricket ground is wet.
Conclusion: It is raining.

4. Common Sense Reasoning


- Common sense reasoning is an informal form of reasoning, which can be gained through experiences.
Example: The given statements a human mind can easily understand and assume.
One person can be at one place at a time. If I put my hand in a fire, then it will burn.

5. Monotonic Reasoning
- In monotonic reasoning, once the conclusion is taken, then it will remain the same even if we add some other
information to existing information in our knowledge base.
- Monotonic reasoning is a process that does not change its direction: as knowledge and facts are added,
the set of derivable conclusions can only grow, never shrink.
- Any theorem proving is an example of monotonic reasoning.
Example:Earth revolves around the Sun.
It is a true fact, and it cannot be changed even if we add another sentence in knowledge base like, "The moon revolves
around the earth" Or "Earth is not round," etc.
Advantages :
o Can be used for theorem proving.
o In monotonic reasoning each old proof will always remain valid.
Disadvantages :
o We can only derive conclusions from the old proofs, so new knowledge from the real world cannot be added.
o It cannot be used for hypothesis knowledge.
6. Non Monotonic reasoning (IMP)
- Non-Monotonic means something which can vary according to the situation or condition.
- Non-monotonic Reasoning is the process that changes its direction or values as the knowledge base increases.
- Non-monotonic Reasoning will increase or decrease based on the condition.
Eg : Consider a bowl of water. If we put it on the stove and turn the flame on, it will boil; when
we turn off the flame, it will gradually cool down again.

- Logic will be said as non-monotonic if some conclusions can be invalidated by adding more knowledge into our
knowledge base.
- "Human perceptions for various things in daily life, "is a general example of non-monotonic reasoning because
Human reasoning is not monotonic.

Example: Let suppose the knowledge base contains the following knowledge:
Birds can fly
Penguins cannot fly
Pitty is a bird
- So from the above sentences, we can conclude that Pitty can fly.
- However, if we add one another sentence into knowledge base "Pitty is a penguin", which concludes "Pitty cannot
fly", so it invalidates the above conclusion.

Advantages :
o It can be used for real-world systems such as Robot navigation.
o In Non-monotonic reasoning, we can choose probabilistic facts or can make assumptions.
Disadvantages :
o In non-monotonic reasoning, the old facts may be invalidated by adding new sentences.
o It cannot be used for theorem proving.

- Non-monotonic reasoning is based on supplementing absolute truth using tentative beliefs.


- These tentative beliefs are generally based on default assumptions that are made in the lack of evidence.
- A non-monotonic reasoning system tracks a set of tentative beliefs and revises those beliefs when knowledge is
observed or derived.

Uncertainties present in reasoning


Uncertainties Desired Action
Incompleteness of Knowledge Compensate for lack of knowledge
Inconsistencies of Knowledge Resolve ambiguities and contradictions
Changing Knowledge Update the knowledge base over time
Ch-5
Probabilistic reasoning
- Probabilistic reasoning is a way of knowledge representation where we apply the concept of probability to
indicate the uncertainty in knowledge. In probabilistic reasoning, we combine probability theory with logic to
handle the uncertainty.
- In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge
o Bayes' rule/Bayes’ Theorem
o Bayesian Belief Network

- Probability is the numerical measure of the likelihood that an uncertain event will occur.
o 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
o P(A) = 0 indicates that event A is impossible.
o P(A) = 1 indicates total certainty that event A occurs.

- Prior probability: the probability of an event computed before observing new information.
- Posterior Probability: the probability calculated after new information has been taken into account. It is a
combination of the prior probability and the new information.
- Marginal Probability: The probability of an event irrespective of the outcomes of other random variables. P(A).
- Joint Probability: Probability of two (or more) simultaneous events, It is symmetrical. P(A, B) = P(A | B) * P(B)
- Conditional Probability: Probability of one (or more) event given the occurrence of another event .It is not
Symmetrical. P(A | B) = P(A, B) / P(B)

1. Bayes Theorem
- In statistics and probability theory, the Bayes’ theorem is a mathematical formula used to determine the
conditional probability of events.
- It describes the probability of an event based on prior knowledge of the conditions that might be relevant to the
event.
- It relates the conditional probability and marginal probabilities of two random events.
- The Bayes’ theorem is expressed as:  P(A|B) = P(B|A) P(A) / P(B)

- P(A|B) – the probability of event A occurring, given event B has occurred (Posterior Probablity)
- P(B|A) – the probability of event B occurring, given event A has occurred
- P(A) – the probability of event A (Prior Probablity)
- P(B) – the probability of event B (Marginal Probablity)
- Note that Bayes’ theorem does not require A and B to be independent; if they were independent, P(A|B) would simply equal P(A).
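A quick numeric sketch of the rule; the disease-test numbers (1% prevalence, 95% sensitivity, 10% false-positive rate) are illustrative assumptions, not taken from these notes:

```python
p_d = 0.01                 # P(A): prior probability of the disease
p_pos_d = 0.95             # P(B|A): test positive given disease
p_pos_h = 0.10             # P(B|not A): false positive rate

# marginal P(B) by total probability, then Bayes' rule for the posterior
p_pos = p_pos_d * p_d + p_pos_h * (1 - p_d)
p_d_pos = p_pos_d * p_d / p_pos

print(round(p_d_pos, 4))  # 0.0876: the posterior is still under 9%
```

Even with a sensitive test, the low prior keeps the posterior small, which is exactly why combining the prior with the new information matters.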

2. Bayesian Belief Network


- "A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional
dependencies using a directed acyclic graph."
- It is also called a Bayes network, belief network, decision network, or Bayesian model.
- It can be used in various tasks including prediction, anomaly detection, diagnostics, automated insight,
reasoning, time series prediction, and decision making under uncertainty.
- A Bayesian network can be used for building models from data and experts’ opinions, and it consists of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities.
- In the DAG each node corresponds to the random variables, and a variable can be continuous or discrete.
- Arc or directed arrows represent the conditional probabilities between random variables.
- Example: Harry installed a new burglar alarm at his home to detect burglary. But the alarm responds for minor
earthquakes. Harry has two neighbors David and Sophia, who have taken a responsibility to inform Harry at work
when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused
with the phone ringing and calls at that time too. On the other hand, Sophia likes to listen to loud music, so
she sometimes does not hear the alarm.
- Problem: Calculate the probability that alarm has sounded, but there is neither a burglary, nor an earthquake
occurred, and David and Sophia both called the Harry.

- The Conditional probability of Alarm A depends on Burglar and earthquake


- The Conditional probability of David calling D depends on the probability of Alarm and same is true for the
conditional probability of Sophia calling S.

From the formula of joint distribution, we can write the problem statement in the form of probability distribution
P(S, D, A, ¬B, ¬E) = P (S|A) *P (D|A)*P (A|¬B ^ ¬E) *P (¬B) *P (¬E)
= 0.75* 0.91* 0.001* 0.998*0.999
= 0.00068045.
Hence, a Bayesian network can answer any query about the domain by using Joint distribution
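The chain-rule computation above can be checked directly, using the conditional probability values the notes plug in:

```python
p_not_b = 0.998            # P(¬B): no burglary
p_not_e = 0.999            # P(¬E): no earthquake
p_a = 0.001                # P(A | ¬B, ¬E): alarm despite neither cause
p_d_given_a = 0.91         # P(D | A): David calls when the alarm rings
p_s_given_a = 0.75         # P(S | A): Sophia calls when the alarm rings

# chain rule over the DAG: each node is conditioned only on its parents
p = p_s_given_a * p_d_given_a * p_a * p_not_b * p_not_e
print(round(p, 8))  # 0.00068045
```

Each factor corresponds to one node of the network given its parents, which is what makes the joint distribution decompose so compactly.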

Certainty Factor in a Rule based System


- A Certainty Factor (CF) is a numerical estimate of the belief or disbelief in a conclusion in the presence of a set
of evidence. Two common scales are used:
1. A scale from 0 to 1, where 0 indicates certainly false (total disbelief) and 1 indicates definitely true (total
belief); values between 0 and 1 represent varying degrees of belief and disbelief.
2. A scale from -1 to +1, where -1 indicates certainly false, +1 indicates definitely true, and intermediate
values represent varying degrees of certainty, with 0 meaning unknown.
o There are many schemes for treating uncertainty in rule based systems. The most common are
- Adding certainty factors.
- Adoption of Dempster-Shafer belief functions.
- Inclusion of fuzzy logic.

- In a rule based system, a rule is an expression of the form "if A then B" where A is an assertion and B can be
either an action or another assertion.
- A problem with rule-based systems is that the connections reflected by the rules are often not absolutely
certain; in such cases, a certainty measure is added to the premises as well as the conclusions.
- Resultant rule: how much a change in the certainty of the premise will change the certainty of the conclusion.
- If A (with certainty x) then B (with certainty f(x))
- Each rule has a certainty attached to it.
- Example: In MYCIN, once the identities of the bacteria are found, the system attempts to select a therapy by
which the disease can be treated.
- A certainty factor (CF [h, e]) is defined in terms of two components:
1. MB[h, e] - a measure (between 0 and 1) of belief in hypothesis “h” given the evidence “e”.
 MB measures the extent to which the evidence supports the hypothesis.
 It is zero if the evidence fails to support the hypothesis.
2. MD[h, e] - a measure (between 0 and 1) of disbelief in hypothesis “h” given the evidence “e”.
 MD measures the extent to which the evidence supports the negation of the hypothesis. It is zero if
the evidence supports the hypothesis.
CF[h, e] = MB[h, e] - MD[h, e]
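The defining formula can be sketched directly. The example values below are hypothetical, chosen only to illustrate the arithmetic.

```python
def certainty_factor(mb, md):
    """CF[h, e] = MB[h, e] - MD[h, e]; the result lies in [-1, 1]."""
    assert 0.0 <= mb <= 1.0 and 0.0 <= md <= 1.0
    return mb - md

# Hypothetical values: strong supporting evidence, slight disbelief.
print(round(certainty_factor(0.8, 0.1), 2))  # 0.7
```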

Dempster – Shafer Theory


- Bayesian theory is concerned only with single pieces of evidence, and Bayesian probability cannot describe ignorance.
- DST is an evidence theory; it combines all possible outcomes of the problem.
- Hence it is used to solve problems where there is a chance that different evidence will lead to different
results.
Example :
Let us consider a room where four people are present, A, B, C and D. Suddenly the lights go out and when the
lights come back, B has been stabbed in the back by a knife, leading to his death. No one came into the room and
no one left the room. We know that B has not committed suicide. Now we have to find out who the murderer is.
To solve this there are the following possibilities:
• Either {A}, {C} or {D} has killed him.
• Either two of them, {A, C}, {C, D} or {A, D}, have killed him.
• All three of them, {A, C, D}, have killed him.
• None of them has killed him (∅).
- There will be the possible evidence by which we can find the murderer by measure of plausibility.

- In Dempster-Shafer Theory we consider sets of propositions and assign to each of them an interval in which the
degree of belief must lie. [Belief, Plausibility]
- Belief (denoted as Bel) measures the strength of the evidence in favor of a set of propositions. It ranges from 0
(no evidence) to 1 (definite certainty).
- Plausibility (Pl) is defined as Pl(s) = 1 - Bel(¬s). It also ranges from 0 to 1 and measures the extent to which evidence in
favor of ¬s leaves room for belief in s.

- Let’s take an example where we have some mutually exclusive hypotheses: {Allergy, Flu, Cold, Pneumonia}.
- The set is denoted by θ and we want to attach some measure of belief to elements of θ.
- The key function we use here is a probability density function, denoted by m, which is defined not only for all elements
of θ but for all subsets of it. We must assign m so that the sum of all the m values assigned to subsets of θ is 1.
- The quantity m(p) measures the amount of belief that is currently assigned to exactly the set “p” of
hypotheses.

Let X be the set of subsets of θ for m1 and let Y be the corresponding set for m2.
We define m3, the combination of m1 and m2, to be:
m3(Z) = [ Σ over X∩Y=Z of m1(X)·m2(Y) ] / [ 1 − Σ over X∩Y=∅ of m1(X)·m2(Y) ]
• Suppose m1 corresponds to our belief after observing fever: m1({F, C, P}) = 0.6, m1(θ) = 0.4
• Suppose m2 corresponds to our belief after observing a runny nose: m2({A, F, C}) = 0.8, m2(θ) = 0.2
• Then we can compute their combination m3 by tabulating every pairwise intersection X ∩ Y with mass m1(X)·m2(Y).
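Dempster's combination rule above can be computed mechanically. The sketch below represents each set of hypotheses as a frozenset of initials (F = Flu, C = Cold, P = Pneumonia, A = Allergy) and combines the two mass assignments m1 and m2 from the text; in this example no intersection is empty, so the normalizing denominator is 1.

```python
from itertools import product

THETA = frozenset("AFCP")  # {Allergy, Flu, Cold, Pneumonia}

# Mass assignments from the text above.
m1 = {frozenset("FCP"): 0.6, THETA: 0.4}   # after observing fever
m2 = {frozenset("AFC"): 0.8, THETA: 0.2}   # after observing runny nose

def combine(m1, m2):
    """Dempster's rule: sum products over each intersection, then
    renormalize by 1 minus the mass that fell on the empty set."""
    raw, conflict = {}, 0.0
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            raw[z] = raw.get(z, 0.0) + mx * my
        else:
            conflict += mx * my
    return {z: v / (1.0 - conflict) for z, v in raw.items()}

m3 = combine(m1, m2)
for s, v in sorted(m3.items(), key=lambda kv: -kv[1]):
    print("".join(sorted(s)), round(v, 2))
# {F,C}: 0.48, {A,F,C}: 0.32, {F,C,P}: 0.12, theta: 0.08
```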
Fuzzy Logic
- The term fuzzy refers to things that are not clear or are vague
- So fuzzy logic provides very valuable flexibility for reasoning.
- In the Boolean system, the truth value 1.0 represents absolute truth and 0.0 represents absolute falsity. But in the
fuzzy system, truth is not restricted to the absolute true and absolute false values: there are intermediate values to
represent what is partially true and partially false.

- Crisp set theory is governed by a logic that uses one of only two values: true or false. This logic cannot represent
vague concepts. In such cases fuzzy set theory comes to the rescue, where an element belongs to a set with a certain
degree of membership.

- The architecture of a Fuzzy Logic system consists of four different components.

1. Rule Base :
- Rule Base is a component used for storing the set of rules and the If-Then conditions given by the experts
- There are many functions which offer effective methods for designing and tuning fuzzy controllers.
- These updates or developments decrease the number of fuzzy rules.
2. Fuzzification :
- Fuzzification is a component for transforming the system inputs, i.e., crisp numbers, into fuzzy sets.
- The crisp numbers are those inputs which are measured by the sensors and then fuzzification passes them into
the control systems for further processing.
- This component divides the input signals into the following five states in any Fuzzy Logic system: Large Positive
(LP), Medium Positive (MP), Small (S), Medium Negative (MN), Large Negative (LN).
3. Inference Engine :
- This is the main component of any Fuzzy Logic system; it is where all the information is processed.
- It allows users to find the matching degree between the current fuzzy input and the rules.
- After computing the matching degree, this system determines which rule is to be fired according to the given input field.
- When all rules are fired, then they are combined for developing the control actions.
4. Defuzzification
- Defuzzification is a component, which takes the fuzzy set inputs generated by the Inference Engine, and then
transforms them into a crisp value. It is the last step in the process of a fuzzy logic system.
- The crisp value is a type of value which is acceptable by the user.
- Various techniques are present to do this, but the user has to select the best one for reducing the errors.
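The fuzzification and defuzzification steps above can be sketched with a triangular membership function. The "warm" set and its parameters below are hypothetical, and centroid defuzzification is just one of the techniques the text alludes to.

```python
def triangular(x, a, b, c):
    """Triangular membership: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Fuzzification: degree to which 22 degrees belongs to a hypothetical
# "warm" fuzzy set spanning 15..35 with its peak at 25.
print(triangular(22, 15, 25, 35))  # 0.7

# Defuzzification by centroid over a sampled universe of discourse.
xs = [i / 10 for i in range(150, 351)]          # 15.0 .. 35.0
mu = [triangular(x, 15, 25, 35) for x in xs]
centroid = sum(x * m for x, m in zip(xs, mu)) / sum(mu)
print(round(centroid, 1))  # 25.0 (symmetric set -> centroid at peak)
```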
Operations On the Fuzzy set
Game Theory
- It does not prescribe a way or say how to play a game.
- It is the set of ideas and techniques for analysing conflict situations between two or more players.
Ch-6
MiniMax Search
- Mini-max algorithm is a backtracking algorithm which is used in decision-making and game theory.
- Min-Max algorithm is mostly used for two player games Such as Chess, Checkers, tic-tac-toe, go etc.
- Mini-Max algorithm uses recursion to search through the game-tree.
- The minimax algorithm performs a depth-first search to explore the complete game tree. It proceeds all the way down
to the terminal nodes of the tree, then backtracks up the tree as the recursion unwinds.
- In this algorithm two players play the game, one is called MAX and other is called MIN and both players of the game are
opponent of each other.
- MAX will select the maximized value and MIN will select the minimized value.
- The steps for the min max algorithm in AI can be stated as follows:
1. Create the entire game tree.
2. Evaluate the scores for the leaf nodes based on the evaluation function.
3. Backtrack from the leaf to the root nodes:
For Maximizer, choose the node with the maximum score.
For Minimizer, choose the node with the minimum score.
4. At the root node, choose the node with the maximum value and select the respective move.
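The steps above can be sketched as a recursive function. The tiny game tree and leaf scores below are hypothetical, used only to exercise the algorithm.

```python
def minimax(node, is_max, tree, scores):
    """Depth-first minimax over a game tree given as an adjacency
    dict; leaf nodes carry evaluation-function scores."""
    if node in scores:                 # terminal (leaf) node
        return scores[node]
    values = [minimax(child, not is_max, tree, scores)
              for child in tree[node]]
    return max(values) if is_max else min(values)

# Hypothetical toy tree: MAX to move at the root.
tree = {"root": ["L", "R"], "L": ["L1", "L2"], "R": ["R1", "R2"]}
scores = {"L1": 3, "L2": 5, "R1": 2, "R2": 9}
print(minimax("root", True, tree, scores))  # 3
```

MIN reduces L to min(3, 5) = 3 and R to min(2, 9) = 2; MAX then picks max(3, 2) = 3.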
• Characteristics of mini-Max
- Complete: For finite search tree.
- Optimal- If both opponents are playing optimally.
- Time complexity is O(b^m) and space complexity is O(b·m), where b is the branching factor and m is the maximum depth of
the tree. (Time is the same as DFS, since it performs a depth-first search of the tree.)
• Limitations of MiniMax
- The main drawback is that it gets really slow for complex games such as Chess, Go, etc. Such games have a huge
branching factor, and the player has lots of choices to decide between.
- This limitation of the minimax algorithm can be improved by alpha-beta pruning.

Alpha-Beta Pruning
- It is an optimization technique for the minimax algorithm
- There is a technique by which, without checking each node of the game tree, we can compute the correct minimax decision;
this technique is called pruning. It involves two threshold parameters, alpha and beta, for future expansion, so it is
called alpha-beta pruning.
- Alpha-beta pruning can be applied at any depth of the tree, and sometimes it prunes not only tree leaves but also
entire sub-trees.
- The two-parameter can be defined as:
o Alpha: The best (highest-value) choice we have found so far at any point along the path of Maximizer. The initial value
of alpha is -∞.
o Beta: The best (lowest-value) choice we have found so far at any point along the path of Minimizer. The initial value of
beta is +∞.
- The main condition required for alpha-beta pruning is: α >= β (a branch is pruned as soon as this condition holds).

o Key points about Alpha Beta Pruning


- The Max player will only update the value of alpha.
- The Min player will only update the value of beta.
- While backtracking the tree, the node values are passed to upper nodes instead of the values of alpha and beta.
- We only pass the alpha and beta values down to the child nodes.

o The effectiveness of alpha-beta pruning is highly dependent on the order in which each node is examined.
- Worst ordering: In some cases, the alpha-beta pruning algorithm does not prune any of the leaves of the tree, and works exactly
as the minimax algorithm. The time complexity for such an order is O(b^m).
- Ideal ordering: The ideal ordering for alpha-beta pruning occurs when lots of pruning happens in the tree, and the best moves
occur at the left side of the tree. The time complexity for such an order is O(b^(m/2)).
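The pruning rule can be sketched by extending minimax with the two thresholds. The toy tree and scores below are hypothetical; with this ordering, leaf R2 is pruned because α >= β holds after R1 is seen.

```python
import math

def alphabeta(node, is_max, tree, scores, alpha=-math.inf, beta=math.inf):
    """Minimax with alpha-beta pruning: stop expanding a node's
    remaining children as soon as alpha >= beta."""
    if node in scores:                 # terminal (leaf) node
        return scores[node]
    best = -math.inf if is_max else math.inf
    for child in tree[node]:
        val = alphabeta(child, not is_max, tree, scores, alpha, beta)
        if is_max:
            best = max(best, val)
            alpha = max(alpha, best)   # Max player only updates alpha
        else:
            best = min(best, val)
            beta = min(beta, best)     # Min player only updates beta
        if alpha >= beta:              # prune remaining children
            break
    return best

tree = {"root": ["L", "R"], "L": ["L1", "L2"], "R": ["R1", "R2"]}
scores = {"L1": 3, "L2": 5, "R1": 2, "R2": 9}
print(alphabeta("root", True, tree, scores))  # 3 (R2 never examined)
```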
What is Planning in AI ?
Ch-7
- It is about the decision-making tasks performed by robots or computer programs to achieve a specific goal.
- Planning refers to the process of computing several steps of a problem-solving procedure before executing any of them.

Components of Planning System


- The planning consists of following important steps:
1. Choose the best rule to apply next, based on the best available heuristics.
- The most widely used technique for selecting appropriate rules is first to isolate a set of differences between the
current state and the desired goal state, and then to identify those rules that are relevant to reducing those differences.
2. Apply the chosen rule for computing the new problem state.
- In simple systems, applying rules is easy because each rule simply specifies the resulting problem state, but in complex
systems, we have to deal with rules that specify only a small part of the complete problem state.
3. Detect when a solution has been found.
- A planning system has succeeded in finding a solution to a problem when it has found a sequence of operators that
transform the initial problem state into the goal state.
4. Detect dead ends so that they can be abandoned and the system’s effort is directed in more fruitful directions.
- As a planning system is searching for a sequence of operators to solve a particular problem, it must be able to detect a
path that can never lead to a solution. So the same reasoning mechanisms that can be used to detect a solution can
often be used for detecting a dead end.
5. Detect when an almost correct solution has been found.

Blocks World Problem (Sussman Anomaly)


- There is a flat surface on which blocks (labeled ‘A’, ‘B’, etc.) can be placed.
- There are a number of square blocks, all the same size.
- They can be stacked one upon the other.
- There is robot arm that can manipulate the blocks.
- The start state and goal state are given.

Operations Performed by the Robot Arm


- Robot arm can hold only one block at a time.
1. UNSTACK(A,B): Pick up block A from its current position on block B.
2. STACK(A,B): Place block A on block B.
3. PICKUP(A): Pick up block A from the table and hold it.
4. PUTDOWN(A): Put block A down on the table.

Predicates
- All the operations performed by the robot arm have certain pre-conditions that can be described in the form of predicates.
1. ON(A,B): Block A is on Block B.
2. ONTABLE(A): Block A is on the table.
3. CLEAR(A): There is nothing on the top of Block A.
4. HOLDING(A): The arm is holding Block A.
5. ARMEMPTY: The arm is holding nothing.

Robot problem Solving System (STRIPS)


- ADD List : List of new predicates that the operator causes to become true.
- DELETE List : List of old predicates that the operator causes to become false.
- PRECONDITIONS list : Predicates that must be true for the operator to be applied.

• STRIPS style operators for BLOCK World problem are

STACK(x, y)
P: CLEAR(y) Λ HOLDING(x)
D: CLEAR(y) Λ HOLDING(x)
A: ARMEMPTY Λ ON(x, y)

UNSTACK(x, y)
P: ON(x, y) Λ CLEAR(x) Λ ARMEMPTY
D: ON(x, y) Λ ARMEMPTY
A: HOLDING(x) Λ CLEAR(y)

PICKUP(x)
P: CLEAR(x) Λ ONTABLE(x) Λ ARMEMPTY
D: ONTABLE(x) Λ ARMEMPTY
A: HOLDING(x)

PUTDOWN(x)
P: HOLDING(x)
D: HOLDING(x)
A: ONTABLE(x) Λ ARMEMPTY
Goal Stack Planning
- This is one of the most important planning algorithms, which is specifically used by STRIPS.
- In Goal Stack Planning we work backwards from the goal state to the initial state.
- We start at the goal state and try to fulfil the preconditions required to achieve it. These preconditions in
turn have their own sets of preconditions, which must be satisfied first. We keep solving these “goals” and “sub-
goals” until we finally arrive at the Initial State.
- We make use of a stack to hold these goals that need to be fulfilled, as well as the actions that we need to perform for the
same.
- Apart from the “Initial State” and the “Goal State”, we maintain a “World State” configuration which is used by Goal Stack
to work its way from Goal State to Initial State. World State on the other hand starts off as the Initial State and ends up
being transformed into the Goal state.
- At the end of this algorithm we are left with an empty stack and a set of actions which helps us navigate from the Initial
State to the Goal State.

• Following list of actions can be applied to the various situations in the problem.

OPERATOR        PRECONDITION                        DELETE                   ADD

STACK(A, B)     CLEAR(B) Λ HOLDING(A)               CLEAR(B) Λ HOLDING(A)    ARMEMPTY Λ ON(A,B)
UNSTACK(A, B)   ON(A,B) Λ CLEAR(A) Λ ARMEMPTY       ON(A,B) Λ ARMEMPTY       HOLDING(A) Λ CLEAR(B)
PICKUP(A)       CLEAR(A) Λ ONTABLE(A) Λ ARMEMPTY    ONTABLE(A) Λ ARMEMPTY    HOLDING(A)
PUTDOWN(A)      HOLDING(A)                          HOLDING(A)               ONTABLE(A) Λ ARMEMPTY
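The operator table lends itself to a direct, simplified encoding: each operator is a triple of precondition, delete and add sets of ground predicates, and applying an operator is set arithmetic on the world state. This is a propositional sketch (fixed blocks A and B, no variable binding), not full STRIPS.

```python
# STRIPS-style operators as (preconditions, delete-list, add-list),
# taken from the table above for ground blocks A and B.
OPS = {
    "PICKUP(A)":  ({"CLEAR(A)", "ONTABLE(A)", "ARMEMPTY"},
                   {"ONTABLE(A)", "ARMEMPTY"},
                   {"HOLDING(A)"}),
    "PUTDOWN(A)": ({"HOLDING(A)"},
                   {"HOLDING(A)"},
                   {"ONTABLE(A)", "ARMEMPTY"}),
    "STACK(A,B)": ({"CLEAR(B)", "HOLDING(A)"},
                   {"CLEAR(B)", "HOLDING(A)"},
                   {"ARMEMPTY", "ON(A,B)"}),
}

def apply_op(state, name):
    """Check preconditions, then remove the delete-list and union
    in the add-list."""
    pre, dele, add = OPS[name]
    if not pre <= state:
        raise ValueError(f"preconditions of {name} not met")
    return (state - dele) | add

state = {"ONTABLE(A)", "ONTABLE(B)", "CLEAR(A)", "CLEAR(B)", "ARMEMPTY"}
state = apply_op(state, "PICKUP(A)")
state = apply_op(state, "STACK(A,B)")
print(sorted(state))  # ON(A,B) now holds and the arm is empty again
```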
Non-Linear Planning using Constraint Posting
- Most problems require an intertwined plan in which multiple sub-problems are worked on simultaneously.
- Such a plan is called nonlinear plan because it is not composed of a linear sequence of complete sub-plans.
o Advantage : It may be an optimal solution with respect to plan length
o Disadvantage: It takes larger search space, since all possible goal orderings are taken into consideration.

Constraint Posting
- The idea of constraint posting is to build up a plan by incrementally hypothesizing operators, partial orderings between
operators, and binding of variables within operators.
- At any given time in the problem-solving process, we may have a set of useful operators but perhaps no clear idea of how
those operators should be ordered with respect to each other.
- A solution is a partially ordered, partially instantiated set of operators; to generate an actual plan, we convert that
partial order into a total order.
Algorithm For Non-linear Planning
1. Choose a goal 'g' from the goalset
2. If 'g' does not match the state, then
• Choose an operator 'o' whose add-list matches goal g
• Push 'o' on the opstack
• Add the preconditions of 'o' to the goalset
3. While all preconditions of operator on top of opstack are met in state
• Pop operator o from top of opstack
• state = apply(o, state)
• plan = [plan; o]

Constraint Posting VS State Space Search


- State Space Search
o Moves in the space: Modify world state via operator
o Model of time: Depth of node in search space
o Plan stored in Series of state transitions
- Constraint Posting Search
o Moves in the space: Add operators, Order Operators, Bind variables Or Otherwise constrain plan
o Model of Time: Partially ordered set of operators
o Plan stored in Single node

Hierarchical Planning
- In order to solve hard problems, a problem solver may have to generate long plans.
- But it is important to be able to eliminate some of the details of the problem until a solution that addresses the main
issues is found. And then an attempt can be made to fill appropriate details.
- To do this, macro-operators were initially used, but in that approach no details were eliminated from the actual
descriptions of the operators.

ABSTRIPS is a better approach: it actually plans in a hierarchy of abstraction spaces, in each of which preconditions at a
lower level of abstraction are ignored.

• ABSTRIPS approach is as follows:


- First solve the problem completely, considering only preconditions whose criticality value is the highest possible.
- These values reflect the expected difficulty of satisfying the precondition.
- To do this, do exactly what STRIPS did, but simply ignore the preconditions of lower than peak criticality.
- Once this is done, use the constructed plan as the outline of a complete plan and consider preconditions at the next-lowest
criticality level.
- Augment the plan with operators that satisfy those preconditions

• The assignment of appropriate criticality values is crucial to the success of this hierarchical planning method. Those
preconditions that no operator can satisfy are clearly the most critical.
• Example: when solving the problem of a moving robot, for applying the operator PUSH-THROUGH-DOOR, the precondition that
there exists a door big enough for the robot to get through is of high criticality.
Reactive Systems
- The idea of reactive systems is to avoid planning altogether.
- A reactive system is very different from other kinds of planning systems because it chooses actions one at
a time; it does not anticipate and select an entire action sequence before it does the first thing.
- A reactive system must have an access to a knowledge base of some sort that describes what actions should
be taken under what circumstances.
- The example is a Thermostat. The job of the thermostat is to keep the temperature constant inside a room.
- Simple pair of situation-action rules used by Thermostat:
1. If the temperature in the room is k degrees above the desired temperature, then turn the AC on.
2. If the temperature in the room is k degrees below the desired temperature, then turn the AC off.
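The situation-action rules above can be sketched as a single stateless function: the system reacts only to the current reading. The threshold k and the return strings are illustrative choices.

```python
def thermostat_action(room_temp, desired, k=2.0):
    """Situation-action rules for the thermostat example: no plan,
    no lookahead, just a reaction to the current temperature."""
    if room_temp >= desired + k:   # k degrees above desired
        return "AC ON"
    if room_temp <= desired - k:   # k degrees below desired
        return "AC OFF"
    return "NO CHANGE"

print(thermostat_action(26, 22))  # AC ON
print(thermostat_action(19, 22))  # AC OFF
```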

o Advantages Of Reactive System :


- They are capable of complex behaviors.
- They operate robustly in domains that are difficult to model completely and accurately.
- They are extremely responsive since they avoid the combinatorial explosion involved in deliberative
planning .This makes them attractive for real-time tasks such as driving and walking

Other Planning techniques


• Triangle tables : Provide a way of recording the goals that each operator is expected to satisfy, as well as the
goals that must be true for it to execute correctly.
• Meta-planning :A technique for reasoning not just about the problem solved but also about the planning
process itself.
• Macro-operators :Allow a planner to build new operators that represent commonly used sequences of
operators.
• Case-based planning: Re-uses old plans to make new ones.
What is NLP ?
Ch-8
- Natural language processing (NLP) is a field of Artificial Intelligence in which computers analyze, understand,
and derive meaning from human language.
- The field focuses on making computers understand and generate human language.
- NLP refers to communicating with intelligent systems using a natural language such as English.
- There are the following two components of NLP:

NLU (Natural Language Understanding)
- NLU is the process of reading and interpreting language.
- It produces non-linguistic outputs from natural language inputs.

NLG (Natural Language Generation)
- NLG is the process of writing or generating language.
- It constructs natural language outputs from non-linguistic inputs.

- Natural Language Processing (NLP) problem can be divided into two tasks:
1.Processing written text, using lexical, syntactic and semantic knowledge of the language as well as the required
real world information.
2.Processing spoken language, using all the information needed above plus additional knowledge about phonology
as well as enough added information to handle the further ambiguities that arise in speech.

Phases of NLP

1. Morphological Analysis:
- Individual words are analyzed into their components and non-word tokens such as punctuation are separated
from the words.

2. Syntactic Analysis:
- Linear sequences of words are transformed into structures that show how the words relate to each other.
- Some word sequences may be rejected if they violate the language’s rules for how words may be combined. A sentence
such as “The school goes to boy” is rejected by an English syntactic analyzer.

3. Semantic Analysis:
- The structures created by the syntactic analyzer are assigned meanings.
- A mapping is made between the syntactic structures and objects in the
task domain.
- Structures for which no such mapping is possible may be rejected. The semantic analyzer disregards sentences such as
“hot ice-cream”.

4. Discourse integration:
- The meaning of an individual sentence may depend on the sentences
that precede it and may influence the meanings of the sentences that
follow it.

5. Pragmatic Analysis:
- The structure representing what was said is reinterpreted to determine
what was actually meant.
1. Morphological Analysis:
- The morphological level involves identifying and analyzing the structure of words.
- The lexicon of a language is the collection of words and phrases in that language.
- Morphological analysis is dividing the whole chunk of text into paragraphs, sentences, and words.
- This process will usually assign syntactic categories to all the words in the sentence.
- Suppose there is a sentence “I want to print Bill’s .init file.”
- Morphological analysis must do the following things:
▪ Pull apart the word “Bill’s” into proper noun “Bill” and the possessive suffix “’s”
▪ Recognize the sequence “.init” as a file extension that is functioning as an adjective in the sentence.

2. Syntactic Analysis:
- Syntactic analysis must exploit the results of morphological analysis to build a structural description of the sentence.
- The goal of this process, called parsing, is to convert flat sentence into a hierarchical structure that corresponds to
meaning units .
- Reference markers (set of entities) are shown in the parenthesis in the parse tree.
- Each one corresponds to some entity that has been mentioned in the sentence.
- These reference markers are useful later since they provide a place in which to accumulate information about the entities
as we get it.

3. Semantic Analysis:
- Semantic analysis must do two important things:
a. It must map individual words into appropriate objects in the knowledge base.
b. It must create the correct structures to correspond to the way the meanings of the individual words combine with each
other.
- It draws the exact meaning or the dictionary meaning from the text. The text is checked for meaningfulness.
- It is done by mapping syntactic structures and objects in the task domain.

4. Discourse Integration:
- The discourse level of linguistic processing deals with the analysis of structure and meaning of text beyond a single
sentence, making connections between words and sentences.
- Discourse integration depends upon the sentences that precede it and may also influence the meaning of the sentences
that follow it.
- At this level, Anaphora Resolution is also achieved by identifying the entity referenced by an anaphor
- Structured documents also benefit from the analysis at the discourse level since sections can be broken down into (1)
title, (2) abstract, (3) introduction, (4) body, (5) results, (6) analysis, (7) conclusion, and (8) references.

5. Pragmatic Analysis:
- During this phase, what was said is re-interpreted to determine what was actually meant.
- It helps you to discover the intended effect by applying a set of rules that characterize cooperative dialogues.For
Example: "Open the door" is interpreted as a request instead of an order.
- It is the process to translate knowledge based representation to a command to be executed by the system.
- The pragmatic level of linguistic processing deals with the use of real-world knowledge and understanding of how this
impacts the meaning of what is being communicated.
Applications of NLP

1. Question Answering
- Question Answering focuses on building systems that automatically answer the questions asked by humans in a
natural language.Eg : Virtual Assistants like Siri, Alexa etc.
2. Text Classification
- Text classification is the process of categorizing text into organized groups.
- By using NLP, text classification can automatically analyze text and then assign a set of predefined tags
- For e.g., Spam Detection is used to detect unwanted e-mails getting to a user's inbox.
3. Sentiment Analysis
- Sentiment Analysis is also known as opinion mining.
- It is used on the web to analyse the attitude, behaviour, and emotional state of the sender.
- This application is implemented through a combination of NLP and statistics, by assigning values to the text
(positive, negative, or neutral) and identifying the mood of the context (happy, sad, angry, etc.)
4. Machine Translation
- Machine translation is used to translate text or speech from one natural language to another natural
language.Example: Google Translator
5. Spelling correction
- Microsoft Corporation provides word processor software like MS-word, PowerPoint for the spelling correction.
6. Speech Recognition
- Speech recognition is used for converting spoken words into text.
- It is used in applications, such as mobile, home automation, video recovery, dictating to Microsoft Word, voice
biometrics, voice user interface, and so on.
7. Chatbot
- It is used by many companies to provide the customer's chat services.
8. Information extraction
- Information extraction is one of the most important applications of NLP. It is used for extracting structured
information from unstructured or semi-structured machine-readable documents.
9. Natural Language Understanding (NLU)
- It converts a large set of text into more formal representations, such as first-order logic structures, that are easier
for computer programs to manipulate.

Spell Checking

- Spell Check is a process of detecting and sometimes providing suggestions for incorrectly spelled words in a text.
- A basic spell checker carries out the following processes:
o It scans the text and extracts the words contained in it.
o It then compares each word with a known list of correctly spelled words (i.e. a dictionary).
o An additional step is a language-dependent algorithm for handling morphology.

- Spelling errors can be divided as:


• Non-word errors : These are the most common type of errors. You either miss a few keystrokes or let your fingers
hurtle a bit longer. These are those error words that cannot be found in the dictionary.
• Real-word errors : Sometimes you end up creating a real word error such as E.g., typing flower when you meant
flour. These are those error words that are acceptable words in the dictionary.
• Cognitive Errors : Cognitive errors can occur due to the ignorance of a word or its correct spelling. The words
piece and peace are homophones (sound the same). So you are not sure which one is which.
• Short forms/Slang/Lingo : These are possibly not even spelling errors. You are trying hard to fit in everything
within a text message . Example using ‘bcz’ instead of “because”

- Error Detection
• The Dictionary Lookup Technique checks every word of the input text for its presence in the dictionary. If the word is
present in the dictionary, it is a correct word; otherwise it is put into the list of error words.
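The dictionary-lookup technique is simple to sketch. The tiny word list below is a stand-in for a real dictionary; real-word and cognitive errors (e.g. "flower" for "flour") would of course slip past this check.

```python
import re

# A tiny stand-in dictionary; a real spell checker would load a
# full word list.
DICTIONARY = {"the", "quick", "brown", "fox", "jumps", "over",
              "lazy", "dog"}

def find_errors(text):
    """Extract the words, then flag any word absent from the
    dictionary as a (non-word) spelling error."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in DICTIONARY]

print(find_errors("The quik brown fox jumps ovr the lazy dog"))
# ['quik', 'ovr']
```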
How to build NLP Pipeline

1. Sentence Segmentation
- Sentence Segment is the first step for building the NLP pipeline. It breaks the paragraph into separate sentences.

2. Word Tokenization
- Word Tokenizer is used to break the sentence into separate words or tokens.

3. Stemming
- Stemming is used to normalize words into their base or root form. The big problem with stemming is that it
sometimes produces a root word which may not have any meaning.
- For example, intelligence, intelligent, and intelligently all originate from the single root word
“intelligen”. In English, the word “intelligen” does not have any meaning.

4. Lemmatization
- Lemmatization is quite similar to stemming. The main difference between stemming and lemmatization is
that lemmatization produces a root word which has a meaning.
- For example: in lemmatization, the words intelligence, intelligent, and intelligently have the root word intelligent,
which has a meaning.

5. Identifying Stop Words


- In English, there are a lot of words that appear very frequently like "is", "and", "the", and "a". NLP pipelines will
flag these words as stop words. Stop words might be filtered out before doing any statistical analysis.

6. Dependency Parsing
- Dependency Parsing is used to find how all the words in the sentence are related to each other.

7. POS tags
- POS stands for parts of speech, which include noun, verb, adverb, and adjective. It indicates how a word
functions, both in meaning and grammatically, within the sentence.
- Example: Google something on the Internet.
Here Google is used as a verb, although it is a proper noun.

8. Named Entity Recognition (NER)


- NER is the process of detecting the named entity such as person name, organization name, or location etc.
- Example: Steve Jobs introduced iPhone .

9. Chunking
- Chunking is used to collect individual pieces of information and group them into bigger pieces of the sentence.
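The first few pipeline stages can be sketched with plain string processing. The stop-word list is a small sample, and the suffix-stripping "stemmer" is deliberately crude (it is not a real stemming algorithm); note how it produces the non-word "runn", echoing the point made above about stemming.

```python
import re

STOP_WORDS = {"is", "and", "the", "a", "an", "to"}

def pipeline(paragraph):
    """Sentence segmentation, word tokenization, stop-word removal,
    and a crude suffix-stripping stemmer (illustrative only)."""
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    result = []
    for s in sentences:
        tokens = re.findall(r"[a-z]+", s.lower())          # tokenize
        content = [t for t in tokens if t not in STOP_WORDS]
        stems = [re.sub(r"(ing|ly|ed|s)$", "", t) for t in content]
        result.append(stems)
    return result

print(pipeline("The cats are running quickly. Dogs barked loudly!"))
# [['cat', 'are', 'runn', 'quick'], ['dog', 'bark', 'loud']]
```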
Hopfield Network
Ch-9
- The Hopfield neural network was proposed by John Hopfield as a theory of memory: a model of content-addressable
memory.
- The Hopfield network is a special kind of neural network whose response is different from that of other neural
networks. Its response is calculated by a converging iterative process.
- A Hopfield network is first trained to store various patterns or memories, and it is then able to
recognize any of the learned patterns.

- Properties of Hopfield network:


o A recurrent network with all nodes connected to all other nodes.
o Nodes have binary outputs (either 0,1 or -1,1).
o Weights between the nodes are symmetric.
o No connection from a node to itself is allowed.
o Nodes are updated asynchronously
o The network has no hidden nodes or layer.

- The Hopfield network consists of a set of neurons and corresponding set of unit delays, forming a multiple loop
feedback system.
- Processing units are always in one of two states, active(Black) or inactive(White).
- Units are connected to each other with weighted symmetric connections. A positive weighted connection
indicates that the two units tend to activate each other. A negative weighted connection allows an active unit to
deactivate a neighboring unit.
- Sometimes the network cannot find global solution because the network sticks with the local minima.
- The Boltzmann machine is a technique that combines simulated annealing with the Hopfield network to find the
global minimum.

- The network operates as follows (Parallel relaxation algorithm):


1. A random unit is chosen
2. If any of its neighbors are active, the unit computes the sum of the weights on the connections to
those active neighbors.
3. If the sum is positive, the unit becomes active, otherwise it becomes inactive.
4. Another random unit is chosen, and the process repeats until the network reaches a stable state.
(e.g. until no more unit can change state)
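The parallel relaxation steps above can be sketched directly with binary 0/1 units. The small symmetric weight matrix below is hypothetical (two mutually excitatory pairs that inhibit each other), and a fixed step count stands in for the "no more unit can change state" check.

```python
import random

def parallel_relaxation(weights, state, steps=200, seed=0):
    """Repeatedly pick a random unit and set it active (1) iff the
    weighted sum from its active neighbors is positive. Weights are
    symmetric and there are no self-connections."""
    rng = random.Random(seed)
    state = list(state)
    n = len(state)
    for _ in range(steps):
        i = rng.randrange(n)
        total = sum(weights[i][j] * state[j] for j in range(n) if j != i)
        state[i] = 1 if total > 0 else 0
    return state

# Hypothetical symmetric weights: units 0-1 excite each other,
# units 2-3 excite each other, the pairs inhibit each other.
W = [[ 0,  2, -1, -1],
     [ 2,  0, -1, -1],
     [-1, -1,  0,  2],
     [-1, -1,  2,  0]]
print(parallel_relaxation(W, [1, 0, 1, 0]))  # settles in a stable state
```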

- The network has four distinct stable states.


- Hopfield’s major contribution was to show that given any set of weights and any initial state, the parallel
relaxation algorithm would eventually steer the network into one of these stable states.
- Thus, similar to the human brain the Hopfield model has stability in pattern recognition.
Artificial Neural Network
- To understand how an artificial neuron works, we should know how the biological neuron works
- In brain each neuron is typically connected to thousands of other neurons to form a neural network.
- Principal components of Neuron
o Dendrites: Receive input signals from other neurons and transmit them to the cell body
o Cell body: Information processing happens in the cell body
o Synapse: The connection between a dendrite and the axon of another neuron is called a synapse
o Axon: Transmits the output signal over the synapses to the dendrites of other neurons

- Artificial Neural Networks are computing systems of interconnected nodes that work like the neurons
present in the brain.
- A simple ANN consists of an input layer, an output layer and, in between them, hidden layers.

• Inputs are fed simultaneously into the units making up the input layer
• They are then weighted and fed simultaneously to a hidden layer
• The weighted outputs of the last hidden layer are input to the output layer
• Output layer emits the network's prediction
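- As a rough sketch, the layer-by-layer computation above can be written as follows (the network shape, weights and biases are arbitrary illustrative values, and a sigmoid activation is assumed):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_output(inputs, weights, biases):
    """One layer: each unit takes the weighted sum of the inputs plus its
    bias, then passes the result through the activation function."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Hypothetical 2-3-1 network: 2 inputs, one hidden layer of 3 units, 1 output
x = [0.5, -1.0]
hidden = layer_output(x, [[0.1, 0.4], [-0.2, 0.3], [0.5, -0.5]], [0.0, 0.1, -0.1])
output = layer_output(hidden, [[0.3, -0.6, 0.2]], [0.05])
print(output)
```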

Backpropagation is used to train ANN.


1. Initialize the weights and biases in the network
2. Propagate the input forward by applying the activation function
3. Backpropagate the error by updating the weights and biases
4. Check the terminating condition

Here modifications are made in the “backwards” direction: from the output layer, through each hidden layer,
down to the first hidden layer; hence it is called “backpropagation”.
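- A minimal, illustrative sketch of these steps for a tiny 2-2-1 sigmoid network (the input, target, learning rate and random seed are all made-up values):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(w_h, b_h, w_o, b_o, x, target, lr=0.5):
    """One forward pass plus one backward (weight update) pass."""
    # forward: input -> hidden layer -> output unit
    h = [sigmoid(w_h[j][0] * x[0] + w_h[j][1] * x[1] + b_h[j]) for j in range(2)]
    o = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + b_o)
    # backward: deltas use the sigmoid derivative y * (1 - y)
    d_o = (target - o) * o * (1 - o)
    d_h = [d_o * w_o[j] * h[j] * (1 - h[j]) for j in range(2)]
    # update weights and biases, output layer first ("backwards" direction)
    for j in range(2):
        w_o[j] += lr * d_o * h[j]
        for i in range(2):
            w_h[j][i] += lr * d_h[j] * x[i]
        b_h[j] += lr * d_h[j]
    return o, b_o + lr * d_o

random.seed(1)
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b_h = [0.0, 0.0]
w_o = [random.uniform(-1, 1) for _ in range(2)]
b_o = 0.0
errors = []
for _ in range(200):
    o, b_o = train_step(w_h, b_h, w_o, b_o, [1.0, 0.0], 1.0)
    errors.append((1.0 - o) ** 2)
print(errors[0], errors[-1])
```

Repeating the step drives the squared error on this single example down, which is the terminating condition one would check in practice.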
Activation Functions

In an ANN, we apply an activation function over the input to produce the output.
1. Linear Activation Function
- It is also known as the identity function, as it leaves the input unchanged.

F(x)=x

2. Binary sigmoidal function


- It squashes the input into the range 0 to 1. It is positive in nature. It is always bounded, which means its
output cannot be less than 0 or more than 1. It is also strictly increasing in nature.

3. Bipolar sigmoidal function


- It squashes the input into the range -1 to 1. It can be positive or negative in nature. It is always bounded, which
means its output cannot be less than -1 or more than 1. It is also strictly increasing in nature, like the sigmoid function.

4. Tanh(Tangent Hyperbolic) function


- It is more efficient than the sigmoid function; it is a rescaled version of the sigmoid function.
- It is non-linear, and its value range lies between -1 and +1

F(x) = tanh(x) = 2/(1 + e^(-2x)) - 1 = 2 * sigmoid(2x) - 1
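- The three bounded functions above can be sketched as follows (the function names are ours; note that the bipolar sigmoid shown here equals tanh(x/2)):

```python
import math

def binary_sigmoid(x):
    """Bounded in (0, 1), strictly increasing."""
    return 1.0 / (1.0 + math.exp(-x))

def bipolar_sigmoid(x):
    """Bounded in (-1, 1); algebraically equal to tanh(x/2)."""
    return (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))

def tanh_act(x):
    """tanh(x) = 2/(1 + e^(-2x)) - 1 = 2 * sigmoid(2x) - 1."""
    return 2.0 / (1.0 + math.exp(-2 * x)) - 1.0

print(binary_sigmoid(0), bipolar_sigmoid(0), tanh_act(1))
```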

- Types of Activation function that can be used


Learning in Neural Networks
- Basically, learning means adapting to change in itself as and when the environment changes.
- ANN is a complex adaptive system in which processing unit is capable of changing its input/output behavior
due to the change in environment.
- During ANN learning, we need to adjust the weights to change the input/output behavior.
- The methods with the help of which weights can be modified are called Learning rules, which are simply
algorithms or equations.
- Following are some learning rules for the neural network :
o Hebbian learning rule
- It identifies, how to modify the weights of nodes of a network.
- It is based on the given Postulate “That the connections between two neurons might be strengthened if the
neurons fire at the same time and might weaken if they fire at different times.”
o Perceptron learning rule
- This rule states that network starts its learning by assigning a random value to each weight.
o Delta learning rule
- This rule states that the change in the weight of a node is equivalent to the product of error and the input
o Correlation learning rule
- It is based on the same principle as the Hebbian learning rule .But the correlation rule is the supervised
learning.
o Outstar learning rule
- It can be used when the nodes or neurons in a network are assumed to be arranged in a layer.
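- The Hebbian and delta rules above can each be sketched as a single weight update (the learning rate and values are illustrative):

```python
def hebbian_update(weights, x, y, lr=0.1):
    """Hebbian rule: strengthen a weight when the pre- and post-synaptic
    units fire together (delta_w = lr * input * output)."""
    return [w + lr * xi * y for w, xi in zip(weights, x)]

def delta_update(weights, x, target, y, lr=0.1):
    """Delta rule: the change in a weight is the product of the error
    and the input (delta_w = lr * (target - output) * input)."""
    error = target - y
    return [w + lr * error * xi for w, xi in zip(weights, x)]

print(hebbian_update([0.0, 0.0, 0.0], [1, 0, 1], 1))
print(delta_update([0.5, 0.5], [1, 1], target=1, y=0))
```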

Types of Learning in Neural Networks

1) Supervised Learning
- This type of learning is done under the supervision of a teacher. This learning process is dependent.
- During the training of ANN under supervised learning, the input vector is presented to the network, which will
give an output vector and this output vector is compared with the desired output vector.
- An error signal is generated, if there is a difference between the actual output and the desired output vector.
- On the basis of this error signal, the weights are adjusted until the actual output is matched with the desired
output.

2) Unsupervised Learning
- This type of learning is done without the supervision of a teacher. This learning process is independent.
- During the training of ANN under unsupervised learning, the input vectors of similar type are combined to form
clusters. When a new input pattern is applied, then the neural network gives an output response indicating the
class to which the input pattern belongs.
- There is no feedback from the environment as to what should be the desired output and if it is correct or incorrect
3) Reinforcement learning
- In Reinforcement learning agents learn from their experiences only.
- It lies between supervised and unsupervised learning
- It is also called learning with critic
- Reinforcement learning works on a feedback-based process, in which an AI agent automatically explores its
surroundings by hit and trial, taking actions, learning from experience, and improving its performance.
- The agent gets rewarded for each good action and punished for each bad action. The main aim of a
reinforcement learning agent is to maximize the rewards.

Perceptron Learning Algorithm


Perceptron is a basic ANN that consists of only an input layer and an output layer, and here the output layer is
the only layer that performs computation.
- The perceptron model works in the following steps:
1. The features of the model we want to train are passed as input
2. Multiply all input values with their weights and add these values together to create the weighted sum
3. Add the bias value to the weighted sum to move the output function away from the origin
4. Feed this computed value to the activation function 'f' to obtain the desired output
- Activation function plays a vital role in ensuring that output is mapped between required values (0,1) or (-1,1)
- The training of the perceptron means to find out a pattern of firing and non-firing neurons for the given input.
- Often it is good to include an x0 input which is the bias unit = 1.

- Perceptron performs the computations to output binary values 0 or 1


Activation = Weights * Inputs + Bias
Predict 1: If Activation > 0.0
Predict 0: If Activation <= 0.0
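- The computation above, together with the perceptron learning rule, can be sketched as follows (logical AND is used as an assumed toy dataset, since it is linearly separable):

```python
def predict(inputs, weights, bias):
    """Perceptron output: 1 if the weighted sum plus bias is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0.0 else 0

def train(data, lr=1, epochs=10):
    """Perceptron learning rule: after each prediction, adjust every
    weight by lr * error * input (error = target - prediction)."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in data:
            error = target - predict(inputs, weights, bias)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# logical AND is linearly separable, so a single perceptron can learn it
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(and_data)
print([predict(x, w, b) for x, _ in and_data])  # [0, 0, 0, 1]
```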

- Perceptron is a linear classification algorithm.


- A single perceptron outputs a linear model that separates two classes using a line, called a hyperplane in
higher-dimensional feature spaces.
- Pattern classification problem: it is linearly separable when we can draw a line that separates one class
from the other.
- Let us assume the output of the perceptron to be y, and let x be the
input to a perceptron with a single input.
- Then y = w1*x + b, where w1 is the weight for the input and b is the
bias. Let us further simplify this problem by assuming w1 to be 1 and
b to be 0.
- This makes our equation y = x, which is the equation of a straight
line passing through the origin. As we can see, this line can easily
separate the two classes.

- A single perceptron fails to solve problems that are not linearly separable.


- So to solve a non-linear problem, we add multiple perceptrons to the network.
- One such problem is the XOR problem, which can be solved using multilayered perceptrons.
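- As an illustration, XOR can be computed by stacking perceptrons with hand-chosen (not learned) weights, using the identity XOR(a, b) = AND(OR(a, b), NAND(a, b)):

```python
def perceptron(inputs, weights, bias):
    """A single linear threshold unit."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def xor(a, b):
    """Two-layer solution: an OR unit and a NAND unit feed an AND unit.
    The weights are illustrative hand-picked values."""
    or_out = perceptron([a, b], [1, 1], -0.5)
    nand_out = perceptron([a, b], [-1, -1], 1.5)
    return perceptron([or_out, nand_out], [1, 1], -1.5)

print([xor(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```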

Applications of Neural network


1. Speech Recognition
- Speech occupies a prominent role in human-human interaction. Therefore, it is natural for people to expect
speech interfaces with computers.
- ANN is playing a major role in this area. Following ANNs have been used for speech recognition −
• Multilayer networks /Multilayer networks with recurrent connections
• Kohonen self-organizing feature map

- The most useful network for this is the Kohonen self-organizing feature map, which takes short segments of
the speech waveform as input. It maps similar phonemes together using the extracted features, which helps an
acoustic model recognize the utterance.

2. Character Recognition
- It is an interesting problem which falls under the general area of Pattern Recognition.
- Many neural networks have been developed for automatic recognition of handwritten characters, either letters
or digits. Following ANNs have been used for character recognition −
• Backpropagation neural networks and Neocognitron

- Like backpropagation neural networks, the neocognitron also has several hidden layers; the pattern of connection
from one layer to the next is localized, so training is done layer by layer.

3. Signature Verification Application


- Signatures are one of the most useful ways to authorize and authenticate a person in legal transactions.
- Signature verification technique is a non-vision based technique.
- For this application, first of all a feature set representing the signature is extracted. Using these feature sets, a
neural network can be trained with an efficient neural network algorithm. This trained neural network will
classify the signature as genuine or forged.

4. Human Face Recognition


- It is one of the biometric methods to identify the given face.
- First, all the input images must be preprocessed; then the dimensionality of each image is reduced; and at last it
is classified using a neural network training algorithm.
- Following neural networks are used for training purposes with preprocessed image −
• Fully-connected multilayer feed-forward neural network trained with the help of back-propagation
algorithm.
• For dimensionality reduction, Principal Component Analysis PCA is used.

5. Other Applications
- Artificial Neural Networks are used in Oncology to train algorithms that can identify cancerous tissue at the
microscopic level at the same accuracy as trained physicians.
- Object detection models such as YOLO (You Only Look Once) and SSD (Single Shot Object Detectors).
Recurrent networks
- They are feedback networks with closed loops.
Feedback network: It has feedback paths, so the signal can flow in both directions using loops.
1. Fully recurrent network − It is the simplest neural network architecture because all nodes are connected to all
other nodes and each node works as both input and output.
2. Jordan network − It is a closed-loop network in which the output goes back to the input as feedback.

o RNNs are a class of neural networks that are helpful in modeling sequential data like time series, speech,
financial data, audio, video, weather etc.
o RNNs are among the most promising algorithms in use because they are the only ones with an internal memory.
o Because of their internal memory, RNNs can remember important things about the input they received.
o Recurrent networks can be trained with the Back-propagation algorithm.

• Difference in information flow between a RNN and a feed-forward neural network.

o In a feed-forward neural network, the information only moves in one direction and never touches a node
twice, so the network has no memory of the inputs it receives. Because it only considers the current input, it has
no notion of order in time.
o In an RNN the information cycles through a loop. When it makes a decision, it considers the current input and
also what it has learned from the inputs it received previously. So RNN has two inputs: the present and the
recent past.

• Feed-forward neural networks map one input to one output but RNNs can map one to many, many to
many and many to one.
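- The two-input idea above (present input plus recent past) can be sketched as a single recurrent step, where the hidden state carries the memory forward (the scalar weights and input sequence are made-up values):

```python
import math

def rnn_step(x, h, w_x, w_h, b):
    """One recurrent step: the new hidden state mixes the current input
    with the previous hidden state (the network's 'memory')."""
    return math.tanh(w_x * x + w_h * h + b)

# process a sequence; the hidden state loops back at every step
h = 0.0
for x in [1.0, 0.5, -1.0]:
    h = rnn_step(x, h, w_x=0.8, w_h=0.5, b=0.0)
print(h)
```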
Symbolic AI vs Connectionist AI
o Symbolic AI
- Search – state space traversal
- Knowledge representation – predicate logic, semantic frames, scripts
- Learning – macro-operators, explanation learning, version space
o Connectionist AI
- Search – Parallel Relaxation
- Knowledge representation – very large number of real value connection strength
- Learning – back propagation, reinforcement learning, unsupervised learning

- Symbolic AI represents information through symbols and their relationships.


- Symbolic AI works well with applications that have clear-cut rules and goals
- However, Symbolic AI fails to solve the following problems:
1. Common sense problem: Humans can store large amounts of data and extract both implicit and explicit
information from it. Symbolic AI fails to acquire implicit information.
2. Expert systems: They require a large amount of knowledge from experts, which Symbolic AI fails to capture.
3. Game playing: When more than one player is involved, Symbolic AI fails.

- Connectionist models focus more on learning than representation


- Learning in connectionist models generally involves tuning of the weights. The tuning is generally based on
gradient descent or an approximation of it. The best-known algorithm for this purpose is backpropagation.
- Connectionist Learning Algorithms have been devised for:
1. Supervised Learning : Similar to Symbolic Algorithms for classification but resulting into trained network.
2. Unsupervised Learning : Similar to Symbolic Algorithms for clustering but without the use of explicit rules.
3. Reinforcement Learning : Either implementing Symbolic or Connectionist Methods.

Local Representation and Distributed Representation


- Connectionist networks can be divided into two classes:
1. Those that use localist representations
• In localist representation, a single neuron represents a single concept on a stand-alone basis.
• With a local representation, activity in individual units can be interpreted directly
2. Those that use distributed representations
• Distributed representation is generally defined to have the following properties
I. A concept is represented by a pattern of activity over a collection of neurons (i.e., more than one
neuron is required to represent a concept).
II. Each neuron participates in the representation of more than one concept.
• With distributed representation individual units cannot be interpreted without knowing the state of other
units in the network.
- The fundamental difference between localist and distributed representation is only in the interpretation and
meaning of the units.
Characteristics of Distributed Representation

1. Representational efficiency
- With n binary output neurons, it can represent 2^n concepts.
- With localist representation, n neurons can only represent n concepts.
2. Mapping efficiency
- It allows for a more compact overall structure from the input nodes to the output ones, which means fewer
connections and weights to train.
3. Sparse distributed representation
- It is sparse if only a small fraction of the n neurons is used to represent a subset of the concepts.
4. Resiliency
- It is more resistant to damage.
- It is resilient in the sense that degradation of a few elements in the network structure may not disrupt or affect the
overall performance of the structure.
What is an Expert System?
Ch-10
- The concept of expert systems was first developed by Feigenbaum. He explained that the world was moving
from data processing to knowledge processing, a transition enabled by new processor technology and
computer architectures.
- An expert system is a computer program that is designed to solve complex problems and to provide decision-
making ability like a human expert.
Features:
- Human experts are perishable, but an expert system is permanent.
- One expert system may contain knowledge from more than one human expert, thus making the
solutions more efficient.
- It decreases the cost of consulting an expert.
- Expert systems can solve complex problems by deducing new facts through existing facts of knowledge
- Expert systems were among the first truly successful forms of artificial intelligence (AI) software.
Limitations :
- Do not have human-like decision-making power.
- Cannot produce correct result from less amount of knowledge.
- Requires excessive training

Expert System Shells (Components of Expert System)


The components of ES are :

- Knowledge Base
- Inference Engine
- User Interface

1. Knowledge Base
- The knowledge base represents facts and rules.
- It consists of knowledge in a particular domain as well as rules to solve a problem, procedures and intrinsic
data relevant to the domain
- The knowledge base of an ES is a store of both, factual and heuristic knowledge.
a. Factual Knowledge − It is the information widely collected by the Knowledge Engineers in the task domain.
b. Heuristic Knowledge − It is about practice, judgement, one’s ability of evaluation, and guessing.
- An expert system solves the most complex issue as an expert by extracting the knowledge stored in its
knowledge base. This knowledge is extracted from its knowledge base using the reasoning and inference rules
according to the user queries.
- The performance of an expert system depends on the expert knowledge stored in its knowledge base: the
more knowledge stored, the better the performance of the expert system.

2. Inference Engine
- The function of the inference engine is to fetch the relevant knowledge from the knowledge base, interpret it
and to find a solution relevant to the user’s problem.
- The inference engine acquires the rules from its knowledge base and applies them to the known facts to infer
new facts.
- Inference engines can also include explanation and debugging abilities.

3. User Interface
- This module makes it possible for a non-expert user to interact with the expert system and find a solution to
the problem.
Representation using Domain Knowledge
- Expert system is built around a knowledge base module.
- It contains a formal representation of the information provided by the domain expert. This information may be
in the form of problem-solving rules, procedures, or data intrinsic to the domain.
- To incorporate this information into the system, it is necessary to use one or more knowledge
representation methods. Three common methods of knowledge representation that have evolved over the years
are IF-THEN rules, semantic networks and frames.
- Transferring knowledge from the human expert to a computer is often the most difficult part of building an
expert system.

Knowledge Acquisition (Knowledge Acquisition and learning module)

- The function of this component is to allow the expert system to acquire more and more knowledge from
various sources and store it in the knowledge base.
- The success of any expert system majorly depends on the quality, completeness, and accuracy of the information
stored in the knowledge base.
- The knowledge base is formed from readings by various experts and knowledge engineers. The knowledge
engineer is a person who acquires information from subject-matter experts and then categorizes and organizes
the information in a meaningful way.
- The knowledge engineer also monitors the development of the ES.

The Inference Engine generally uses two strategies for acquiring knowledge from the Knowledge Base:
1. Forward Chaining
- Forward Chaining is a strategy used by the Expert System to answer the question "What will happen next?"
- This strategy is mostly used for managing tasks like creating a conclusion, result or effect.
- Example: prediction of share market movement status.

2. Backward Chaining
- Backward Chaining is a strategy used by the Expert System to answer the question "Why did this happen?"
- This strategy is mostly used to find the root cause or reason for what has already happened.
- Example: diagnosis of stomach pain, blood cancer etc.
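- The forward chaining strategy above can be sketched as repeatedly applying rules to known facts until nothing new can be derived (the rules and facts below are invented examples, not from a real expert system):

```python
def forward_chain(facts, rules):
    """Apply rules (premises -> conclusion) to the fact base until no
    new fact can be derived, then return all derived facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)  # infer a new fact
                changed = True
    return facts

# toy medical-style rule base (illustrative only)
rules = [({"fever", "rash"}, "measles_suspected"),
         ({"measles_suspected"}, "order_blood_test")]
print(forward_chain({"fever", "rash"}, rules))
```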
Applications of Expert System

Application : Description
- Design Domain : Camera lens design, automobile design.
- Medical Domain : Diagnosis systems to deduce the cause of disease from observed data, conducting medical
operations on humans.
- Monitoring Systems : Comparing data continuously with the observed system, such as leakage monitoring in a
long petroleum pipeline.
- Process Control Systems : Controlling a physical process based on monitoring.
- Knowledge Domain : Finding out faults in vehicles, computers.
- Finance/Commerce : Detection of possible fraud, stock market trading, airline scheduling, cargo scheduling.

Explanation of Expert System with Example

- MYCIN is a well-known medical expert system that was developed at Stanford University.
- MYCIN was designed to assist doctors in prescribing antimicrobial drugs for blood infections. So, indirectly
through the ES, experts in antimicrobial drugs can assist doctors who are not experts in that field.
- By asking the doctor a series of questions, MYCIN is able to recommend a course of treatment for the patient.
- MYCIN is also able to explain to the doctor which rules fired, and therefore can explain why it
produced the diagnosis and recommended the treatment that it did.
- MYCIN has been proven to be able to provide more accurate diagnoses of meningitis in patients than most
doctors.
- MYCIN was developed using LISP(List Processing), and its rules are expressed as LISP expressions.
- Example of the kind of rule used by MYCIN, translated into English:
o IF the infection is primary-bacteremia.
o AND the site of the culture is one of the sterile sites.
o AND the suspected portal of entry is the gastrointestinal tract.
o THEN there is suggestive evidence (0.7) that infection is bacteroid.

- A common method for building such expert systems is to use a rule-based system with backward chaining.
- Typically, a user enters a set of facts into the system, and the system tries to see if it can prove any of the
possible hypotheses using these facts.
- Typically, backward chaining is used in combination with forward chaining. Whenever a new fact is added to the
database, forward chaining is applied to see if any further facts can be derived and then Backward chaining is
used to prove each possible hypothesis.
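- The backward chaining step can be sketched as recursively proving the premises of any rule that concludes the goal (the rule base is an invented, MYCIN-flavoured example, not MYCIN's actual rules):

```python
def backward_chain(goal, facts, rules):
    """Try to prove a goal: it holds if it is a known fact, or if some
    rule concludes it and all of that rule's premises can be proven."""
    if goal in facts:
        return True
    for premises, conclusion in rules:
        if conclusion == goal and all(backward_chain(p, facts, rules)
                                      for p in premises):
            return True
    return False

# illustrative rule: two findings together suggest a bacteroid infection
rules = [({"infection_primary", "site_sterile"}, "suggest_bacteroid")]
facts = {"infection_primary", "site_sterile"}
print(backward_chain("suggest_bacteroid", facts, rules))  # True
```

This sketch assumes an acyclic rule base; a real engine would also detect cycles and record which rules fired, for explanation.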
Introduction To Genetic Algorithms
Ch-11
- Genetic Algorithms are inspired by Charles Darwin’s theory of evolution.
- In Darwin’s theory, the three main principles necessary for evolution to happen are:
1) Heredity — There must be a process by which children receive the traits of their parents
2) Variation — There must be a variety of traits present in the population, or a means to introduce variation
3) Selection — There must be a mechanism by which some members of the population can become parents and
pass down their genetic information while others do not (survival of the fittest).

- They simulate the process of natural selection which means those species who can adapt to changes in their
environment are able to survive and reproduce and go to next generation. In simple words, they simulate
“survival of the fittest”

- Genetic Algorithms(GAs) are adaptive heuristic search algorithms that are part of evolutionary algorithms.
- They are an intelligent exploitation of a random search.
- Although randomized, Genetic Algorithms are by no means random.
- They are commonly used to generate high-quality solutions for optimization problems and search problems.
- Each generation consists of a population of individuals, and each individual represents a point in the search space
and a possible solution. Each individual is represented as a string of characters/integers/floats/bits. This string is
analogous to a chromosome.
• Individual - Any possible solution
• Population - Group of all individuals
• Search Space - All possible solutions to the problem
• Chromosome - Blueprint for an individual
• Trait – Features of an individual
• Genome - Collection of all chromosomes for an individual

- GA’s are based on following Analogy


1. Individuals in a population compete for resources and mates.
2. The fittest individuals then mate to create more offspring than others.
3. Genes from the “fittest” parents propagate through the generations; that is, sometimes parents create
offspring that are better than either parent.
4. Thus each successive generation is more suited to its environment.

Genetic Algorithm

1) Randomly initialize a population p
2) Determine the fitness score of the population
3) Until convergence, repeat:
a) Select parents from the population
b) Crossover and generate a new population
c) Perform mutation on the new population
d) Calculate fitness for the new population
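- The loop above can be sketched end to end for a toy problem: maximizing the number of 1-bits in a chromosome (the population size, rates and seed are arbitrary choices for illustration):

```python
import random

def genetic_algorithm(pop_size=20, genes=10, generations=40, seed=3):
    """Toy GA: fitness is the sum of the genes, so the all-ones
    chromosome is the fittest possible individual."""
    rng = random.Random(seed)
    fitness = sum
    # 1) randomly initialize a population of bit-string chromosomes
    pop = [[rng.randint(0, 1) for _ in range(genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # a) selection: keep the fitter half of the population as parents
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        # b) crossover: single random cut point per pair of parents
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genes)
            children.append(a[:cut] + b[cut:])
        # c) mutation: occasionally flip one gene to keep diversity
        for child in children:
            if rng.random() < 0.2:
                i = rng.randrange(genes)
                child[i] = 1 - child[i]
        pop = children  # offspring replace the parent population
    return max(pop, key=fitness)

best = genetic_algorithm()
print(best, sum(best))
```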
1. Initialization
- Randomly generate a population with multiple chromosomes.
- A gene is the smallest unit and can be thought of as a single characteristic.
- A chromosome is composed of various genes.
- A population consists of a number of chromosomes.

2. Defining the Fitness Function


- The fitness function determines how fit an individual is to be selected for reproduction, based on its fitness
score.
- A fitness score is given to each individual, which shows the ability of an individual to “compete”.
- For example, the fitness function can be the sum of all the genes, so the chromosome with the maximum sum is the fittest.

3. Selection
- The two fittest chromosomes are selected for creating the next generation, and the other chromosomes are dropped.
- These pairs of chromosomes act as parents to generate offspring for the next generation.
- Some methods for parent Selection are:
o Tournament selection, roulette wheel selection, proportionate selection, rank selection, steady state selection, etc.

4. Crossover
- This represents mating between individuals. It is equivalent to two parents having a child.
- Two Chromosomes are selected using selection operator and crossover sites are chosen randomly.
- Then the genes at these crossover sites are exchanged thus creating a completely new individual (offspring).

Single Point crossover

Two Point crossover

Uniform crossover
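- Single point crossover can be sketched as follows (the parent chromosomes are illustrative values):

```python
import random

def single_point_crossover(parent_a, parent_b, rng=None):
    """Swap the tails of two chromosomes at one random crossover site,
    producing two offspring."""
    rng = rng or random.Random(0)
    cut = rng.randrange(1, len(parent_a))  # site strictly inside the string
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

c1, c2 = single_point_crossover([1, 1, 1, 1, 1], [0, 0, 0, 0, 0])
print(c1, c2)
```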

5. Mutation
- Mutation is applied to each child individually after the crossover.
- To avoid duplicity (crossover generates offspring that are similar to the parents) and to enhance the diversity of the
offspring, one can perform mutation.
- Mutation randomly alters some features in the offspring.
Swap Mutation

Scramble Mutation

Inversion Mutation
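- Swap mutation, for example, can be sketched as follows (the chromosome is an illustrative value):

```python
import random

def swap_mutation(chromosome, rng=None):
    """Pick two gene positions at random and exchange their values,
    leaving the rest of the chromosome unchanged."""
    rng = rng or random.Random(1)
    mutant = list(chromosome)
    i, j = rng.sample(range(len(mutant)), 2)  # two distinct positions
    mutant[i], mutant[j] = mutant[j], mutant[i]
    return mutant

print(swap_mutation([1, 2, 3, 4, 5]))
```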

The offspring population created by selection, crossover (recombination), and mutation replaces the original parent
population.

6. Termination parameters for GA


- The GA algorithm terminates when
1. either the maximum number of generations has been produced,
2. or a satisfactory fitness level has been reached for the population,
3. or variation in the individuals from one generation to the next reaches a pre-specified level of stability,
4. or the objective function value has reached a certain predefined value,
5. or any combination of the above.
• If the algorithm has terminated due to reaching the maximum number of generations, a satisfactory solution may
or may not have been reached.

Significance of Genetic Operators


- There are three main types of operators used in genetic algorithms which must work in conjunction with one
another in order for the algorithm to be successful.
- Genetic operators are used to maintain genetic diversity (mutation operator), combine existing solutions into
new solutions (crossover) and select between solutions (selection).

1. Selection
- The idea is to give preference to the individuals with good fitness scores and allow them to pass their genes to
successive generations.

2. Crossover
- Crossover ensures that offspring possess characteristics similar to both the parents.
- If no crossover is performed then the offspring would be exact copies of the parents with no improvements or
variations.
- Thus, crossover is an attempt to create better or fitter chromosomes from the existing good ones.

3. Mutation
- Mutation facilitates a sudden change in a gene within a chromosome, generating a solution that is far away
from or dissimilar to those in the current pool.
- If the mutant has better fitness then it will be taken up for the next offspring generation.
- If the mutant has a lower fitness value, it will gradually fade out, as the selection operator will ensure it is not
used for offspring generation.
