
ARTIFICIAL INTELLIGENCE

UNIT -2

LOGIC CONCEPTS, LOGIC PROGRAMMING AND KNOWLEDGE REPRESENTATION

Logic Concepts and Logic Programming: Introduction - Propositional Calculus - Propositional
Logic - Natural Deduction System - Axiomatic System - Tableau System in Propositional Logic -
Resolution in Propositional Logic - Predicate Logic - Logic Programming.

Knowledge Representation: Introduction - Approaches to Knowledge Representation -
Knowledge Representation using Semantic Network - Extended Semantic Networks for KR -
Knowledge Representation using Frames.

Knowledge Representation:

o Knowledge representation and reasoning (KR, KRR) is the part of Artificial Intelligence
which is concerned with how AI agents think and how thinking contributes to the intelligent
behavior of agents.
o It is responsible for representing information about the real world so that a computer can
understand it and can utilize this knowledge to solve complex real-world problems such
as diagnosing a medical condition or communicating with humans in natural language.

Logical Agents:

A logical agent has some representation of complex knowledge about the world (or) its
environment and uses inference to derive new information from that knowledge combined with
new inputs.

Knowledge Base:

Set of sentences in a formal language representing facts about the world.

Knowledge based agent:

 Intelligent agents need knowledge about the world to choose good actions (or)
decisions.
 Knowledge= {sentences} in a knowledge representation language (formal
language).
 A knowledge based agent is composed of
i) Knowledge base: domain specific content
ii) Inference mechanism: domain independent algorithm.
 The agent must be able to
- Represent states, actions, etc.
- Incorporate new percepts.
- Update internal representation of the world.
- Deduce hidden properties of the world.
- Deduce appropriate actions.

A knowledge-based agent can be pictured as two layers:

Inference engine - domain-independent algorithm
Knowledge base - domain-specific content

Approaches to designing a knowledge-based agent:

There are mainly two approaches to build a knowledge-based agent:


1. Declarative approach: We can create a knowledge-based agent by initializing it with an
empty knowledge base and telling the agent all the sentences with which we want to start.
This approach is called the declarative approach.
2. Procedural approach: In the procedural approach, we directly encode the desired behavior
as program code; that is, we write a program that already encodes the desired behavior of
the agent.

Techniques of Knowledge Representation:

There are mainly four ways of knowledge representation,

1. Logical Representation
2. Semantic Network Representation
3. Frame Representation
4. Production Rules

a) Logical Representation: Logical representation is a language with some concrete
rules which deal with propositions and has no ambiguity in representation.
It consists of precisely defined syntax and semantics which support sound
inference. Each sentence can be translated into logic using the syntax and
semantics.

Syntax: defines the well-formed sentences in the logic.

Semantics: defines the truth (or) meaning of a sentence in a world.

Logical representation can be categorized into mainly two logics:

a. Propositional Logics
b. Predicate logics
Advantages of logical representation:

1. Logical representation enables us to do logical reasoning.

2. Logical representation is the basis for programming languages.

Disadvantages of logical Representation:

1. Logical representations have some restrictions and are challenging to work with.
2. Logical representation technique may not be very natural, and inference may not be so
efficient.

1. Propositional Logic:
Propositional logic (PL) is the simplest form of logic, where all statements are
made up of propositions. A proposition is a declarative statement which is either true
or false. It is a technique of representing knowledge in logical and mathematical
form.

Syntax of propositional logic:

The syntax of propositional logic defines the allowable sentences for the knowledge
representation. There are two types of Propositions:

a. Atomic Propositions
b. Compound propositions

o Atomic Proposition: Atomic propositions are simple propositions. Each consists of a
single proposition symbol. These are the sentences which must be either true or false.

Example:
1. "2 + 2 is 4" is an atomic proposition, as it is a true fact.
2. "The Sun is cold" is also a proposition, as it is a false fact.
o Compound proposition: Compound propositions are constructed by combining simpler
or atomic propositions, using parenthesis and logical connectives.

Example:

a) "It is raining today, and street is wet."


b) "Ankit is a doctor, and his clinic is in Mumbai."

Logical Connectives:

Logical connectives are used to connect two simpler propositions or to represent a sentence
logically. We can create compound propositions with the help of logical connectives. There are
mainly five connectives, which are given as follows:

1. Negation: ¬P
2. Conjunction: P ∧ Q
3. Disjunction: P ∨ Q
4. Implication: P → Q
5. Biconditional: P ↔ Q

Truth Table:

In propositional logic, we need to know the truth values of propositions in all possible scenarios.
We can combine all the possible combinations of truth values with logical connectives, and the
representation of these combinations in a tabular format is called a truth table. The truth tables
for all the logical connectives can be generated as shown in the sketch below.
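Below is a minimal Python sketch (added for illustration, not part of the original notes) that enumerates the possible truth assignments of two propositions P and Q and prints the truth table for the five connectives listed above.

# Truth table for the five propositional connectives.
from itertools import product

def implies(p, q):
    # P -> Q is false only when P is true and Q is false
    return (not p) or q

print("P      Q      ~P     P^Q    PvQ    P->Q   P<->Q")
for p, q in product([True, False], repeat=2):
    row = [p, q, not p, p and q, p or q, implies(p, q), p == q]
    print("  ".join(str(v).ljust(5) for v in row))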
Limitations of Propositional logic:

o We cannot represent relations like ALL, some, or none with propositional logic.
Example:
a. All the girls are intelligent.
b. Some apples are sweet.
o Propositional logic has limited expressive power.
o In propositional logic, we cannot describe statements in terms of their properties or
logical relationships.

2.First Order Logic: (FOL)


- FOL is another way of knowledge representation in AI. It is an extension of
propositional logic.
- FOL is also known as predicate logic. It is a powerful language that represents information
about objects in a more natural way and can also express the relationships between those
objects.
- FOL does not only assume that the world contains facts (as propositional logic does) but also
assumes that the world contains:

o Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus,
etc.
o Relations: unary relations such as red, round, is adjacent; or n-ary relations
such as the sister of, brother of, has color, comes between.
o Functions: father of, best friend, third inning of, end of, etc.

- Like a natural language, first-order logic also has two main parts:

a. Syntax
b. Semantics
Syntax of First-Order logic:

The syntax of FOL determines which collection of symbols is a logical expression in first-order
logic. The basic syntactic elements of first-order logic are symbols.

Atomic sentences:

o Atomic sentences are the most basic sentences of first-order logic. These sentences are
formed from a predicate symbol followed by a parenthesis with a sequence of terms.
o We can represent atomic sentences as Predicate (term1, term2, ......, term n).

Example: Kumar and Sai are brothers: => Brothers(Kumar, Sai).


Jerry is a cat: => cat (Jerry).

Complex Sentences:

o Complex sentences are made by combining atomic sentences using connectives.

First-order logic statements can be divided into two parts:

o Subject: Subject is the main part of the statement.


o Predicate: A predicate can be defined as a relation, which binds two atoms together in a
statement.

Consider the statement:

"x is an integer.",

It consists of two parts,

- The first part x is the subject of the statement and


- Second part "is an integer," is known as a predicate.

Quantifiers in First-order logic:


o A quantifier is a language element which generates quantification, and quantification
specifies the quantity of specimens in the universe of discourse.
o These are the symbols that permit us to determine or identify the range and scope of the
variable in a logical expression. There are two types of quantifier:
a. Universal quantifier (for all, everyone, everything)
b. Existential quantifier (for some, at least one)

Universal Quantifier:

Universal quantifier is a symbol of logical representation, which specifies that the statement
within its range is true for everything or every instance of a particular thing.

The Universal quantifier is represented by a symbol ∀, which resembles an inverted A.

Example:

All men drink coffee.

Let x be a variable.

x1 drinks coffee ∧ x2 drinks coffee ∧ ... ∧ xn drinks coffee.

∀x man(x) → drink(x, coffee).

It will be read as: For all x, where x is a man, x drinks coffee.

Existential Quantifier:

Existential quantifiers are the type of quantifiers, which express that the statement within its
scope is true for at least one instance of something.

It is denoted by the logical operator ∃, which resembles an inverted E. When it is used with a
predicate variable then it is called an existential quantifier.

Example:

Some boys are intelligent.

∃x: boys(x) ∧ intelligent(x)

It will be read as: There are some x where x is a boy who is intelligent.

Some Examples of FOL using quantifier:

1. All birds fly.

predicate is "fly(bird)."

∀x bird(x) →fly(x).

2.Every man respects his parent.

predicate is "respect(x, y)," where x=man, and y= parent.

∀x man(x) → respects (x, parent).

3. Not all students like both Mathematics and Science.


predicate is "like(x, y)," where x= student, and y= subject.

¬∀ (x) [ student(x) → like(x, Mathematics) ∧ like(x, Science)].
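The following minimal Python sketch (illustrative only; the domain, names, and predicates are hypothetical) shows how universal and existential quantifiers behave when evaluated over a small finite domain, using all() for ∀ and any() for ∃.

# Evaluating FOL-style quantifiers over a finite domain.
people = ["Ravi", "Kumar", "Sai"]
is_man = {"Ravi": True, "Kumar": True, "Sai": True}
drinks_coffee = {"Ravi": True, "Kumar": False, "Sai": True}

# "All men drink coffee":  forall x, man(x) -> drink(x, coffee)
all_men_drink = all((not is_man[x]) or drinks_coffee[x] for x in people)

# "Some boys are intelligent":  exists x, boy(x) ^ intelligent(x)
is_boy = {"Ravi": True, "Kumar": True, "Sai": False}
intelligent = {"Ravi": False, "Kumar": True, "Sai": True}
some_intelligent = any(is_boy[x] and intelligent[x] for x in people)

print(all_men_drink)      # False, since Kumar does not drink coffee
print(some_intelligent)   # True, since Kumar is an intelligent boy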

b) Semantic Network Representation


 Semantic networks are an alternative to predicate logic for knowledge representation. In
semantic networks, we represent our knowledge in the form of graphical networks.
 The network consists of nodes representing objects and arcs which describe the
relationships between those objects.
 Semantic networks can categorize objects in different forms and can also link those
objects. Semantic networks are easy to understand and can be easily extended.

This representation consists of mainly two types of relations:

- IS-A relation (Inheritance)


- Kind-of-relation

Example: Following are some statements which we need to represent in the form of nodes and
arcs (a simple code encoding of these statements is sketched after the list).

Statements:

i. Jerry is a cat.
ii. Jerry is a mammal.
iii. Jerry is owned by Priya.
iv. Jerry is brown colored.
v. All mammals are animals.
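One simple way to hold these statements in a program (a sketch added for illustration; the relation names are informal) is as a list of (node, arc, node) triples, with IS-A arcs followed transitively so that, for example, "Jerry is an animal" can be inferred.

# Semantic network stored as (subject, relation, object) triples.
triples = [
    ("Jerry", "is-a", "Cat"),
    ("Jerry", "is-a", "Mammal"),
    ("Jerry", "owned-by", "Priya"),
    ("Jerry", "has-color", "Brown"),
    ("Mammal", "is-a", "Animal"),
]

def is_a(node, target):
    # Follow IS-A arcs upward from node to see whether target is reachable.
    if node == target:
        return True
    parents = [o for (s, r, o) in triples if s == node and r == "is-a"]
    return any(is_a(p, target) for p in parents)

print(is_a("Jerry", "Animal"))   # True, inherited through Mammal -> Animal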
Drawbacks in Semantic representation:

1. Semantic networks take more computational time at runtime as we need to traverse the
complete network tree to answer some questions.
2. These types of representations are inadequate as they do not have any equivalent
quantifier, e.g., for all, for some, none, etc.
3. Semantic networks do not have any standard definition for the link names.

Advantages of Semantic network:

1. Semantic networks are a natural representation of knowledge.


2. Semantic networks convey meaning in a transparent manner.
3. These networks are simple and easily understandable.

c) Frame Representation:
- A frame is a record like structure which consists of a collection of attributes and its
values to describe an entity in the world.
- Frames are an AI data structure which divides knowledge into substructures by
representing stereotyped situations.
- A frame consists of a collection of slots and slot values. These slots may be of any type and
size. Slots have names and values, which are called facets.

Example:

Let's take an example of a frame for a book; a simple sketch of such a frame is given below.
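The original notes show the book frame as a figure; the sketch below is an assumed reconstruction in Python, with the slot names and values chosen only for illustration.

# A frame as a record-like structure of slots and slot values.
book_frame = {
    "frame_name": "Book",
    "slots": {
        "title":   "Artificial Intelligence",
        "genre":   "Computer Science",
        "author":  "Peter Norvig",
        "edition": "Third",
        "year":    1996,
        "pages":   1152,
    },
}

print(book_frame["slots"]["title"])   # read a slot value
book_frame["slots"]["price"] = 500    # adding a new slot is easy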

Advantages of frame representation:

1. The frame knowledge representation makes the programming easier by grouping the
related data.
2. It is very easy to add slots for new attribute and relations.
3. It is easy to include default data and to search for missing values.
4. Frame representation is easy to understand and visualize.

Disadvantages of frame representation:

1. In a frame system, the inference mechanism cannot be easily processed.

2. The inference mechanism does not proceed smoothly with frame representation.

d) Production rules:
A production rules system consists of (condition, action) pairs, which mean "if condition then
action". It has mainly three parts:

o The set of production rules


o Working Memory
o The recognize-act-cycle

In a production rules system, the agent checks for the condition, and if the condition holds, the
production rule fires and the corresponding action is carried out. The condition part of a rule
determines which rule may be applied to a problem.

The working memory contains the description of the current state of problem solving, and a rule
can write knowledge to the working memory. This knowledge may match and fire other rules (a
minimal sketch of this recognize-act cycle is given after the example below).

Example:

o IF (at bus stop AND bus arrives) THEN action (get into the bus)
o IF (on the bus AND paid AND empty seat) THEN action (sit down).
o IF (on bus AND unpaid) THEN action (pay charges).
o IF (bus arrives at destination) THEN action (get down from the bus).
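A minimal sketch of the recognize-act cycle for rules like the ones above (the starting working memory is hypothetical; a real system would also retract facts such as "unpaid" after paying):

# Each rule is (set of conditions, fact added when the rule fires).
rules = [
    ({"at bus stop", "bus arrives"}, "got into the bus"),
    ({"on the bus", "paid", "empty seat"}, "sat down"),
    ({"on the bus", "unpaid"}, "paid"),
    ({"bus arrives at destination"}, "got down from the bus"),
]

working_memory = {"on the bus", "unpaid", "empty seat"}

fired = True
while fired:                         # recognize-act cycle
    fired = False
    for conditions, action in rules:
        if conditions <= working_memory and action not in working_memory:
            working_memory.add(action)   # the rule fires
            fired = True

print(working_memory)
# {'on the bus', 'unpaid', 'empty seat', 'paid', 'sat down'}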

Advantages of Production rule:

1. The production rules are expressed in natural language.


2. The production rules are highly modular, so we can easily remove, add or modify an
individual rule.

Disadvantages of Production rule:

1. A production rule system does not exhibit any learning capabilities, as it does not store the
result of the problem for future use.
2. During the execution of the program, many rules may be active; hence rule-based production
systems can be inefficient.
UNIT - 3
An expert system is a computer program that is designed to solve complex
problems and to provide decision-making ability like a human expert. It
performs this by extracting knowledge from its knowledge base using the
reasoning and inference rules according to the user queries.

The expert system is a part of AI, and the first ES was developed around 1970; expert systems
were among the first successful applications of artificial intelligence. An ES solves
the most complex issues as an expert would by extracting the knowledge stored in its
knowledge base. The system helps in decision making for complex problems
using both facts and heuristics like a human expert. It is called an expert system because it
contains the expert knowledge of a specific domain and can solve any complex
problem of that particular domain. These systems are designed for a specific
domain, such as medicine, science, etc.

The performance of an expert system is based on the expert's knowledge


stored in its knowledge base. The more knowledge stored in the KB, the more
that system improves its performance. One of the common examples of an ES
is a suggestion of spelling errors while typing in the Google search box.

Below is the block diagram that represents the working of an expert system:

Note: It is important to remember that an expert system is not used to replace


the human experts; instead, it is used to assist the human in making a complex
decision. These systems do not have human capabilities of thinking and work
on the basis of the knowledge base of the particular domain.
Below are some popular examples of the Expert System:

o DENDRAL: It was an artificial intelligence project that was made as a


chemical analysis expert system. It was used in organic chemistry to
detect unknown organic molecules with the help of their mass spectra
and knowledge base of chemistry.
o MYCIN: It was one of the earliest backward chaining expert systems that
was designed to find the bacteria causing infections like bacteraemia
and meningitis. It was also used for the recommendation of antibiotics
and the diagnosis of blood clotting diseases.
o PXDES: It is an expert system that is used to determine the type and
level of lung cancer. To determine the disease, it takes a picture from
the upper body, which looks like the shadow. This shadow identifies the
type and degree of harm.
o CaDeT: The CaDet expert system is a diagnostic support system that can
detect cancer at early stages.

Characteristics of Expert System

o High Performance: The expert system provides high performance for


solving any type of complex problem of a specific domain with high
efficiency and accuracy.
o Understandable: It responds in a way that can be easily understood
by the user. It can take input in human language and provides the output
in the same way.
o Reliable: It is highly reliable for generating efficient and accurate
output.
o Highly responsive: ES provides the result for any complex query within a
very short period of time.

Components of Expert System

An expert system mainly consists of three components:

o User Interface
o Inference Engine
o Knowledge Base
1. User Interface

With the help of a user interface, the expert system interacts with the user,
takes queries as an input in a readable format, and passes it to the inference
engine. After getting the response from the inference engine, it displays the
output to the user. In other words, it is an interface that helps a non-expert
user to communicate with the expert system to find a solution.

2. Inference Engine(Rules of Engine)

o The inference engine is known as the brain of the expert system as it is


the main processing unit of the system. It applies inference rules to the
knowledge base to derive a conclusion or deduce new information. It
helps in deriving an error-free solution of queries asked by the user.
o With the help of an inference engine, the system extracts the knowledge
from the knowledge base.
o There are two types of inference engine:
o Deterministic inference engine: The conclusions drawn from this type of
inference engine are assumed to be true. It is based on facts and rules.
o Probabilistic inference engine: This type of inference engine allows
uncertainty in conclusions and is based on probability.
The inference engine uses the below modes to derive solutions (a small sketch of the goal-driven
mode follows the list):

o Forward chaining: It starts from the known facts and rules, and applies
the inference rules to add their conclusions to the known facts.
o Backward chaining: It is a backward reasoning method that starts from
the goal and works backward through the rules to find supporting facts.
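A minimal sketch of the goal-driven (backward chaining) mode, with hypothetical rules and facts; the goal is proved by recursively proving the premises of the rule that concludes it.

# Rules map a conclusion to the list of premises that support it.
rules = {
    "infection": ["fever", "high_wbc"],
    "fever": ["high_temperature"],
}
facts = {"high_temperature", "high_wbc"}

def backward_chain(goal):
    # A goal holds if it is a known fact, or if all premises of its rule hold.
    if goal in facts:
        return True
    premises = rules.get(goal)
    if premises is None:
        return False
    return all(backward_chain(p) for p in premises)

print(backward_chain("infection"))   # True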

3. Knowledge Base

o The knowledge base is a type of storage that stores knowledge acquired
from different experts of a particular domain. It is considered as
a big store of knowledge. The larger and better the knowledge base, the more
precise the expert system will be.
o It is similar to a database that contains information and rules of a
particular domain or subject.
o One can also view the knowledge base as collections of objects and their
attributes. Such as a Lion is an object and its attributes are it is a
mammal, it is not a domestic animal, etc.

Components of Knowledge Base

o Factual Knowledge: The knowledge which is based on facts and


accepted by knowledge engineers comes under factual knowledge.
o Heuristic Knowledge: This knowledge is based on practice, the ability to
guess, evaluation, and experiences.

Knowledge Representation: It is used to formalize the knowledge stored in the


knowledge base using the If-else rules.

Knowledge Acquisition: It is the process of extracting, organizing, and
structuring the domain knowledge, specifying the rules to acquire the
knowledge from various experts, and storing that knowledge in the knowledge
base.

Development of Expert System

Here, we will explain the working of an expert system by taking the example of
the MYCIN ES. Below are some steps to build MYCIN:
o Firstly, ES should be fed with expert knowledge. In the case of MYCIN,
human experts specialized in the medical field of bacterial infection,
provide information about the causes, symptoms, and other knowledge
in that domain.
o The KB of the MYCIN is updated successfully. In order to test it, the
doctor provides a new problem to it. The problem is to identify the
presence of the bacteria by inputting the details of a patient, including
the symptoms, current condition, and medical history.
o The ES will need a questionnaire to be filled by the patient to know the
general information about the patient, such as gender, age, etc.
o Now the system has collected all the information, so it will find the
solution for the problem by applying if-then rules using the inference
engine and using the facts stored within the KB.
o In the end, it will provide a response to the patient by using the user
interface.

Participants in the development of Expert System

There are three primary participants in the building of Expert System:

1. Expert: The success of an ES depends much on the knowledge provided
by human experts. These experts are persons who are specialized
in the specific domain.
2. Knowledge Engineer: The knowledge engineer is the person who gathers the
knowledge from the domain experts and then codifies that knowledge
into the system according to the formalism.
3. End-User: A particular person or a group of people, who may not
be experts, who work with the expert system and need solutions or
advice for their complex queries.
Why Expert System?

Before using any technology, we must have an idea about why to use that
technology, and the same holds for the ES. Although we have human experts in
every field, why do we need to develop a computer-based system? Below are the
points that describe the need for the ES:

1. No memory Limitations: It can store as much data as required and can


memorize it at the time of its application. But for human experts, there
are some limitations to memorize all things at every time.
2. High Efficiency: If the knowledge base is updated with the correct
knowledge, then it provides a highly efficient output, which may not be
possible for a human.
3. Expertise in a domain: There are lots of human experts in each domain,
and they all have different skills and different experiences, so it is not
easy to get a final output for a query. But if we put the knowledge gained
from human experts into the expert system, then it provides an efficient
output by combining all the facts and knowledge.
4. Not affected by emotions: These systems are not affected by human
emotions such as fatigue, anger, depression, anxiety, etc.. Hence the
performance remains constant.
5. High security: These systems provide high security to resolve any query.
6. Considers all the facts: To respond to any query, it checks and considers
all the available facts and provides the result accordingly. But it is
possible that a human expert may not consider some facts due to any
reason.
7. Regular updates improve the performance: If there is an issue in the
result provided by the expert systems, we can improve the performance
of the system by updating the knowledge base.

Capabilities of the Expert System

Below are some capabilities of an Expert System:

o Advising: It is capable of advising the human being for the query of any
domain from the particular ES.
o Provide decision-making capabilities: It provides the capability of
decision making in any domain, such as for making any financial
decision, decisions in medical science, etc.
o Demonstrate a device: It is capable of demonstrating any new products
such as its features, specifications, how to use that product, etc.
o Problem-solving: It has problem-solving capabilities.
o Explaining a problem: It is also capable of providing a detailed
description of an input problem.
o Interpreting the input: It is capable of interpreting the input given by the
user.
o Predicting results: It can be used for the prediction of a result.
o Diagnosis: An ES designed for the medical field is capable of diagnosing a
disease without using multiple components as it already contains various
inbuilt medical tools.

Advantages of Expert System

o These systems are highly reproducible.


o They can be used for risky places where the human presence is not safe.
o Error possibilities are less if the KB contains correct knowledge.
o The performance of these systems remains steady as it is not affected by
emotions, tension, or fatigue.
o They provide a very high speed to respond to a particular query.

Limitations of Expert System

o The response of the expert system may be wrong if the knowledge base
contains wrong information.
o Like a human being, it cannot produce creative output for different
scenarios.
o Its maintenance and development costs are very high.
o Knowledge acquisition for designing is quite difficult.
o For each domain, we require a specific ES, which is one of the big
limitations.
o It cannot learn by itself and hence requires manual updates.

Applications of Expert System

o In the designing and manufacturing domain
It can be broadly used for designing and manufacturing physical devices
such as camera lenses and automobiles.
o In the knowledge domain
These systems are primarily used for publishing relevant knowledge
to the users. Two popular ES used in this domain are an advisor and a
tax advisor.
o In the finance domain
In the finance industries, it is used to detect any type of possible fraud
or suspicious activity, and to advise bankers on whether they should provide
loans for a business or not.
o In the diagnosis and troubleshooting of devices
In medical diagnosis, the ES system is used, and it was the first area
where these systems were used.
o Planning and Scheduling
The expert systems can also be used for planning and scheduling some
particular tasks for achieving the goal of that task.

Traditional System Vs Expert System

The traditional system solves common numerical problems, while the
expert system solves problems in a relatively limited area.
Traditional system :

 The traditional system solves common numerical problems.

 It is a sequential program that combines information and processing.

 A well-tested program will never make a mistake.

 There is no explanation given for the output.

Expert system :

 It solves problems in a relatively limited area.

 The knowledge base may or may not be separated from the processing.

 It is possible that the well-tested expert system will make blunders.

 In most cases, an explanation is given.

 A key distinction between the traditional system as opposed to the


expert system is the way in which the problem related expertise is
coded. Essentially, in conventional applications, the problem expertise is
encoded in both program as well as data structures. On the other hand,
in expert systems, the approach of the problem related expertise is
encoded in data structures only. Moreover, the use of knowledge in
expert systems is vital. However, traditional systems use data more
efficiently than the expert system.
 One of the biggest limitations of conventional systems is that they are
not capable of providing explanations for the conclusion of a problem.
That is because these systems try to solve problems in a straightforward
manner. However, expert systems are capable of not only providing
explanations but also simplifying the understanding of a particular
conclusion.
 Generally, an expert system uses symbolic representations to perform
computations. On the contrary, conventional systems are incapable of
expressing these terms. They only simplify the problems without being
able to answer the “how” and “why” questions. Moreover, the problem-
solving tools are present in expert systems as opposed to the traditional
ones, and hence, various types of problems are most often entirely
solved by the experts of the system.

 Human Experts Vs Expert Systems

 Human experts are perishable and unpredictable in nature; expert systems are permanent and
consistent in nature.
 Human expertise is difficult to transfer and document; expert system data is easy to transfer
and document.
 Human expert resources are expensive; expert systems are cost effective.

Steps to Develop an Expert System:


Step 1: Identification: Determining the characteristics of the problem.
Step 2: Conceptualization: Finding the concepts to produce the solution.
Step 3: Formalization: Designing structures to organize the knowledge.
Step 4: Implementation: Formulating rules which embody the knowledge.
Step 5: Testing: Validating the rules.
In practice, it may not be possible to break down the expert system
development cycle precisely. However, an examination of these five stages
may serve to provide us with some insight into the ways in which expert
systems are developed.

Truth Maintenance System:


A TMS, or Truth Maintenance System, permits a form of non-monotonic reasoning by allowing
statements in a knowledge base to be added and later changed or withdrawn. It is also called a
"Belief Revision" or "Revision Management" system.
Truth maintenance systems (TMS) are also called reason maintenance systems.
They are used as a means to solve problems in the domain of Artificial
Intelligence when using rule-based inference systems. A TMS is used to build
and manage the dependency network that an inference engine uses to solve
problems. There are many goals that a TMS needs to satisfy. It has to have the
ability to justify the conclusions it reaches. It has to recognize the
inconsistencies in a result and determine the cause. It has to remember all of
the derivations that it has previously computed, and more. Premises,
contradictions, and assumptions form the fundamental entities in the working of
a truth maintenance system.
Role of Truth Maintenance System:
1. The main job of the TMS is to maintain the "consistency of knowledge"
being used by the problem solver, not to perform any
inference functions.
2. The TMS also gives the inference component the latitude to
perform non-monotonic inferences.
3. When discoveries are made, this more recent information can
displace previous conclusions that are no longer valid.
4. Actually, the TMS doesn't discard conclusions like Q as suggested.
That could be wasteful, since P may again become valid, which
would require that Q and the facts justified by Q be rederived. Instead,
the TMS maintains dependency records for all such conclusions.
These records determine which set of beliefs is current. Thus, Q is
removed from the current belief set by making appropriate
updates to the records and not by erasing Q. Since Q is not
lost, its rederivation is not necessary if P becomes valid
once again.
5. The TMS maintains complete records of the reasons or justifications for
beliefs. Each proposition or statement having at least one valid
justification is made part of the current belief set. Statements
lacking acceptable justifications are excluded from this set. When a
contradiction is discovered, the statements responsible for the
contradiction are identified and an appropriate one is retracted. This in
turn may result in other retractions and additions. The procedure
used to perform this process is called "Dependency-Directed
Backtracking".
6. The TMS maintains records to reflect retractions and additions so that
the inference engine will always know its current belief set. The records
are maintained in the form of a "dependency network".
i. The nodes in the network represent KB entries such as premises,
conclusions, inference rules and the like.
ii. Attached to the nodes are justifications that represent the
inference steps from which the node was derived.

Expert System Shell Structure

The Expert System Shell refers to a software module containing:

1. A user interface (built-in)

2. An inference engine (built-in)

3. A structured skeleton of a knowledge base (in its empty state) with
suitable knowledge representation facilities

In addition to the above components, some ES shells provide facilities for
database connectivity through an interpreter, web integration, and natural
language processing (NLP) features.
What is Uncertainty in AI?
When we talk about perceiving information from the environment, the
main problem that arises is that there is always some uncertainty in our
observations. This is because the world is an enormous entity and the
surroundings that we take under study are not always well defined. So, there
needs to be an estimation made to reach any conclusion.

Human beings face this uncertainty daily, many times over. But still, they
manage to take successful decisions. This is because humans have strong
estimating and decision-making power and their brains function in such a way
that every time such a situation arises, the alternative with the maximum
positive outcome is chosen. But artificial agents are not able to take proper
decisions while working in such an environment. This is because, if the
information available to them is not accurate, then they cannot choose the
right decision from their knowledge base.
Uncertainty Example
Taking a real-life example: while buying vegetables, humans can easily
distinguish between the different kinds of vegetables by their color, size,
texture, etc. But there is uncertainty in making the right choices here,
because the vegetables may not be exactly the same as described. Some of
them may be distorted, some may vary from the usual size, and there can be
many such variations. But in spite of all of this, humans do not face any
problem in situations like these.

This same thing, however, becomes a hurdle when the decision is to be made by a
computer-based agent. We can feed the agent information about
how the vegetables look, but we still cannot accurately define the exact
shape and size of each of them, because all of them have some variations.

So, as a solution to this, only the basic information is provided to the agent.
Based on this very information, the agent has to make certain estimates to find
out which vegetable is kept in front of it.

Not only in this agent, but in designing almost every AI based agent, this
strategy is followed. So, there should be a proper method so that the agent
can make certain estimations by itself without any help or input from human
beings.

Reasons for Uncertainty in Artificial Intelligence

We will now learn about the various reasons which are responsible for
uncertainty in the decisions made either by humans or computer-based agents.

As we already know that uncertainty arises when we are not 100 percent sure
about the outcome of the decisions. This mostly happens in those cases where
the conditions are neither completely true nor completely false.

When talking about Artificial Intelligence, an agent faces uncertainty in

decision making when it tries to perceive the environment for information.
In doing so, the agent may get wrong or incomplete data, which can affect the
results drawn by the agent. This uncertainty is faced by agents due to the
following reasons:

1. Partially observable environment


The entire environment is not always within reach of the agent. There are some
parts of the environment which are out of the reach of the agent and hence
are left unobserved. So, the decisions that the agent makes do not
include information from these areas and hence the result drawn may vary
from the actual case.

2. Dynamic Environment
As we all know that the environment is dynamic, i.e. there are always some
changes that keep taking place in the environment. So, the decision or
calculations made at any instant may not be the same after some time due to
the changes that have occurred in the surroundings by that time. So, if the
observations made at any instance are considered later, then there can be an
ambiguity in the decision making.

3. Incomplete knowledge of the agent


If the agent has incomplete knowledge or insufficient knowledge about
anything, then it cannot produce correct results because the agent itself does
not know about the situation and the way in which the situation is to be
handled.

4. Inaccessible areas in the environment


There are areas in the environment which are observable but not within reach of
the agent to access. In such situations, the observation made is correct, but
as the agent cannot act on these parts of the environment, these parts will
remain unchanged by the actions of the agent. This will not affect the current
decision but can affect the estimations made by the agent in the future.
Probabilistic Reasoning in AI - A way to deal with Uncertainty

As we know that there are many cases where the answer to the problem is
neither completely true nor completely false. For example, the statement-
"Student will pass in the board exams". We cannot say anything about a
student's result before the results are declared. However, we can draw some
predictions based on the student's past performances in academics.

Probabilistic Reasoning
In these types of situations, probability theory can help us give an estimate
of how likely an event is to occur. In this theory, we find the
probabilities of all the alternatives that are possible in any experiment. The
sum of all these probabilities for an experiment is always 1, because all these
events/alternatives can happen only within this experiment.

Example
As in the above example, the statement can either be true or false, not
anything other than that. That means, the student will either pass in board
exams or will fail. So, if we are given the following probability:

P (Student will pass in board exams) = 0.80


Therefore, P (Student will fail in board exams) = 0.20

Then this means that there are 80 percent chances that the student will pass
and 20 percent chances that the student will fail. And as we can observe that,
the probability that one of these events will occur is 100 percent.

Therefore, in all those cases where there is a fixed number of outcomes


possible for any given experiment, the probabilistic theory is applicable.

Another example in this theory can be taken of picking a card from a deck
(Excluding the Joker). If the stated events in this experiment are as follows:

A: The chosen card is of Spade


B: The chosen card is of Hearts
C: The chosen card is of Clubs
D: The chosen card is of Diamond

Then the probability of each event is:


P(A) = P(B) = P(C) = P(D) = 0.25

As there are 13 cards of each suit in a deck of 52, each probability is 13/52 = 0.25. And the
probability that one of these events occurs when the experiment takes place is:

P(E) = P(A) + P(B) + P(C) + P(D) = 1

Before learning what conditional probability is, we must first learn about
dependent and independent events.

What are Independent and Dependent Events?


Independent events are those events which neither cause any effect nor are
affected by the occurrence of some other event. Whereas, dependent
events are affected by the happening of some other events which may occur
simultaneously or have occurred before it.

Example
Picking a card from a complete fresh deck of cards is an independent event
because the occurrence of any card is not affected by anything. But, suppose
after picking up a card, another card is picked up without replacing the
previous one, then this event becomes a dependent event because the
occurrence of any card is now affected by the previously drawn card.

What is Conditional Probability in AI?


Conditional probability is associated with dependent events. If the
probability of an event is affected by the occurrence of other event(s), then
it is known as conditional probability.

How to Calculate Conditional Probability?


The conditional probability is denoted by P(A|B) and read as the probability of
event A given that event B has already occurred. It can be calculated as follows:

Formula
P(A|B) = P(A ^ B) / P(B)
Similarly, P(B|A) represents the probability of event B when A has already
occurred and this can also be calculated in a similar manner:

P(B|A) = P(A ^ B) / P(A)
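A small worked sketch in Python (the numbers are hypothetical) applying the two formulas above:

# Given joint and marginal probabilities, compute the conditional ones.
p_a = 0.5          # P(A)
p_b = 0.4          # P(B)
p_a_and_b = 0.2    # P(A ^ B)

p_a_given_b = p_a_and_b / p_b   # P(A|B) = 0.2 / 0.4 = 0.5
p_b_given_a = p_a_and_b / p_a   # P(B|A) = 0.2 / 0.5 = 0.4

print(p_a_given_b, p_b_given_a)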

Talking about the intelligent systems (agents), most of the cases that the
agent confronts are of conditional probability as the events taking place in
the environment are mostly dependent on one another. So, in the probabilistic
learning method, the concept of conditional probability is used the most by
the system. For example, an agent which tells the weather forecast will
determine the weather conditions based upon the other factors such as wind
speed, current temperature, and humidity level, etc. So, the future weather
conditions are dependent upon these factors and thus it is a dependent event.
In this case, also, the agent uses the concept of conditional probability but
with other complex concepts and calculations included with it.

What is Bayes Theorem in AI?


Bayes theorem is a method to find the probability of an event whose
occurrence is dependent on some other event’s occurrence. In simple words,
using the Bayes theorem, we can find the conditional probability of any
event.

The Bayes Theorem, also known as Bayes' law or Bayes' equation, is a
mathematical equation which is given as follows:

P(A|B) = P(B|A) * P(A) / P(B)

Where A and B are events and P(B) ≠ 0.

Here,

 P(A|B): Conditional probability of occurrence of event A when


event B has already occurred.
 P(B|A): Conditional probability of occurrence of event B when
event A has already occurred.
 P(A): Probability of occurrence of event A alone without any
dependence on other events.
 P(B): Probability of occurrence of event B alone without any
dependence on other events.

Derivation of Bayes Theorem

From the definition of conditional probability,

P(A|B) = P(A ^ B) / P(B)          ...(1)

Similarly,

P(B|A) = P(A ^ B) / P(A), so P(A ^ B) = P(B|A) * P(A)

Putting the value of P(A ^ B) in equation (1), we get

P(A|B) = P(B|A) * P(A) / P(B)

Which is our required Bayes equation.

It should be noted that in the Bayesian equation, we need not find the
probability of both the events occurring simultaneously, i.e. P(A^B). We can
simply calculate the conditional probability of an event if we know the
conditional probability of the event on which it is dependent and the
individual probabilities of both the events without any dependency on each
other.
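A small sketch with hypothetical numbers showing the point above: P(A|B) is obtained from P(B|A), P(A), and P(B) alone, without ever computing P(A ^ B) directly.

p_a = 0.01          # P(A): prior probability of event A
p_b_given_a = 0.9   # P(B|A)
p_b = 0.05          # P(B)

p_a_given_b = (p_b_given_a * p_a) / p_b   # Bayes equation
print(round(p_a_given_b, 2))              # 0.18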

In its basic form, the Bayes theorem relates two events, A and B. When more than
two hypotheses are involved, it is extended by computing P(B) with the law of
total probability over all the hypotheses.

Conclusion
Nowadays, the Bayes theorem is used in many areas, and we can find its
applications in various fields. For example, in chemical engineering, the Bayes
theorem is used to predict drug concentrations in a body; it is also used to
anticipate the viability of hydrogen generated by an electrocatalyst made for
the hydrogen evolution process. It is also used to estimate how
likely a person is to develop cancer depending upon his/her age. Apart from
these examples, the Bayes rule is widely used, and this theorem has proved to
be a very efficient method to find conditional probabilities.

Bayesian Belief Network in artificial intelligence


A Bayesian belief network is a key computer technology for dealing with
probabilistic events and for solving problems which have uncertainty. We can
define a Bayesian network as:

"A Bayesian network is a probabilistic graphical model which represents a set


of variables and their conditional dependencies using a directed acyclic graph."

It is also called a Bayes network, belief network, decision network,


or Bayesian model.

Bayesian networks are probabilistic, because these networks are built from
a probability distribution, and also use probability theory for prediction and
anomaly detection.

Real world applications are probabilistic in nature, and to represent the


relationship between multiple events, we need a Bayesian network. It can also
be used in various tasks including prediction, anomaly detection, diagnostics,
automated insight, reasoning, time series prediction, and decision making
under uncertainty.
A Bayesian network can be used for building models from data and experts'
opinions, and it consists of two parts:

o Directed Acyclic Graph


o Table of conditional probabilities.

The generalized form of Bayesian network that represents and solve decision
problems under uncertain knowledge is known as an Influence diagram.

A Bayesian network graph is made up of nodes and Arcs (directed links),


where:

o Each node corresponds to the random variables, and a variable can


be continuous or discrete.
o Arcs or directed arrows represent the causal relationships or conditional
probabilities between random variables. These directed links or arrows
connect pairs of nodes in the graph.
These links represent that one node directly influences the other node;
if there is no directed link, the nodes are independent
of each other.
o In the above diagram, A, B, C, and D are random variables
represented by the nodes of the network graph.
o If we consider node B, which is connected with node A by
a directed arrow, then node A is called the parent of node B.
o Node C is independent of node A.

Note: The Bayesian network graph does not contain any cycles. Hence, it
is known as a directed acyclic graph or DAG.

The Bayesian network has mainly two components:

o Causal Component
o Actual numbers

Each node in the Bayesian network has a conditional probability
distribution P(Xi | Parents(Xi)), which determines the effect of the parents on
that node.

Bayesian network is based on Joint probability distribution and conditional


probability. So let's first understand the joint probability distribution:

Joint probability distribution:

If we have variables x1, x2, x3, ....., xn, then the probabilities of the different
combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.

P[x1, x2, x3, ....., xn] can be written in the following way in terms of the joint
probability distribution:

= P[x1 | x2, x3, ....., xn] P[x2, x3, ....., xn]

= P[x1 | x2, x3, ....., xn] P[x2 | x3, ....., xn] .... P[xn-1 | xn] P[xn].

Overview
As we all know, when analyzing a situation and drawing certain conclusions
about it in the real world, we cannot be 100 percent sure about our
conclusions. There is some uncertainty in them for sure. We as human beings have
the capability of deciding whether a statement is true or false according to
how certain we are about our observations. But machines do not have
this analyzing power. So, there needs to be some method to quantize this
estimate of certainty or uncertainty in any decision made. To implement this
method, the certainty factor was introduced for systems which work on
Artificial Intelligence.

Certainty Factor in AI
The Certainty Factor (CF) is a numeric value which tells us about how likely
an event or a statement is supposed to be true. It is somewhat similar to what
we define in probability, but the difference in it is that an agent after finding
the probability of any event to occur cannot decide what to do. Based on the
probability and other knowledge that the agent has, this certainty factor is
decided through which the agent can decide whether to declare the statement
true or false.

The value of the certainty factor lies between -1.0 and +1.0, where the
value -1.0 suggests that the statement can never be true in any
situation, and the value +1.0 means that the statement can never be
false. The value of the certainty factor after analyzing any situation will be
a positive or negative value lying within this range. The value 0
suggests that the agent has no information about the event or the situation.

A minimum certainty factor is decided for every case, through which the
agent decides whether the statement is true or false. This minimum certainty
factor is also known as the threshold value. For example, if the
minimum certainty factor (threshold value) is 0.4, then if the value of CF is
less than this value, the agent claims that the particular statement is false.
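A minimal sketch of applying such a threshold to a certainty factor in the range [-1.0, +1.0], following the rule described above:

def decide(cf, threshold=0.4):
    # Following the text: CF below the threshold means the statement is
    # treated as false; at or above it, the statement is treated as true.
    return "true" if cf >= threshold else "false"

print(decide(0.7))    # true
print(decide(0.1))    # false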

In general, for each variable Xi in a Bayesian network, we can write the equation as:

P(Xi | Xi-1, ....., X1) = P(Xi | Parents(Xi))

Explanation of Bayesian network:

Let's understand the Bayesian network through an example by creating a


directed acyclic graph:

Example: Harry installed a new burglar alarm at his home to detect burglary.
The alarm responds reliably to a burglary but also responds to minor
earthquakes. Harry has two neighbors, David and Sophia, who have taken the
responsibility to inform Harry at work when they hear the alarm. David always
calls Harry when he hears the alarm, but sometimes he gets confused with the
phone ringing and calls then too. On the other hand, Sophia likes to
listen to loud music, so sometimes she misses hearing the alarm. Here we
would like to compute the probability of the burglary alarm.

Problem:

Calculate the probability that the alarm has sounded, but neither a
burglary nor an earthquake has occurred, and both David and Sophia have called
Harry.

Solution:
o The Bayesian network for the above problem is given below. The
network structure shows that Burglary and Earthquake are the parent
nodes of Alarm and directly affect the probability of the alarm going
off, while David's and Sophia's calls depend on the alarm probability.
o The network represents that our neighbors do not directly
perceive the burglary and do not notice the minor earthquake, and
they also do not confer before calling.
o The conditional distributions for each node are given as a conditional
probability table, or CPT.
o Each row in the CPT must sum to 1, because all the entries in a row
represent an exhaustive set of cases for the variable.
o In a CPT, a boolean variable with k boolean parents contains
2^k probability entries. Hence, if there are two parents, then the CPT will
contain 4 probability values.

List of all events occurring in this network:

o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)

We can write the events of the problem statement in the form of probability as P[D,
S, A, B, E], and we can rewrite this probability statement using the joint probability
distribution:

P[D, S, A, B, E]= P[D | S, A, B, E]. P[S, A, B, E]

=P[D | S, A, B, E]. P[S | A, B, E]. P[A, B, E]

= P [D| A]. P [ S| A, B, E]. P[ A, B, E]

= P[D | A]. P[ S | A]. P[A| B, E]. P[B, E]

= P[D | A ]. P[S | A]. P[A| B, E]. P[B |E]. P[E]


Let's take the observed probability for the Burglary and earthquake
component:

P(B= True) = 0.002, which is the probability of burglary.

P(B= False)= 0.998, which is the probability of no burglary.

P(E= True)= 0.001, which is the probability of a minor earthquake

P(E= False)= 0.999, Which is the probability that an earthquake not occurred.

We can provide the conditional probabilities as per the below tables:

Conditional probability table for Alarm A:

The Conditional probability of Alarm A depends on Burglar and earthquake:

B      E      P(A= True)   P(A= False)

True   True   0.94         0.06

True   False  0.95         0.05

False  True   0.31         0.69

False  False  0.001        0.999

Conditional probability table for David Calls:

The Conditional probability of David that he will call depends on the probability
of Alarm.

A P(D= True) P(D= False)

True 0.91 0.09

False 0.05 0.95

Conditional probability table for Sophia Calls:

The conditional probability that Sophia calls depends on its parent
node "Alarm."

A P(S= True) P(S= False)

True 0.75 0.25

False 0.02 0.98

From the formula of joint distribution, we can write the problem statement in
the form of probability distribution:

P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ∧ ¬E) * P(¬B) * P(¬E)

= 0.75 * 0.91 * 0.001 * 0.998 * 0.999

= 0.00068045.

Hence, a Bayesian network can answer any query about the domain by using
Joint distribution.
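A minimal sketch re-computing P(S, D, A, ¬B, ¬E) directly from the CPTs listed above, to confirm the hand calculation:

P_B = {True: 0.002, False: 0.998}          # P(B)
P_E = {True: 0.001, False: 0.999}          # P(E)
P_A = {                                    # P(A=True | B, E)
    (True, True): 0.94, (True, False): 0.95,
    (False, True): 0.31, (False, False): 0.001,
}
P_D = {True: 0.91, False: 0.05}            # P(D=True | A)
P_S = {True: 0.75, False: 0.02}            # P(S=True | A)

b, e, a = False, False, True               # no burglary, no earthquake, alarm on
p = P_S[a] * P_D[a] * P_A[(b, e)] * P_B[b] * P_E[e]
print(p)                                   # ~0.00068045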

The semantics of Bayesian Network:


There are two ways to understand the semantics of a Bayesian network, which
are given below:

1. To understand the network as the representation of the Joint probability


distribution.

It is helpful to understand how to construct the network.

2. To understand the network as an encoding of a collection of conditional


independence statements.

It is helpful in designing inference procedures.

Dempster-Shafer Theory was given by Arthur P. Dempster in 1967 and his student
Glenn Shafer in 1976. This theory was developed for the following reasons:
 Bayesian theory is only concerned with single pieces of evidence.
 Bayesian probability cannot describe ignorance.
DST is an evidence theory; it combines all possible outcomes of the problem. Hence
it is used to solve problems where there may be a chance that different pieces of
evidence will lead to different results.
The uncertainty in this model is handled by:
1. Considering all possible outcomes.
2. Belief: belief in some possibility is supported by bringing out
evidence for it.
3. Plausibility: plausibility makes the evidence compatible with possible outcomes.
Example: Let us consider a room where four people are present, A, B, C, and D.
Suddenly the lights go out and when the lights come back, B has been stabbed in the
back by a knife, leading to his death. No one came into the room and no one left the
room. We know that B has not committed suicide. Now we have to find out who the
murderer is.
To solve these there are the following possibilities:
 Either {A} or {C} or {D} has killed him.
 Either {A, C} or {C, D} or {A, D} have killed him.
 Or the three of them have killed him i.e; {A, C, D}
 None of them has killed him: {ø} (let's say).
There will be possible evidence by which we can find the murderer by the measure
of plausibility.
Using the above example we can say:
Set of possible conclusions (P): {p1, p2, ...., pn}
where P is the set of possible conclusions and must be exhaustive, i.e. at least one pi
must be true, and the pi must be mutually exclusive.
The power set will contain 2^n elements, where n is the number of elements in the
possible set.
For example:
If P = {a, b, c}, then the power set is given as
{ø, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, {a, b, c}} = 2^3 = 8 elements.
Mass function m(K): It is an interpretation of m({K or B}), i.e. it means there is
evidence for {K or B} which cannot be divided among more specific beliefs for K
and B.
Belief in K: The belief in an element K of the power set is the sum of the masses of
the elements which are subsets of K. This can be explained through an example.
Let's say K = {a, b, c}
Bel(K) = m(a) + m(b) + m(c) + m(a, b) + m(a, c) + m(b, c) + m(a, b, c)
Plausibility of K: It is the sum of the masses of the sets that intersect with K,
i.e. Pl(K) = m(a) + m(b) + m(c) + m(a, b) + m(b, c) + m(a, c) + m(a, b, c)
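A minimal Python sketch (the mass values are hypothetical) computing Bel and Pl over the murder-example frame {A, C, D}:

# Mass function over subsets of the frame; masses sum to 1.
m = {
    frozenset({"A"}): 0.3,
    frozenset({"C"}): 0.2,
    frozenset({"A", "D"}): 0.1,
    frozenset({"A", "C", "D"}): 0.4,   # mass on the whole frame = ignorance
}

def bel(K):
    # Sum of masses of subsets of K.
    return sum(v for s, v in m.items() if s <= K)

def pl(K):
    # Sum of masses of sets that intersect K.
    return sum(v for s, v in m.items() if s & K)

K = frozenset({"A", "D"})
print(bel(K), pl(K))   # 0.4  0.8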
Characteristics of Dempster-Shafer Theory:
 It represents ignorance explicitly, while the masses assigned to all
subsets still aggregate to 1.
 Ignorance is reduced in this theory by adding more and more evidence.
 A combination rule is used to combine various types of possibilities.
Advantages:
 As we add more information, the uncertainty interval reduces.
 DST has a much lower level of ignorance.
 Diagnostic hierarchies can be represented using this.
 A person dealing with such problems is free to reason about the evidence.
Disadvantages:
 In this, the computational effort is high, as we have to deal with 2^n sets.
UNIT - IV
Machine-Learning Paradigms: Introduction, Machine Learning Systems,
Supervised and Unsupervised Learning, Inductive Learning, Learning Decision
Trees, Deductive Learning, Clustering, Support Vector Machines.
Artificial Neural Networks: Introduction, Artificial Neural Networks, Single-
Layer Feed-Forward Networks, Multi-Layer Feed-Forward Networks, Radial-
Basis Function Networks, Design Issues of Artificial Neural Networks, Recurrent
Networks.

Introduction to Machine Learning:


Machine learning is a growing technology which enables computers to learn
automatically from past data. Machine learning uses various algorithms
for building mathematical models and making predictions using historical
data or information. Currently, it is being used for various tasks such as image
recognition, speech recognition, email filtering, Facebook auto-
tagging, recommender system, and many more.
Machine learning is a subset of artificial intelligence that is mainly
concerned with the development of algorithms which allow a computer to
learn from data and past experiences on its own. The term machine
learning was first introduced by Arthur Samuel in 1959.
Machine learning enables a machine to automatically learn from data,
improve performance from experiences, and predict things without being
explicitly programmed.

Features of Machine Learning:


o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is quite similar to data mining, as it also deals with
huge amounts of data.
The importance of machine learning can be easily understood by its use
cases. Currently, machine learning is used in self-driving cars, cyber
fraud detection, face recognition, friend suggestion by Facebook,
etc. Various top companies such as Netflix and Amazon have built
machine learning models that use vast amounts of data to analyze
user interests and recommend products accordingly.
Classification of Machine Learning:
At a broad level, machine learning can be classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
Supervised Machine Learning

Supervised learning is a type of machine learning in which machines are trained using
well "labelled" training data, and on the basis of that data, machines predict the
output. Labelled data means some input data is already tagged with the correct output.

Supervised learning is a process of providing input data as well as correct output data
to the machine learning model. The aim of a supervised learning algorithm is to find a
mapping function to map the input variable (x) with the output variable (y).

In the real-world, supervised learning can be used for Risk Assessment, Image
classification, Fraud Detection, spam filtering, etc.

How Supervised Learning Works?

In supervised learning, models are trained using a labelled dataset, where the
model learns about each type of data. Once the training process is completed,
the model is tested on the basis of test data (data held out from training), and
then it predicts the output.
The working of Supervised learning can be easily understood by the below
example and diagram:

Suppose we have a dataset of different types of shapes, which includes squares,
rectangles, triangles, and polygons. Now the first step is that we need to train the
model for each shape.

o If the given shape has four sides, and all the sides are equal, then it will
be labelled as a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides then it will be labelled as hexagon.

Now, after training, we test our model using the test set, and the task of the
model is to identify the shape.

The machine is already trained on all types of shapes, and when it finds a new
shape, it classifies the shape on the basis of the number of sides and predicts
the output.

Steps Involved in Supervised Learning:


o First, determine the type of training dataset.

o Collect/gather the labelled training data.

o Split the dataset into a training set, a test set, and a validation set.
o Determine the input features of the training dataset, which should have
enough knowledge so that the model can accurately predict the output.
o Determine the suitable algorithm for the model, such as support vector
machine, decision tree, etc.
o Execute the algorithm on the training dataset. Sometimes we need a
validation set to tune control parameters; it is a subset of the
training dataset.
o Evaluate the accuracy of the model by providing the test set. If the
model predicts the correct output, our model is accurate (a minimal sketch of this
workflow follows).
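The sketch below walks through the steps above with scikit-learn, assuming it is installed; the dataset (Iris) and the choice of a decision tree are only illustrative.

# Illustrative supervised-learning workflow.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Gather a labelled dataset: features X and correct output labels y.
X, y = load_iris(return_X_y=True)

# Split into training and test sets (a validation set could be split off the same way).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Choose a suitable algorithm and execute it on the training data.
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Evaluate the accuracy of the model on the held-out test set.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))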

Types of supervised Machine learning Algorithms:

Supervised learning can be further divided into two types of problems:

1. Regression

Regression algorithms are used if there is a relationship between the input variable
and the output variable. They are used for the prediction of continuous variables,
such as weather forecasting, market trends, etc. Below are some popular regression
algorithms which come under supervised learning:

o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

2. Classification

Classification algorithms are used when the output variable is categorical, which
means the output takes class values such as Yes/No, Male/Female, True/False, etc.
Spam filtering is a common example. Below are some popular classification algorithms
which come under supervised learning:
o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines

Advantages of Supervised learning:


o With the help of supervised learning, the model can predict the output
on the basis of prior experiences.
o In supervised learning, we can have an exact idea about the classes of
objects.
o Supervised learning model helps us to solve various real-world problems
such as fraud detection, spam filtering, etc.

Disadvantages of supervised learning:


o Supervised learning models are not suitable for handling complex tasks.
o Supervised learning cannot predict the correct output if the test data is
different from the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.

Unsupervised Machine Learning

Unsupervised learning is a machine learning technique in which models are not
supervised using a training dataset. Instead, the model itself finds the hidden
patterns and insights from the given data. It can be compared to the learning which
takes place in the human brain while learning new things. It can be defined as:

Unsupervised learning is a type of machine learning in which models are trained using
an unlabeled dataset and are allowed to act on that data without any supervision.

Unsupervised learning cannot be directly applied to a regression or classification
problem because, unlike supervised learning, we have the input data but no
corresponding output data. The goal of unsupervised learning is to find the underlying
structure of the dataset, group that data according to similarities, and represent
that dataset in a compressed format.

Why use Unsupervised Learning?

Below are some main reasons which describe the importance of Unsupervised
Learning:

o Unsupervised learning is helpful for finding useful insights from the data.
o Unsupervised learning is similar to how a human learns to think from their
own experiences, which makes it closer to real AI.
o Unsupervised learning works on unlabeled and uncategorized data,
which makes unsupervised learning more important.
o In the real world, we do not always have input data with corresponding
output, so to solve such cases we need unsupervised learning.

Working of Unsupervised Learning

Working of unsupervised learning can be understood by the below diagram:

Here, we have taken unlabeled input data, which means it is not categorized and the
corresponding outputs are also not given. Now, this unlabeled input data is fed to the
machine learning model in order to train it. Firstly, it will interpret the raw data to
find the hidden patterns in the data and then will apply suitable algorithms, such as
k-means clustering or hierarchical clustering.

Once it applies the suitable algorithm, the algorithm divides the data objects
into groups according to the similarities and differences between the objects.
Types of Unsupervised Learning Algorithm:

The unsupervised learning algorithm can be further categorized into two types
of problems:

o Clustering: Clustering is a method of grouping the objects into clusters
such that objects with the most similarities remain in a group and have few
or no similarities with the objects of another group. Cluster analysis finds
the commonalities between the data objects and categorizes them as
per the presence and absence of those commonalities.
o Association: An association rule is an unsupervised learning method
which is used for finding the relationships between variables in a large
database. It determines the set of items that occur together in the
dataset. Association rules make marketing strategy more effective; for
example, people who buy item X (say bread) also tend to purchase item Y
(butter/jam). A typical example of an association rule is Market Basket
Analysis (a tiny worked example follows).
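The bread-and-butter example above can be made concrete with a tiny pure-Python sketch (the transactions are invented): support is the fraction of baskets containing both items, and confidence is the fraction of bread-buyers who also bought butter.

# Toy market-basket example for the rule "bread -> butter".
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

n = len(transactions)
both = sum(1 for t in transactions if {"bread", "butter"} <= t)
bread = sum(1 for t in transactions if "bread" in t)

support = both / n          # baskets with bread AND butter: 2/4 = 0.50
confidence = both / bread   # of baskets with bread, those with butter: 2/3 ~ 0.67
print(f"support={support:.2f}, confidence={confidence:.2f}")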

Unsupervised Learning algorithms:

Below is the list of some popular unsupervised learning algorithms:

o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchical clustering
Supervised Learning vs. Unsupervised Learning

o Supervised learning algorithms are trained using labeled data; unsupervised learning
algorithms are trained using unlabeled data.
o A supervised learning model takes direct feedback to check whether it is predicting
the correct output; an unsupervised learning model does not take any feedback.
o A supervised learning model predicts the output; an unsupervised learning model finds
the hidden patterns in data.
o In supervised learning, input data is provided to the model along with the output; in
unsupervised learning, only input data is provided to the model.
o The goal of supervised learning is to train the model so that it can predict the
output when given new data; the goal of unsupervised learning is to find the hidden
patterns and useful insights from the unknown dataset.
o Supervised learning needs supervision to train the model; unsupervised learning does
not need any supervision.
o Supervised learning can be categorized into Classification and Regression problems;
unsupervised learning can be classified into Clustering and Association problems.
o Supervised learning is used for cases where we know the inputs as well as the
corresponding outputs; unsupervised learning is used for cases where we have only
input data and no corresponding output data.
o A supervised learning model produces an accurate result; an unsupervised learning
model may give a less accurate result as compared to supervised learning.
o Supervised learning is not close to true Artificial Intelligence, as we first train
the model for each data point and only then can it predict the correct output;
unsupervised learning is closer to true Artificial Intelligence, as it learns similarly
to how a child learns daily routine things from experience.
o Supervised learning includes algorithms such as Linear Regression, Logistic
Regression, Support Vector Machine, Multi-class Classification, Decision Tree, Bayesian
Logic, etc.; unsupervised learning includes algorithms such as Clustering, KNN, and the
Apriori algorithm.
Advantages of Unsupervised Learning
o Unsupervised learning is used for more complex tasks as compared to
supervised learning because, in unsupervised learning, we don't have
labeled input data.
o Unsupervised learning is preferable as it is easy to get unlabeled data in
comparison to labeled data.

Disadvantages of Unsupervised Learning


o Unsupervised learning is intrinsically more difficult than supervised
learning as it does not have corresponding output.
o The result of the unsupervised learning algorithm might be less accurate
as input data is not labeled, and algorithms do not know the exact
output in advance.

3. Reinforcement Learning:
Reinforcement learning is a feedback-based learning method, in which a
learning agent gets a reward for each right action and gets a penalty for each
wrong action. The agent learns automatically with these feedbacks and
improves its performance. In reinforcement learning, the agent interacts with
the environment and explores it. The goal of an agent is to get the most
reward points, and hence, it improves its performance.
The robotic dog, which automatically learns the movement of its limbs, is an
example of Reinforcement learning.
Machine learning is making our day to day life easy from self-driving
cars to Amazon virtual assistant "Alexa".
Applications of Machine learning:
Machine learning is a buzzword for today's technology, and it is growing very
rapidly day by day. We are using machine learning in our daily life even
without knowing it such as Google Maps, Google assistant, Alexa, etc. Below
are some most trending real-world applications of Machine Learning:
 Image Recognition
 Automatic Language Translation
 Medical Diagnosis
 Stock Market Trading
 Online Fraud Detection
 Virtual Personal Assistant
 Email Spam and Malware Filtering
 Self Driving Cars
 Product Recommendation
 Traffic Prediction
 Speech Recognition

Image Recognition:Image recognition is one of the most common applications


of machine learning. It is used to identify objects, persons, places, digital
images, etc. The popular use case of image recognition and face detection
is, Automatic friend tagging suggestion.
Speech Recognition:Speech recognition is a process of converting voice
instructions into text, and it is also known as "Speech to text", or "Computer
speech recognition." At present, machine learning algorithms are widely used
by various applications of speech recognition. Google assistant, Siri, Cortana,
and Alexa are using speech recognition technology to follow the voice
instructions.
Traffic prediction: It predicts traffic conditions, such as whether traffic is
clear, slow-moving, or heavily congested, with the help of two sources:
o Real-time location of the vehicle from the Google Maps app and sensors
o Average time taken on past days at the same time
Self-driving cars: One of the most exciting applications of machine learning is
self-driving cars. Machine learning plays a significant role in self-driving cars.
Tesla, a popular car manufacturing company, is working on self-driving cars. It uses
machine learning methods to train the car models to detect people and objects while
driving.
Email Spam and Malware Filtering:Whenever we receive a new email, it is
filtered automatically as important, normal, and spam. We always receive an
important mail in our inbox with the important symbol and spam emails in our
spam box, and the technology behind this is Machine learning. Below are some
spam filters used by Gmail:
o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters
Some machine learning algorithms such as Multi-Layer Perceptron, Decision
tree, and Naïve Bayes classifier are used for email spam filtering and malware
detection.
Medical Diagnosis: In medical science, machine learning is used for disease
diagnosis. With this, medical technology is growing very fast and is able to build
3D models that can predict the exact position of lesions in the brain.
It helps in finding brain tumors and other brain-related diseases easily.

Decision Tree
o Decision Tree is a Supervised learning technique that can be used for
both classification and Regression problems, but mostly it is preferred
for solving Classification problems. It is a tree-structured classifier,
where internal nodes represent the features of a dataset, branches
represent the decision rules and each leaf node represents the
outcome.
o In a Decision tree, there are two nodes, which are the Decision
Node and Leaf Node. Decision nodes are used to make any decision and
have multiple branches, whereas Leaf nodes are the output of those
decisions and do not contain any further branches.
o The decisions or the test are performed on the basis of features of the
given dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the
root node, which expands on further branches and constructs a tree-like
structure.
o In order to build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
o A decision tree simply asks a question, and based on the answer
(Yes/No), it further splits the tree into subtrees.
Decision Tree Terminologies
Root Node: Root node is from where the decision tree starts. It represents
the entire dataset, which further gets divided into two or more homogeneous
sets.
Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into
sub-nodes according to the given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
 Pruning: Pruning is the process of removing the unwanted branches from the
tree.
Parent/Child node: The root node of the tree is called the parent node, and
other nodes are called the child nodes.
Example: Suppose there is a candidate who has a job offer and wants to decide
whether he should accept the offer or not. To solve this problem, the decision tree
starts with the root node (the Salary attribute, chosen by an attribute selection
measure). The root node splits further into the next decision node (distance from the
office) and one leaf node based on the corresponding labels. The next decision node
further splits into one decision node (cab facility) and one leaf node. Finally, the
decision node splits into two leaf nodes (Accepted offer and Declined offer).

Attribute Selection Measures


While implementing a decision tree, the main issue that arises is how to select
the best attribute for the root node and for the sub-nodes. To solve such
problems there is a technique called the Attribute Selection Measure, or ASM.
With this measure, we can easily select the best attribute for the
nodes of the tree. There are two popular techniques for ASM, which are:
o Information Gain
o Gini Index
Information Gain: Information gain is the measurement of the change in entropy
after the segmentation of a dataset based on an attribute.
It calculates how much information a feature provides us about a class.
Information Gain = Entropy(S) − [Weighted Avg × Entropy(each feature)]
Gini Index:
o Gini index is a measure of impurity or purity used while creating a
decision tree in the CART(Classification and Regression Tree) algorithm.
o An attribute with the low Gini index should be preferred as compared to
the high Gini index.
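A small pure-Python sketch (illustrative only, not part of the original text) of these two measures, computing entropy, Gini index, and the information gain of a split on a toy list of class labels:

from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gini(labels):
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

def information_gain(parent, subsets):
    # Entropy(S) minus the weighted average entropy of the subsets after the split.
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - weighted

labels = ["yes", "yes", "yes", "no", "no", "no"]
split = [["yes", "yes", "yes"], ["no", "no", "no"]]   # a perfect split
print(entropy(labels), gini(labels), information_gain(labels, split))  # 1.0 0.5 1.0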
Advantages of the Decision Tree
o It is simple to understand as it follows the same process which a human
follows while making any decision in real life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes for a problem.
o There is less requirement of data cleaning compared to other
algorithms.
Disadvantages of the Decision Tree
o The decision tree contains lots of layers, which makes it complex.
o It may have an overfitting issue, which can be resolved using
the Random Forest algorithm.
o For more class labels, the computational complexity of the decision tree may
increase.
Inductive Learning
Inductive learning is a machine learning technique that trains a model to
generate predictions based on examples or observations. During inductive
learning, the model picks up knowledge from particular examples or instances
and generalizes it so that it can predict outcomes for brand-new data.
When using inductive learning, a rule or method is not explicitly programmed
into the model. Instead, the model is trained to spot trends and connections in
the input data and then utilize this knowledge to predict outcomes from fresh
data. Making a model that can precisely anticipate the result of subsequent
instances is the aim of inductive learning.
In supervised learning situations, where the model is trained using labeled
data, inductive learning is frequently utilized. A series of samples with the
proper output labels are used to train the model. The model then creates a
mapping between the input data and the output data using this training data.
The output for fresh instances may be predicted using the model after it has
been trained.
Inductive learning is used by a number of well-known machine learning
algorithms, such as decision trees, k-nearest neighbors, and neural networks.
Because it enables the development of models that can accurately anticipate
new data, even when the underlying patterns and relationships are
complicated and poorly understood, inductive learning is an essential method
for machine learning.
Advantages
 Flexible and adaptable − Because inductive learning models are flexible and
adaptive, they are well suited for handling difficult, complex, and dynamic
information.
 Finding hidden patterns and relationships in data − Inductive
learning models are ideally suited for tasks like pattern recognition
and classification because they can identify links and patterns in
data that may not be immediately apparent to humans.
 Huge datasets − Inductive learning models are suitable for
applications requiring the processing of massive quantities of data
because they can efficiently handle enormous volumes of data.
 Appropriate for situations where the rules are ambiguous − Since
inductive learning models may learn from examples without
explicit programming, they are suitable for situations when the
rules are not precisely described or understood beforehand.
Disadvantages
 May overfit to particular data − Inductive learning models that
have overfit to specific training data, or that have learned the
noise in the data rather than the underlying patterns, may perform
badly on fresh data.
 computationally costly possible − The employment of inductive
learning models in real-time applications may be constrained by
their computationally costly nature, especially for complex
datasets.
 Limited interpretability − Inductive learning models may be
difficult to understand, making it difficult to understand how they
arrive at their predictions, in applications where the decision-
making process must be transparent and explicable.
 Inductive learning models are only as good as the data they are
trained on, therefore if the data is inaccurate or inadequate, the
model may not perform effectively.
Deductive Learning
Deductive learning is a method of machine learning in which a model is built
using a series of logical principles and steps. In deductive learning, the model
is specifically designed to adhere to a set of guidelines and processes in order
to produce predictions based on brand-new, unexplored data.
In rule-based systems, expert systems, and knowledge-based systems, where
the rules and processes are clearly set by domain experts, deductive learning
is frequently utilized. The model is trained to adhere to the guidelines and
processes in order to derive judgments or predictions from the input data.
Deductive learning begins with a set of rules and processes and utilizes these
rules to generate predictions on incoming data, in contrast to inductive
learning, which learns from particular examples. Making a model that can
precisely adhere to a set of guidelines and processes in order to generate
predictions is the aim of deductive learning.
Deductive learning is used by a number of well-known machine learning
algorithms, such as decision trees, rule-based systems, and expert systems.
Deductive learning is a crucial machine learning strategy because it enables
the development of models that can generate precise predictions in
accordance with predetermined rules and guidelines.
Advantages
 More effective − Since deductive learning begins with broad
concepts and applies them to particular cases, it is frequently
quicker than inductive learning.
 Deductive learning can sometimes yield more accurate findings
than inductive learning since it starts with certain principles and
applies them to the data.
 Deductive learning is more practical when data are sparse or
challenging to collect since it requires fewer data than inductive
learning.
Disadvantages
 Deductive learning is constrained by the rules that are currently in
place, which may be insufficient or obsolete.
 Deductive learning is not appropriate for complicated issues that
lack precise rules or correlations between variables, nor is it
appropriate for ambiguous problems.
 Biased results − The accuracy of deductive learning depends on the quality of
the rules and knowledge base, which might add biases and mistakes to the
results.
The Main Distinctions Between Inductive and Deductive Learning in Machine
Learning are Outlined in the Following Table
Inductive Learning vs. Deductive Learning

o Approach: inductive learning is bottom-up; deductive learning is top-down.
o Data: inductive learning works from specific examples; deductive learning works from
logical rules and procedures.
o Model creation: inductive learning finds correlations and patterns in data; deductive
learning obeys clearly stated guidelines and instructions.
o Training: inductive learning adapts model parameters by learning from instances;
deductive learning relies on explicit programming and established rules.
o Goal: inductive learning aims to generalize and make predictions on fresh data;
deductive learning aims to make a model that precisely complies with the given
guidelines and instructions.
o Examples: inductive learning includes decision trees, neural networks, and clustering
algorithms; deductive learning includes knowledge-based systems, expert systems, and
rule-based systems.
o Strengths: inductive learning is adaptable, versatile, and capable of learning from a
variety of complicated data; deductive learning is accurate when following established
norms and processes and effective for specific, well-defined duties.
o Limitations: inductive learning may find it difficult to manage complex and diverse
data and may overfit to specific facts; deductive learning is limited to well-defined
duties and norms and may be incapable of adjusting to novel circumstances.

Clustering in Machine Learning

Clustering or cluster analysis is a machine learning technique which groups the
unlabelled dataset. It can be defined as "A way of grouping the data points
into different clusters, consisting of similar data points. The objects with the
possible similarities remain in a group that has less or no similarities with
another group."

It does this by finding similar patterns in the unlabelled dataset, such as
shape, size, color, behavior, etc., and divides the data as per the presence and
absence of those similar patterns.

It is an unsupervised learning method; hence, no supervision is provided to the
algorithm, and it deals with the unlabeled dataset.

After applying this clustering technique, each cluster or group is provided with
a cluster-ID. ML system can use this id to simplify the processing of large and
complex datasets.

The clustering technique is commonly used for statistical data analysis.

The below diagram explains the working of the clustering algorithm. We can
see the different fruits are divided into several groups with similar properties.
Types of Clustering Methods

The clustering methods are broadly divided into Hard clustering (each data point
belongs to only one group) and Soft clustering (a data point can also belong to
another group). But there are also various other approaches to clustering.
Below are the main clustering methods used in machine learning:

1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering

Partitioning Clustering

It is a type of clustering that divides the data into non-hierarchical groups. It is
also known as the centroid-based method. The most common example of
partitioning clustering is the K-Means Clustering algorithm.

In this type, the dataset is divided into a set of k groups, where k defines the
number of pre-defined groups. The cluster center is created in such a way that the
distance between the data points of one cluster is minimum as compared to another
cluster centroid.
Density-Based Clustering

The density-based clustering method connects the highly dense areas into clusters,
and arbitrarily shaped distributions are formed as long as the dense region can be
connected. This algorithm does it by identifying different clusters in the dataset
and connecting the areas of high density into clusters. The dense areas in data
space are separated from each other by sparser areas.

These algorithms can face difficulty in clustering the data points if the dataset
has varying densities and high dimensions.

Distribution Model-Based Clustering

In the distribution model-based clustering method, the data is divided based on the
probability of how a dataset belongs to a particular distribution. The grouping is
done by assuming some distributions, commonly the Gaussian distribution.
The example of this type is the Expectation-Maximization Clustering
algorithm that uses Gaussian Mixture Models (GMM).

Hierarchical Clustering

Hierarchical clustering can be used as an alternative to partitioned clustering, as
there is no requirement of pre-specifying the number of clusters to be created. In
this technique, the dataset is divided into clusters to create a tree-like structure,
which is also called a dendrogram. The observations or any number of clusters can be
selected by cutting the tree at the correct level. The most common example of this
method is the Agglomerative Hierarchical algorithm.

Fuzzy Clustering

Fuzzy clustering is a type of soft method in which a data object may belong to
more than one group or cluster. Each dataset has a set of membership
coefficients, which depend on the degree of membership to be in a
cluster. Fuzzy C-means algorithm is the example of this type of clustering; it is
sometimes also known as the Fuzzy k-means algorithm.
Clustering Algorithms

The clustering algorithms can be divided based on the models explained above.
There are different types of clustering algorithms published, but only a few are
commonly used. The choice of clustering algorithm depends on the kind of data that
we are using. For example, some algorithms need to be given the number of clusters
in the given dataset, whereas others work from the minimum distance between the
observations of the dataset.

Here we are discussing mainly popular Clustering algorithms that are widely
used in machine learning:

1. K-Means algorithm: The k-means algorithm is one of the most popular
clustering algorithms. It classifies the dataset by dividing the samples
into different clusters of equal variances. The number of clusters must
be specified in this algorithm. It is fast, with fewer computations
required, and has linear complexity O(n). (A short sketch follows this list.)
2. Mean-shift algorithm: Mean-shift algorithm tries to find the dense areas
in the smooth density of data points. It is an example of a centroid-
based model, that works on updating the candidates for centroid to be
the center of the points within a given region.
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of
Applications with Noise. It is an example of a density-based model
similar to the mean-shift, but with some remarkable advantages. In this
algorithm, the areas of high density are separated by the areas of low
density. Because of this, the clusters can be found in any arbitrary shape.
4. Expectation-Maximization Clustering using GMM: This algorithm can be
used as an alternative to the k-means algorithm or for those cases
where k-means may fail. In GMM, it is assumed that the data points
are Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The Agglomerative hierarchical
algorithm performs the bottom-up hierarchical clustering. In this, each
data point is treated as a single cluster at the outset and then
successively merged. The cluster hierarchy can be represented as a tree-
structure.
6. Affinity Propagation: It is different from other clustering algorithms as it
does not require specifying the number of clusters. In this, each data
point sends a message between pairs of data points until
convergence. It has O(N²T) time complexity, which is the main drawback
of this algorithm.
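As referenced under the K-Means entry, here is a minimal K-Means sketch, assuming scikit-learn is installed; the six 2-D points are made up so that two obvious groups exist.

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],      # one dense region
              [10, 2], [10, 4], [10, 0]])  # another dense region

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("labels:", kmeans.labels_)            # cluster id assigned to each point
print("centers:", kmeans.cluster_centers_)  # one centroid per cluster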
Applications of Clustering

Below are some commonly known applications of clustering technique in


Machine Learning:

o In Identification of Cancer Cells: The clustering algorithms are widely
used for the identification of cancerous cells. They divide the cancerous
and non-cancerous data sets into different groups.
o In Search Engines: Search engines also work on the clustering technique.
The search result appears based on the closest object to the search
query. It does it by grouping similar data objects in one group that is far
from the other dissimilar objects. The accurate result of a query depends
on the quality of the clustering algorithm used.
o Customer Segmentation: It is used in market research to segment the
customers based on their choice and preferences.
o In Biology: It is used in the biology stream to classify different species of
plants and animals using the image recognition technique.
o In Land Use: The clustering technique is used in identifying areas of
similar land use in the GIS database. This can be very useful to find
for what purpose a particular piece of land should be used, i.e., which
use it is more suitable for.

Support Vector Machine Algorithm

Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems.
However, primarily, it is used for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put
the new data point in the correct category in the future. This best decision
boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is
termed a Support Vector Machine. Consider the below diagram in which there
are two different categories that are classified using a decision boundary or
hyperplane:
SVM algorithm can be used for Face detection, image classification, text
categorization, etc.

Types of SVM

SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data, which means
that if a dataset can be classified into two classes by using a single straight
line, then such data is termed linearly separable data, and the classifier
used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable
data, which means that if a dataset cannot be classified by using a straight
line, then such data is termed non-linear data, and the classifier used is
called a Non-linear SVM classifier.

Hyperplane: There can be multiple lines/decision boundaries to segregate the
classes in n-dimensional space, but we need to find the best decision boundary
that helps to classify the data points. This best boundary is known as the
hyperplane of the SVM.

Support Vectors:

The data points or vectors that are closest to the hyperplane and which
affect the position of the hyperplane are termed support vectors. Since
these vectors support the hyperplane, they are called support vectors.

How does SVM work?

Linear SVM:
The working of the SVM algorithm can be understood by using an example.
Suppose we have a dataset that has two tags (green and blue), and the dataset
has two features x1 and x2. We want a classifier that can classify the pair(x1,
x2) of coordinates in either green or blue. Consider the below image:

As it is a 2-D space, by just using a straight line we can easily separate these
two classes. But there can be multiple lines that can separate these classes.
Consider the below image:

Hence, the SVM algorithm helps to find the best line or decision boundary; this
best boundary or region is called a hyperplane. The SVM algorithm finds the
closest points of the lines from both classes. These points are called support
vectors. The distance between the vectors and the hyperplane is called the
margin, and the goal of SVM is to maximize this margin.
The hyperplane with the maximum margin is called the optimal hyperplane.
Non-Linear SVM:

If data is linearly arranged, then we can separate it by using a straight line, but
for non-linear data, we cannot draw a single straight line. Consider the below
image:

So to separate these data points, we need to add one more dimension. For
linear data, we have used two dimensions x and y, so for non-linear data, we
will add a third dimension z. It can be calculated as:

z = x² + y²

By adding the third dimension, the sample space will become as below image:

So now, SVM will divide the datasets into classes in the following way.
Consider the below image:
Since we are in 3-d Space, hence it is looking like a plane parallel to the x-axis.
If we convert it in 2d space with z=1, then it will become as:

Hence we get a circumference of radius 1 in case of non-linear data.
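A short sketch of this idea, assuming scikit-learn: points on two concentric circles are not linearly separable in 2-D, but the explicit feature z = x² + y² separates them, and in practice an RBF-kernel SVM performs a similar lifting implicitly. The synthetic data and parameters are illustrative.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Explicit third dimension, as in the text: z = x^2 + y^2.
z = (X ** 2).sum(axis=1)
print("mean z, inner class:", z[y == 1].mean(), " outer class:", z[y == 0].mean())

# The kernel trick does this kind of lifting implicitly.
clf = SVC(kernel="rbf").fit(X, y)
print("training accuracy:", clf.score(X, y))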

Artificial Neural Network

"Artificial Neural Network" is derived from Biological neural networks that develop
the structure of a human brain. Similar to the human brain that has neurons
interconnected to one another, artificial neural networks also have neurons that are
interconnected to one another in various layers of the networks. These neurons are
known as nodes.

The given figure illustrates the typical diagram of Biological Neural Network.

The typical Artificial Neural Network looks something like the given figure.
Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks,
cell nucleus represents Nodes, synapse represents Weights, and Axon represents Output.

Relationship between Biological neural network and artificial neural network:

Biological Neural Network Artificial Neural Network

Dendrites Inputs

Cell nucleus Nodes

Synapse Weights

Axon Output

The architecture of an artificial neural network:

To understand the concept of the architecture of an artificial neural network, we have to
understand what a neural network consists of. A neural network consists of a large number
of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the
various types of layers available in an artificial neural network.

Artificial Neural Network primarily consists of three layers:


Input Layer:

As the name suggests, it accepts inputs in several different formats provided by the
programmer.

Hidden Layer:

The hidden layer presents in-between input and output layers. It performs all the calculations
to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.

The artificial neural network takes input and computes the weighted sum of the inputs and
includes a bias. This computation is represented in the form of a transfer function.
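For example, a single neuron's computation can be sketched in a few lines of Python; the weights, bias, and the sigmoid transfer function below are illustrative choices, not prescribed by the text.

import numpy as np

def sigmoid(x):                        # one common transfer (activation) function
    return 1.0 / (1.0 + np.exp(-x))

inputs = np.array([0.5, 0.3, 0.2])     # one input vector
weights = np.array([0.4, 0.7, -0.2])   # one weight per input
bias = 0.1

net = np.dot(weights, inputs) + bias   # weighted sum of the inputs plus the bias
output = sigmoid(net)                  # transfer function applied to the net input
print(net, output)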

Advantages of Artificial Neural Network (ANN)

Parallel processing capability:

Artificial neural networks can perform more than one task simultaneously, since the
computation is distributed across many units.

Storing data on the entire network:

Unlike traditional programming, where data is stored in a database, the information in an
ANN is stored over the whole network. The disappearance of a couple of pieces of data in
one place doesn't prevent the network from working.
Capability to work with incomplete knowledge:

After training, an ANN may produce output even with incomplete data. The loss of
performance here relies upon the significance of the missing data.

Having a memory distribution:

For an ANN to be able to adapt, it is important to determine the examples and to train
the network according to the desired output by demonstrating these examples to the network.

Having fault tolerance:

Corruption of one or more cells of the ANN does not prevent it from generating output,
and this feature makes the network fault-tolerant.
Disadvantages of Artificial Neural Network:
Assurance of proper network structure:

There is no particular guideline for determining the structure of artificial neural networks.
The appropriate network structure is accomplished through experience, trial, and error.

Unrecognized behavior of the network:

It is the most significant issue of ANN. When ANN produces a testing solution, it does not
provide insight concerning why and how. It decreases trust in the network.

Hardware dependence:

Artificial neural networks need processors with parallel processing power, as per their
structure. Therefore, the realization of the network depends on suitable hardware.

Difficulty of showing the issue to the network:

ANNs can work only with numerical data. Problems must be converted into numerical values
before being introduced to the ANN. The representation mechanism chosen here directly
impacts the performance of the network; it relies on the user's abilities.

The duration of the network is unknown:

The network is trained until the error is reduced to a specific value, but this value
does not guarantee optimum results.

Types of Artificial Neural Network:

There are various types of Artificial Neural Networks (ANNs). Depending upon the neurons
and network functions of the human brain, an artificial neural network performs tasks in
a similar way. The majority of artificial neural networks have some similarities with
their more complex biological counterpart and are very effective at their expected
tasks, for example, segmentation or classification.
Feedback ANN:

In this type of ANN, the output returns into the network to achieve the best-evolved
results internally. As per the University of Massachusetts Lowell Centre for Atmospheric
Research, feedback networks feed information back into themselves and are well suited to
solving optimization problems. Internal system error corrections utilize feedback ANNs.

Feed-Forward ANN:

A feed-forward network is a basic neural network comprising an input layer, an output
layer, and at least one layer of neurons. Through assessment of its output by reviewing
its input, the strength of the network can be judged based on the group behavior of the
associated neurons, and the output is decided. The primary advantage of this network is
that it figures out how to evaluate and recognize input patterns.

Single Layer Feed Forward Networks:

In this type of network, we have only two layers, the input layer and the output layer,
but the input layer does not count because no computation is performed in this layer.
The output layer is formed when different weights are applied to the input nodes and the
cumulative effect per node is taken. After this, the neurons collectively give the
output layer to compute the output signals.

"The process of receiving an input to produce some kind of output to make some kind of
prediction is known as Feed Forward." Feed Forward neural network is the core of many
other important neural networks such as convolution neural network.
In the feed-forward neural network, there are not any feedback loops or connections in the
network. Here is simply an input layer, a hidden layer, and an output layer.

Multi-Layer Feed Forward Networks:

This network also has a hidden layer that is internal to the network and has no direct
contact with the external layer. The existence of one or more hidden layers makes the
network computationally stronger. It is a feed-forward network because information flows
through the input function and the intermediate computations used to define the output Z.
There are no feedback connections in which outputs of the model are fed back into the
network.

A multilayer feedforward neural network is an interconnection of perceptrons in which
data and calculations flow in a single direction, from the input data to the outputs.
The number of layers in a neural network is the number of layers of perceptrons. The
simplest neural network is one with a single input layer and an output layer of
perceptrons.

In this single-layer feedforward neural network, the network's inputs are directly
connected to the output layer perceptrons, Z1 and Z2.

The output perceptrons use activation functions, g1 and g2, to produce the outputs Y1 and Y2.
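A hedged NumPy sketch of one forward pass through such a network, with a three-unit hidden layer and two output units Y1 and Y2; the weights are arbitrary illustrative numbers, and sigmoid stands in for the activation functions g1 and g2.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.5, 0.3])                 # input vector

W_hidden = np.array([[0.2, -0.4],        # 3 hidden units x 2 inputs
                     [0.7,  0.1],
                     [-0.5, 0.6]])
b_hidden = np.array([0.1, 0.0, -0.1])

W_out = np.array([[0.3, -0.2, 0.5],      # 2 output units x 3 hidden units
                  [-0.6, 0.4, 0.1]])
b_out = np.array([0.05, -0.05])

h = sigmoid(W_hidden @ x + b_hidden)     # hidden-layer activations
y = sigmoid(W_out @ h + b_out)           # outputs Y1 and Y2
print(h, y)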

Radial Basis Function Network (RBF):

A radial basis function network is a type of supervised artificial neural network that uses
supervised machine learning (ML) to function as a nonlinear classifier. Nonlinear classifiers
use sophisticated functions to go further in analysis than simple linear classifiers that work on
lower-dimensional vectors.

A radial basis function network is also known as a radial basis network.


The Input Vector:

The input vector is the n-dimensional vector that you are trying to classify. The entire input
vector is shown to each of the RBF neurons.

The RBF Neurons:

Each RBF neuron stores a “prototype” vector which is just one of the vectors from the
training set. Each RBF neuron compares the input vector to its prototype, and outputs a value
between 0 and 1 which is a measure of similarity. If the input is equal to the prototype, then
the output of that RBF neuron will be 1. As the distance between the input and prototype
grows, the response falls off exponentially towards 0.

The Output Nodes

The output of the network consists of a set of nodes, one per category that we are trying to
classify. Each output node computes a sort of score for the associated category. Typically, a
classification decision is made by assigning the input to the category with the highest score.
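A small NumPy sketch of these three pieces (input vector, RBF neurons with stored prototypes, and scoring output nodes); the prototypes, beta value, and output weights are made-up values for illustration only.

import numpy as np

def rbf_neuron(x, prototype, beta=1.0):
    # Similarity in (0, 1]: equals 1 when x matches the prototype and
    # falls off exponentially as the distance grows.
    return np.exp(-beta * np.sum((x - prototype) ** 2))

prototypes = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]  # taken from training data
weights = np.array([[1.0, -0.5],    # category-0 weights over the two RBF neurons
                    [-0.5, 1.0]])   # category-1 weights

x = np.array([0.9, 1.1])                                   # input vector to classify
activations = np.array([rbf_neuron(x, p) for p in prototypes])
scores = weights @ activations                             # one score per category
print("predicted category:", int(np.argmax(scores)))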

Recurrent Networks:
Recurrent networks are a kind of artificial neural network mainly intended to identify
patterns in sequences of data, such as text, genomes, handwriting, the spoken word, and
numerical time series data emanating from sensors, stock markets, and government
agencies.


A recurrent neural network looks similar to a traditional neural network except that a
memory state is added to the neurons. The computation includes this simple memory.

The recurrent neural network is a type of deep learning-oriented algorithm, which follows a
sequential approach. In neural networks, we always assume that each input and output is
dependent on all other layers. These types of neural networks are called recurrent because
they sequentially perform mathematical computations.
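A minimal sketch of that memory state (illustrative sizes and random weights): the same weights are reused at every time step, and the hidden state h carries information from earlier inputs forward.

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # One time step: the new hidden state depends on the current input
    # and on the previous hidden state (the "memory").
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3))          # 3-dim inputs -> 4-dim hidden state
W_h = rng.normal(size=(4, 4))
b = np.zeros(4)

h = np.zeros(4)                        # initial memory
sequence = rng.normal(size=(5, 3))     # 5 time steps of 3-dim inputs
for x_t in sequence:
    h = rnn_step(x_t, h, W_x, W_h, b)  # same weights at every step
print(h)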

Application of RNN

RNN has multiple uses when it comes to predicting the future. In the financial industry, RNN
can help predict stock prices or the sign of the stock market direction
(i.e., positive or negative).

RNN is used for an autonomous car as it can avoid a car accident by anticipating the route of
the vehicle.

RNNs are widely used in image captioning, text analysis, machine translation, and
sentiment analysis. For example, one can use a movie review to understand the sentiment
a spectator perceived after watching the movie.
 Machine Translation
 Speech Recognition
 Sentiment Analysis
 Automatic Image Tagger

Limitations of RNN

An RNN is supposed to carry information through time. However, it is quite challenging
to propagate all this information when the time step is too long. When a network has too
many deep layers, it becomes untrainable. This problem is called the vanishing gradient
problem.

Recall that the neural network updates its weights using the gradient descent algorithm;
the gradient grows smaller as it propagates down to the lower layers.
UNIT – V
Advanced Knowledge Representation Techniques: Case Grammars- Semantic Web-Natural
Language Processing: Introduction- Sentence Analysis Phases- Grammars and Parsers- Types
of Parsers- Semantic Analysis- Universal Networking Knowledge.

Introduction to Natural Language Processing:


NLP stands for Natural Language Processing, which is a part of Computer Science,
Human language, and Artificial Intelligence. It is the technology that is used by machines
to understand, analyse, manipulate, and interpret human's languages. It helps developers to
organize knowledge for performing tasks such as translation, automatic summarization,
Named Entity Recognition (NER), speech recognition, relationship extraction, and topic
segmentation.
Advantages of NLP
o NLP helps users to ask questions about any subject and get a direct response within
seconds.
o NLP offers exact answers to the question means it does not offer unnecessary and
unwanted information.
o NLP helps computers to communicate with humans in their languages.
o It is very time efficient.
o Most of the companies use NLP to improve the efficiency of documentation
processes, accuracy of documentation, and identify the information from large
databases.

Disadvantages of NLP
A list of disadvantages of NLP is given below:
o NLP may not show context.
o NLP is unpredictable
o NLP may require more keystrokes.
o NLP is unable to adapt to the new domain, and it has a limited function that's why
NLP is built for a single and specific task only.
Components of NLP
There are the following two components of NLP -
1. Natural Language Understanding (NLU): Natural Language Understanding (NLU)
helps the machine to understand and analyse human language by extracting the
metadata from content such as concepts, entities, keywords, emotion, relations, and
semantic roles.
NLU is mainly used in business applications to understand the customer's problem in both
spoken and written language.
NLU involves the following tasks -
o It is used to map the given input into useful representation.
o It is used to analyze different aspects of the language.
2. Natural Language Generation (NLG)
Natural Language Generation (NLG) acts as a translator that converts the computerized data
into natural language representation. It mainly involves Text planning, Sentence planning,
and Text Realization.

Difference between NLU and NLG

NLU: NLU is the process of reading and interpreting language. It produces non-linguistic
outputs from natural language inputs.

NLG: NLG is the process of writing or generating language. It constructs natural
language outputs from non-linguistic inputs.
Applications of NLP
There are the following applications of NLP -
1. Question Answering: Question Answering focuses on building systems that
automatically answer the questions asked by humans in a natural language.
2. Spam Detection: Spam detection is used to detect unwanted e-mails getting to a
user's inbox.
3. Sentiment Analysis: Sentiment Analysis is also known as opinion mining. It is
used on the web to analyse the attitude, behaviour, and emotional state of the
sender. This application is implemented through a combination of NLP and
statistics by assigning values to the text (positive, negative, or neutral) and
identifying the mood of the context (happy, sad, angry, etc.).
4. Machine Translation: Machine translation is used to translate text or speech
from one natural language to another natural language.
Example: Google Translator
5. Spelling Correction: Microsoft Corporation provides word processor software
like MS Word and PowerPoint for spelling correction.
6. Speech Recognition: Speech recognition is used for converting spoken words
into text. It is used in applications such as mobile, home automation, video
recovery, dictating to Microsoft Word, voice biometrics, voice user interfaces,
and so on.
7. Chatbot: Implementing a chatbot is one of the important applications of
NLP. It is used by many companies to provide chat services to customers.
8. Information Extraction: Information extraction is one of the most important
applications of NLP. It is used for extracting structured information from
unstructured or semi-structured machine-readable documents.
9. Natural Language Understanding (NLU): It converts a large set of text into
more formal representations, such as first-order logic structures, that are
easier for computer programs to manipulate.
Phases of NLP
There are the following five phases of NLP:
1. Lexical and Morphological Analysis
The first phase of NLP is the Lexical Analysis. This phase scans the source code as a stream
of characters and converts it into meaningful lexemes. It divides the whole text into
paragraphs, sentences, and words.
2. Syntactic Analysis (Parsing)
Syntactic Analysis is used to check grammar, word arrangements, and shows the relationship
among the words.
Example: Agra goes to the Poonam
In the real world, Agra goes to the Poonam, does not make any sense, so this sentence is
rejected by the Syntactic analyzer.
3. Semantic Analysis
Semantic analysis is concerned with the meaning representation. It mainly focuses on the
literal meaning of words, phrases, and sentences.
4. Discourse Integration
Discourse Integration depends upon the sentences that precede it and also invokes the
meaning of the sentences that follow it.
5. Pragmatic Analysis
Pragmatic is the fifth and last phase of NLP. It helps you to discover the intended effect by
applying a set of rules that characterize cooperative dialogues.
For Example: "Open the door" is interpreted as a request instead of an order.
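A small sketch of the first two phases using NLTK (one possible toolkit; the text does not prescribe a library). It presumes nltk is installed and the "punkt" and "averaged_perceptron_tagger" resources have been downloaded.

import nltk

text = "The black cat crossed the road."
tokens = nltk.word_tokenize(text)   # lexical analysis: text -> lexemes (words)
tags = nltk.pos_tag(tokens)         # part-of-speech tags, used later by the parser
print(tokens)
print(tags)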

What is Grammar?

Grammar is defined as the rules for forming well-structured sentences.

While describing the syntactic structure of well-formed programs, Grammar plays a very
essential and important role. In simple words, Grammar denotes syntactical rules that are
used for conversation in natural languages.

The theory of formal languages is not only applicable here but is also applicable in the fields
of Computer Science mainly in programming languages and data structures.

For Example, in the ‘C’ programming language, the precise grammar rules state how
functions are made with the help of lists and statements.

Mathematically, a grammar G can be written as a 4-tuple (N, T, S, P) where,


N or VN = set of non-terminal symbols, or variables.

T or ∑ = set of terminal symbols.

S = Start symbol where S ∈ N

P = Production rules for Terminals as well as Non-terminals.

It has the form α → β, where α and β are strings on VN ∪∑ and at least one symbol of α
belongs to VN

Context-Free Grammar (CFG)

A context-free grammar, which is in short represented as CFG, is a notation used for


describing the languages and it is a superset of Regular grammar which you can see from
the following diagram:

CFG consists of a finite set of grammar rules having


the following four components

 Set of Non-Terminals
 Set of Terminals
 Set of Productions
 Start Symbol

Set of Non-terminals

It is represented by V. The non-terminals are syntactic


variables that denote the sets of strings, which helps in defining the language that is
generated with the help of grammar.

Set of Terminals

It is also known as tokens and represented by Σ. Strings are formed with the help of the
basic symbols of terminals.

Set of Productions

It is represented by P. The set gives an idea about how the terminals and nonterminals can
be combined. Every production consists of the following components:

 Non-terminals,
 Arrow,
 Terminals (the sequence of terminals).

The left side of production is called non-terminals while the right side of production is called
terminals.
Start Symbol

The production begins from the start symbol. It is represented by symbol S. Non-terminal
symbols are always designated as start symbols.

Constituency Grammar (CG)

It is also known as Phrase structure grammar. It is called constituency Grammar as it is


based on the constituency relation. It is the opposite of dependency grammar.

Before deep dive into the discussion of CG, let’s see some fundamental points about
constituency grammar and constituency relation.

 All the related frameworks view the sentence structure in terms of constituency
relation.
 To derive the constituency relation, we take the help of subject-predicate division of
Latin as well as Greek grammar.
 Here we study the clause structure in terms of noun phrase NP and verb phrase VP.

For Example,

Sentence: This tree is illustrating the constituency relation

Now, Let’s deep dive into the discussion on Constituency Grammar:

In Constituency Grammar, the constituents can be any word, group of words, or phrases and
the goal of constituency grammar is to organize any sentence into its constituents using
their properties. To derive these properties we generally take the help of:

 Part of speech tagging,


 A noun or Verb phrase identification, etc

For Example, constituency grammar can organize any sentence into its three constituents- a
subject, a context, and an object.
Sentence: <subject><context><object>
These three constituents can take different values and as a result, they can generate
different sentences. For Example, If we have the following constituents, then

<subject> The horses / The dogs / They


<context> are running / are barking / are eating
<object> in the park / happily / since the morning
Example sentences that we can be generated with the help of the above constituents are:

“The dogs are barking in the park”


“They are eating happily”
“The horses are running since the morning”
Another view of constituency grammar defines the constituents in terms of their
part-of-speech tags.

Say a grammar structure contains

[determiner, noun] [ adjective, verb] [preposition, determiner, noun]


which corresponds to the same sentence – “The dogs are barking in the park”

Another view (Using Part of Speech)


< DT NN >< JJ VB >< PRP DT NN > -------------> The dogs are barking in the park

Dependency Grammar (DG)

It is opposite to the constituency grammar and is based on the dependency relation.


Dependency grammar (DG) is opposite to constituency grammar because it lacks phrasal
nodes.

Before deep dive into the discussion of DG, let’s see some fundamental points about
Dependency grammar and Dependency relation.

 In Dependency Grammar, the words are connected to each other by directed links.
 The verb is considered the center of the clause structure.
 Every other syntactic unit is connected to the verb in terms of directed link. These
syntactic units are called dependencies.

For Example,Sentence: This tree is illustrating the dependency relation


Grammars and Languages
Noam Chomsky invented a hierarchy of grammars.
The hierarchy consists of four main types of grammars.
The simplest grammars are used to define regular languages.
A regular language is one that can be described or understood by a finite state automaton.
Such languages are very simplistic and allow sentences such as “aaaaabbbbbb.” Recall that a
finite state automaton consists of a finite number of states, and rules that define how the
automaton can transition from one state to another.
A finite state automaton could be designed that defined the language that consisted of a
string of one or more occurrences of the letter a. Hence, the following strings would be valid
strings in this language:
Aaa a aaaaaaaaaaaaaaaaa
Regular languages are of interest to computer scientists, but are not of great interest to the
field of natural language processing because they are not powerful enough to represent
even simple formal languages, let alone the more complex natural languages.
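Such a regular language can be recognized with a simple regular expression. A minimal sketch in Python is shown below (the pattern a+ means "one or more occurrences of the letter a"):

# A minimal sketch: recognizing the regular language "one or more a's" with a regex.
import re

pattern = re.compile(r"^a+$")   # corresponds to a very small finite state automaton
for s in ["a", "aaa", "aaaaaaaaaaaaaaaaa", "ab", ""]:
    print(repr(s), "->", bool(pattern.match(s)))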
Sentences defined by a regular grammar are often known as regular expressions. The grammar
that we defined above using rewrite rules is a context-free grammar. It is context free because it
defines the grammar simply in terms of which word types can go together; it does not specify
the way that words should agree with each other.

Parsing: Syntactic Analysis

As we have seen, morphological analysis can be used to determine to which part of speech each
word in a sentence belongs. We will now examine how this information is used to determine the
syntactic structure of a sentence.

This process, in which we convert a sentence into a tree that represents the sentence's syntactic
structure, is known as parsing.

Parsing a sentence tells us whether it is a valid sentence, as defined by our grammar. If a
sentence is not a valid sentence, then it cannot be parsed. Parsing a sentence involves producing
a tree, such as that shown in Fig 10.1, which shows the parse tree for the following sentence:
The black cat crossed the road.

This tree shows how the sentence is made up of a noun phrase and a verb phrase.
The noun phrase consists of an article, an adjective, and a noun. The verb phrase consists of
a verb and a further noun phrase, which in turn consists of an article and a noun.
Parse trees can be built in a bottom-up fashion or in a top-down fashion.

Building a parse tree from the top down involves starting from a sentence and determining
which of the possible rewrites for Sentence can be applied to the sentence that is being parsed.
Hence, in this case, Sentence would be rewritten using the following rule:

Sentence → NounPhrase VerbPhrase

Then the verb phrase and noun phrase would be broken down recursively in the same way, until
only terminal symbols were left. When a parse tree is built from the top down, it is known as a
derivation tree.

To build a parse tree from the bottom up, the terminal symbols of the sentence are first replaced
by their corresponding nonterminals (e.g., cat is replaced by Noun), and then these nonterminals
are combined to match the right-hand sides of rewrite rules. For example, the and road would be
combined using the following rewrite rule:

NounPhrase → Article Noun

Top-down Parsing

Top-down parsing starts with the start symbol and proceeds towards the goal. We can say it is
the process of constructing the parse tree starting at the root and proceeding towards the leaves.
It is a strategy of analyzing unknown data relationships by hypothesizing general parse tree
structures and then considering whether the known fundamental structures are compatible with
the hypothesis. In top-down parsing, categories such as verb phrase (VP), noun phrase (NP),
prepositional phrase (PP) and pronoun (PRO) are expanded until the words of the sentence are
reached. Let us consider some examples to illustrate top-down parsing; we will consider both
the symbolic representation and the graphical representation, starting from the sentence symbol
and working down to the words using symbols such as PP, NP, VP, ART, N and V. Examples of
top-down parsers are LL parsers (Left-to-right scan, Leftmost derivation) and recursive descent
parsers. A minimal recursive descent sketch is given below.
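The sketch below parses the earlier example sentence top-down with NLTK's recursive descent parser (this assumes nltk is installed; the toy grammar is written only for this sentence):

# A minimal top-down parsing sketch using NLTK's recursive descent parser.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Adj N | Det N
VP -> V NP
Det -> 'the'
Adj -> 'black'
N -> 'cat' | 'road'
V -> 'crossed'
""")

# Expands rules top-down from S, backtracking when a rule fails to match the input.
parser = nltk.RecursiveDescentParser(grammar)
for tree in parser.parse("the black cat crossed the road".split()):
    print(tree)
# (S (NP (Det the) (Adj black) (N cat)) (VP (V crossed) (NP (Det the) (N road))))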

Bottom-up Parsing

In this parsing technique the process begins with the sentence, and the words of the sentence are
replaced by their relevant symbols. This process was first suggested by Yngve (1955). It is also
called shift-reduce parsing. In bottom-up parsing the construction of the parse tree starts at the
leaves and proceeds towards the root. Bottom-up parsing is a strategy for analyzing unknown
data relationships that attempts to identify the most fundamental units first and then to infer
higher-order structures from them. This process occurs in the analysis of both natural languages
and computer languages. It is common for bottom-up parsers to take the form of general parsing
engines that can either parse or generate a parser for a specific programming language, given a
specification of its grammar.

A generalization of this type of algorithm is familiar from computer science: the LR(k) family
can be seen as shift-reduce algorithms with a certain amount ("k" words) of lookahead to
determine, for a set of possible states of the parser, which action to take. The sequence of
actions for a given grammar can be pre-computed to give a "parsing table" saying whether a
shift or a reduce is to be performed and which state to go to next. Generally, bottom-up
algorithms are more efficient than top-down algorithms; one particular phenomenon that they
deal with only clumsily is "empty rules": rules in which the right-hand side is the empty string.
Bottom-up parsers find instances of such rules applying at every possible point in the input,
which can lead to much wasted effort. A minimal shift-reduce sketch is given below.
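The sketch below uses NLTK's shift-reduce parser, which shifts words onto a stack and reduces them to higher-order constituents (this assumes nltk is installed; the toy grammar is only illustrative):

# A minimal bottom-up (shift-reduce) parsing sketch using NLTK.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'road'
V -> 'crossed'
""")

# Shifts tokens onto a stack, then reduces the top of the stack to non-terminals.
parser = nltk.ShiftReduceParser(grammar)
for tree in parser.parse("the dog crossed the road".split()):
    print(tree)
# (S (NP (Det the) (N dog)) (VP (V crossed) (NP (Det the) (N road))))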
Natural language processing (NLP) concerns the interactions between computers and human
language: how to program computers to process and analyze large amounts of natural language
data. The technology can accurately extract information and insights contained in documents,
as well as categorize and organize the documents themselves. Many different classes of
machine-learning algorithms have been applied to natural language processing tasks; these
algorithms take as input a large set of "features" generated from the input data.

Semantic Analysis
Semantic Analysis is a subfield of Natural Language Processing (NLP) that attempts to
understand the meaning of Natural Language. Understanding Natural Language might seem a
straightforward process to us as humans. However, due to the vast complexity and
subjectivity involved in human language, interpreting it is quite a complicated task for
machines. Semantic Analysis of Natural Language captures the meaning of the given text
while taking into account context, logical structuring of sentences and grammar roles.
Parts of Semantic Analysis
Semantic Analysis of Natural Language can be classified into two broad parts:
1. Lexical Semantic Analysis: Lexical semantic analysis involves understanding the meaning
of each word of the text individually. It basically refers to fetching the dictionary meaning that a
word in the text is intended to carry.
2. Compositional Semantic Analysis: Although knowing the meaning of each word of the text
is essential, it is not sufficient to completely understand the meaning of the text.

For example, consider the following two sentences:

 Sentence 1: Students love animals.
 Sentence 2: Animals love students.

Although both sentences 1 and 2 use the same set of root words {student, love, animal}, they
convey entirely different meanings.
Elements of Semantic Analysis

Some of the critical elements of semantic analysis that must be scrutinized and taken into
account while processing natural language are listed below (a short WordNet sketch follows
the list):

 Hyponymy: Hyponymy refers to a term that is an instance of a generic term. The two can
be understood by taking class and object as an analogy. For example: 'color' is a
hypernym, while 'grey', 'blue', 'red', etc., are its hyponyms.

 Homonymy: Homonymy refers to two or more lexical terms with the same spelling but
completely distinct meanings. For example: 'rose' might mean 'the past form of rise' or
'a flower' – same spelling but different meanings; hence, 'rose' is a homonym.

 Synonymy: When two or more lexical terms that might be spelt distinctly have the same
or a similar meaning, they are called synonyms. For example: (Job, Occupation),
(Large, Big), (Stop, Halt).

 Antonymy: Antonymy refers to a pair of lexical terms that have contrasting meanings –
they are symmetric about a semantic axis. For example: (Day, Night), (Hot, Cold),
(Large, Small).

 Polysemy: Polysemy refers to lexical terms that have the same spelling but multiple
closely related meanings. It differs from homonymy because, in the case of homonymy,
the meanings of the terms need not be closely related. For example: 'man' may mean
'the human species', 'a male human' or 'an adult male human' – since all these different
meanings bear a close association, the lexical term 'man' is a polyseme.

 Meronomy: Meronomy refers to a relationship wherein one lexical term is a constituent
of some larger entity. For example: 'wheel' is a meronym of 'automobile'.
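Several of these lexical relations can be looked up programmatically in WordNet. The sketch below uses NLTK's WordNet interface (this assumes nltk is installed and the 'wordnet' corpus has been downloaded, e.g. with nltk.download('wordnet')):

# A minimal sketch of querying lexical-semantic relations with WordNet via NLTK.
from nltk.corpus import wordnet as wn

car = wn.synsets('car')[0]            # one sense (synset) of 'car'
print(car.hypernyms())                # more generic terms (hypernymy)
print(car.hyponyms()[:3])             # more specific terms (hyponymy)
print(car.part_meronyms()[:3])        # parts of a car (meronomy)

good = wn.synsets('good', pos=wn.ADJ)[0]
print(good.lemmas()[0].antonyms())    # contrasting term (antonymy)

print(wn.synsets('rose'))             # several distinct senses (homonymy/polysemy)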
Semantic Analysis Techniques
Based upon the end goal one is trying to accomplish, Semantic Analysis can be used in
various ways. Two of the most common Semantic Analysis techniques are:
Text Classification

In text classification, our aim is to label the text according to the insights we intend to gain from
the textual data. For example:

 In sentiment analysis, we try to label the text with the prominent emotion it conveys. It is
highly beneficial when analyzing customer reviews for improvement.

 In topic classification, we try to categorize our text into some predefined categories. For
example: identifying whether a research paper belongs to Physics, Chemistry or Maths.

 In intent classification, we try to determine the intent behind a text message. For
example: identifying whether an e-mail received at a customer care service is a query, a
complaint or a request.

A minimal text-classification sketch is given below.
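The sketch below trains a tiny sentiment classifier (this assumes scikit-learn is installed; the handful of labeled sentences is invented purely for illustration):

# A minimal text-classification (sentiment analysis) sketch with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["I love this phone", "Terrible battery life",
         "Great camera and screen", "Worst purchase ever"]
labels = ["positive", "negative", "positive", "negative"]

# TF-IDF features plus a linear classifier form a very simple sentiment classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["The screen is great"]))   # expected: ['positive']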

Text Extraction

In text extraction, we aim at obtaining specific information from our text. For example:

 In keyword extraction, we try to obtain the essential words that define the entire
document.

 In entity extraction, we try to obtain all the entities involved in a document.

A minimal keyword-extraction sketch is given below.
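The sketch below ranks the words of a document by their TF-IDF weight, a common heuristic for keyword extraction (this assumes scikit-learn is installed; the sample documents are invented for illustration):

# A minimal keyword-extraction sketch using TF-IDF weights.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Parsing assigns a syntactic structure to a sentence using a grammar",
    "Semantic analysis captures the meaning of a sentence in context",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)

# Keywords of the first document: the terms with the highest TF-IDF weight.
terms = vectorizer.get_feature_names_out()
weights = tfidf.toarray()[0]
keywords = sorted(zip(terms, weights), key=lambda pair: pair[1], reverse=True)[:3]
print(keywords)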

Universal Networking Language (UNL)

Universal Networking Language (UNL) is a declarative formal language specifically designed
to represent semantic data extracted from natural language texts. It can be used as a pivot
language in interlingual machine translation systems or as a knowledge representation language
in information retrieval applications.

In the UNL approach, information conveyed by natural language is represented sentence by
sentence as a hypergraph composed of a set of directed, binary, labeled links (referred to as
relations) between nodes or hypernodes (the Universal Words, or simply UWs), which stand for
concepts. UWs can also be annotated with attributes representing context information.

 Universal Networking Language (UNL) is a computer language that enables computers
to process information and knowledge across language barriers.
 It is an artificial language that replicates the functions of natural languages in human
communication.
 It expresses information or knowledge in the form of a semantic network with
hypernodes.
 Unlike natural languages, UNL expressions are unambiguous.
 Although the UNL is a language for computers, it has all the components of a natural
language.
 It is composed of Universal Words (UWs), relations and attributes. A small illustrative
sketch of this relation-based representation follows.
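To make the idea of labeled, directed binary relations between Universal Words concrete, here is a purely illustrative sketch in Python. It is not real UNL syntax; the relation labels and UW spellings below are only rough approximations used to show the graph structure:

# An illustrative sketch (NOT real UNL syntax) of a sentence represented as
# labeled, directed binary relations between Universal-Word-like nodes.
from typing import NamedTuple

class Relation(NamedTuple):
    label: str       # e.g. agent, place
    head: str        # the Universal Word the link points from
    dependent: str   # the Universal Word the link points to

# "The dogs are barking in the park" as a set of labeled binary relations.
sentence_graph = [
    Relation("agent", "bark", "dog"),
    Relation("place", "bark", "park"),
]

for relation in sentence_graph:
    print(f"{relation.label}({relation.head}, {relation.dependent})")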
