Ans: Definitions of AI
There are as many definitions as there are practitioners.
“The art of creating machines that perform functions that require intelligence when performed by
people.” (Kurzweil)
“The study of how to make computers do things at which, at the moment, people are better.” (Rich
and Knight)
You enter a room which has a computer terminal. You have a fixed period of time to type what you
want into the terminal, and study the replies. At the other end of the line is either a human being or a
computer system.
If it is a computer system, and at the end of the period you cannot reliably determine whether it is a
system or a human, then the system is deemed to be intelligent.
A human questioner cannot tell if there is a computer or a human answering his question, via teletype
(remote communication)
The computer must behave intelligently
Turing test
In this test, Turing proposed that a computer can be said to be intelligent if it can mimic human
responses under specific conditions.
The Turing test is based on a party game, the "Imitation game," with some modifications. This game involves
three players: one player is a computer, another is a human responder, and the third is a human
interrogator, who is isolated from the other two players and whose job is to determine which of the two
is the machine.
Player A (Computer): No
In this game, if the interrogator is not able to identify which is the machine and which is the human,
then the computer passes the test, and the machine is said to be intelligent and able to
think like a human.
Cognitive Science
“The exciting new effort to make computers think ... machines with minds in the full and literal
sense” (Haugeland)
“[The automation of] activities that we associate with human thinking, activities such as decision-
making, problem solving, learning ...” (Bellman)
Logical approach is often not feasible in terms of computation time (needs ‘guidance’)
“The study of mental faculties through the use of computational models” (Charniak and McDermott)
“The study of the computations that make it possible to perceive, reason, and act” (Winston)
APPLICATION OF AI
Autonomous rovers.
Telescope scheduling
Analysis of data:
2. Medicine:
3. Transportation:
Pedestrian detection:
4. Games
Games
answer
• Philosophy
  • At that time, the study of human intelligence began with no formal expression.
  • Initiated the idea of the mind as a machine and its internal operations.
• Mathematics
  • Formalizes the three main areas of AI: computation, logic, and probability.
  • Computation leads to analysis of the problems that can be computed (complexity theory).
  • Probability contributes the “degree of belief” to handle uncertainty in AI.
  • Decision theory combines probability theory and utility theory (bias).
• Psychology
  • How do humans think and act?
  • The study of human reasoning and acting.
  • Provides reasoning models for AI.
  • Strengthens the idea that humans and other animals can be considered as information-processing
    machines.
• Computer Engineering
  • How to build an efficient computer?
  • Provides the artifact that makes AI applications possible.
  • The power of computers makes computation of large and difficult problems easier.
  • AI has also contributed its own work to computer science, including time-sharing, the linked list
    data type, OOP, etc.
• Control Theory and Cybernetics
  • How can artifacts operate under their own control?
  • The artifacts adjust their actions to do better for the environment over time, based on an
    objective function and feedback from the environment.
  • Not limited only to linear systems but also applied to other problems such as language, vision,
    and planning.
• Linguistics
  • For understanding natural languages, different approaches have been adopted from linguistic work:
  • Formal languages
  • Syntactic and semantic analysis
  • Knowledge representation
• Breadth-First search
• BFS uses Queue data structure for finding the shortest path.
• BFS works on concept of FIFO (First In First Out )
• BFS is more suitable for searching vertices closer to the given source.
• In BFS there is no concept of backtracking.
• Depth-First search
• Uniform-Cost search
Uniform-cost search is an uninformed search algorithm that uses the lowest cumulative cost to find a
path from the source to the destination.
Nodes are expanded, starting from the root, according to the minimum cumulative cost.
IDDFS combines depth-first search’s space-efficiency and breadth-first search’s fast search (for nodes
closer to root)
IDDFS calls DFS for different depths starting from an initial value.
• Best-first search
• Greedy Search
• Beam search
• Algorithm AO
• Algorithm A*
Finds the shortest path through the search space using the
Heuristic function.
Ans: BFS
DFS
Ans.
2. Depth-First search
Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The
algorithm starts at the root node and explores as far as possible along each branch before backtracking. Extra
memory, usually a stack, is needed to keep track of the nodes discovered so far along a specified branch,
which helps in backtracking through the graph.
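The stack-based traversal described above, and the queue-based breadth-first traversal it is usually contrasted with, can be sketched as follows. This is a minimal illustration, not a reference implementation; the graph `GRAPH` and the function names `dfs` and `bfs` are hypothetical and do not come from the notes.

```python
from collections import deque

# Hypothetical example graph as an adjacency list (not from the notes).
GRAPH = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [],
    "E": ["F"],
    "F": [],
}

def dfs(graph, start):
    """Return the nodes in depth-first order, using an explicit stack."""
    visited, order = set(), []
    stack = [start]
    while stack:
        node = stack.pop()              # LIFO: most recently discovered node first
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push neighbours in reverse so the first listed neighbour is explored first.
        for nxt in reversed(graph[node]):
            if nxt not in visited:
                stack.append(nxt)
    return order

def bfs(graph, start):
    """Return the nodes in breadth-first order, using a FIFO queue."""
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()          # FIFO: oldest discovered node first
        order.append(node)
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return order

print(dfs(GRAPH, "A"))
print(bfs(GRAPH, "A"))
```

On this graph DFS follows the branch through B to its end before it reaches C, while BFS visits the nodes level by level.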
3. Uniform-Cost search
Uniform-cost search is an uninformed search algorithm that uses the lowest cumulative cost to find a
path from the source to the destination. Nodes are expanded, starting from the root, according to the
minimum cumulative cost. The uniform-cost search is then implemented using a Priority Queue.
Uniform-cost search is a searching algorithm used for traversing a weighted tree or graph. This
algorithm comes into play when a different cost is available for each edge. The primary goal of
uniform-cost search is to find a path to the goal node that has the lowest cumulative cost. Uniform-cost
search expands nodes according to their path costs from the root node. It can be used to solve any
graph or tree problem where the optimal cost is required. A uniform-cost search algorithm is implemented
with a priority queue, which gives maximum priority to the lowest cumulative cost. Uniform-cost search is
equivalent to BFS if the path cost of all edges is the same.
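A priority-queue version of the idea above can be sketched with Python's `heapq`. The weighted graph `WEIGHTED` and the function name `uniform_cost_search` are assumptions made for illustration, and edge costs are assumed to be non-negative.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Return (cost, path) for a cheapest path, or None if the goal is unreachable.

    graph maps node -> list of (neighbour, edge_cost); costs are assumed non-negative.
    """
    frontier = [(0, start, [start])]    # priority queue ordered by cumulative cost
    best = {start: 0}                   # cheapest known cost to each node
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue                    # stale queue entry, a cheaper route was found
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return None

# Hypothetical weighted graph (not from the notes).
WEIGHTED = {
    "S": [("A", 1), ("B", 4)],
    "A": [("B", 2), ("G", 6)],
    "B": [("G", 1)],
    "G": [],
}

print(uniform_cost_search(WEIGHTED, "S", "G"))
```

The direct edge S to B costs 4, but the search prefers S, A, B (cost 3) because nodes are always expanded in order of cumulative cost.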
IDDFS combines depth-first search’s space-efficiency and breadth-first search’s fast search (for nodes
closer to root)
IDDFS calls DFS for different depths starting from an initial value. In every call, DFS is restricted from
going beyond given depth. So basically we do DFS in a BFS fashion.
The main idea is to recompute the nodes on the search boundary instead of storing them. Every
recomputation is a depth-first search, so the method uses less space. Now let us also consider
using BFS in iterative deepening search.
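The repeated depth-limited DFS described above can be sketched as below. The names `depth_limited`, `iddfs`, `TREE`, and the `max_depth` cut-off are hypothetical choices for this illustration.

```python
def depth_limited(graph, node, goal, limit, path):
    """DFS that refuses to go deeper than `limit` edges below `node`."""
    if node == goal:
        return path
    if limit == 0:
        return None
    for nxt in graph.get(node, []):
        if nxt not in path:             # avoid cycles along the current path
            found = depth_limited(graph, nxt, goal, limit - 1, path + [nxt])
            if found is not None:
                return found
    return None

def iddfs(graph, start, goal, max_depth=10):
    """Call depth-limited DFS with limits 0, 1, 2, ... up to max_depth."""
    for limit in range(max_depth + 1):
        found = depth_limited(graph, start, goal, limit, [start])
        if found is not None:
            return found
    return None

# Hypothetical example graph (not from the notes).
TREE = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [],
    "E": ["F"],
    "F": [],
}

print(iddfs(TREE, "A", "F"))
```

A plain DFS on this graph would reach F through B and E first. Because the depth limit grows one level at a time, IDDFS returns the shallower path through C, which is the BFS-like behaviour the text refers to.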
Ans:
• Hill climbing
• Best-first search
• Greedy Search
• Beam search
• Algorithm AO
• Algorithm A*
A* search:
The most commonly known form of best-first search. It uses h(n) + g(n), combining features of UCS
and greedy best-first search. It finds the shortest path through the search space using the heuristic
function. This search algorithm expands fewer nodes of the search tree and provides an optimal result
faster (the result is optimal when the heuristic never overestimates the remaining cost). It uses the
search heuristic h(n) as well as the cost g(n) to reach the node. Hence we can combine both costs as
follows, and this sum is called the fitness number:
f(n) = g(n) + h(n)
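As a sketch of how the path cost g(n) and the heuristic h(n) are combined into one priority, the following uses a small hypothetical graph `WEIGHTED` and a hand-picked heuristic table `H` that never overestimates the remaining cost. Neither is taken from the notes.

```python
import heapq

def a_star(graph, h, start, goal):
    """Return (cost, path) found by A*, or None if the goal is unreachable.

    graph maps node -> list of (neighbour, edge_cost); h maps node -> heuristic estimate.
    """
    frontier = [(h[start], 0, start, [start])]   # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue                             # stale entry
        for nxt, weight in graph.get(node, []):
            new_g = g + weight
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                # Fitness number: cost so far plus estimated cost to the goal.
                heapq.heappush(frontier, (new_g + h[nxt], new_g, nxt, path + [nxt]))
    return None

# Hypothetical graph and admissible heuristic (assumptions for illustration).
WEIGHTED = {
    "S": [("A", 1), ("B", 4)],
    "A": [("B", 2), ("G", 6)],
    "B": [("G", 1)],
    "G": [],
}
H = {"S": 3, "A": 2, "B": 1, "G": 0}

print(a_star(WEIGHTED, H, "S", "G"))
```

With `H` set to zero for every node the same function behaves like uniform-cost search, which is one way to see A* as UCS plus a heuristic.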
Ans.
An expert system is a computer application developed in AI to solve complex problems in a
particular domain at the level of extraordinary human intelligence and expertise. It emulates
the decision-making ability of a human expert.
Expert systems are designed to solve complex problems by reasoning through bodies of
knowledge, represented mainly as if-then rules rather than through conventional procedural
code.
Expert systems have specific knowledge to one problem domain, e.g., medicine, science,
engineering, etc. The expert’s knowledge is called a knowledge base, and it contains
accumulated experience that has been loaded and tested in the system.
Like other artificial intelligence systems, expert system’s knowledge may be enhanced with add-
ons to the knowledge base, or additions to the rules.
The more experience entered into the expert system, the more the system can improve its
performance.
Characteristics of Expert System:
1. Highly responsive
2. Reliable
3. Understandable
4. High performance
MYCIN was built using the LISP programming language, one of the earliest AI programming languages.
MYCIN is a goal-directed system that uses a backward chaining reasoning approach.
MYCIN was an early backward chaining expert system that used artificial intelligence to identify
bacteria causing severe infections, such as bacteremia and meningitis
The name is derived from the antibiotics themselves, as many antibiotics have the suffix "-
mycin"
This system was able to perform as well as some experts and considerably better than junior
doctors. A consultation with MYCIN begins with requests for routine information such as age,
medical history and so on, progressing to more specific questions as required.
MYCIN helps the physician to prescribe disease-specific drugs. MYCIN informs itself about
particular cases by requesting information from the physician about a patient’s symptoms,
general condition, history, and laboratory-test results.
At each point, the question MYCIN asks is determined by MYCIN’S current hypothesis and the
answer to all previous questions. When MYCIN is satisfied that it has a reasonably good grasp of
the situation, MYCIN announces its diagnosis.
Ans. Building an ES initially requires extracting the relevant knowledge from a human domain expert;
this knowledge is often based on useful rules of thumb and experience rather than absolute certainties.
The developed system should be able to explain its reasoning to its users and answer questions about the
solution process. Moreover, updating the system should just involve adding or deleting localized regions
of knowledge.
The different phases involved in building an ES may be categorized as follows:
Identification Phase: In this phase, the knowledge engineer determines important features of
the problem with the help of the human domain expert. The parameters that are determined in
this phase include the type and scope of the problem, the kind of resources required, and the
goal and objective of the ES.
Conceptualization Phase: In this phase, knowledge engineer and domain expert decide the
concepts, relations, and control mechanism needed to describe the problem-solving method. At
this stage, issue of granularity is also addressed, which refers to the level of details required in
the knowledge.
Formalization Phase: This phase involves expressing the key concepts and relations in some
framework supported by ES building tools. Formalized knowledge consists of data structures,
inference rules, control strategies, and languages required for implementation.
Implementation Phase: During this phase, formalized knowledge is converted into a working
computer program, initially called the “prototype” of the whole system.
Testing Phase: This phase involves evaluating the performance and utility of the prototype system
and revising the system, if required. The domain expert evaluates the prototype system and
provides feedback, which helps the knowledge engineer to revise it.
Ans. Expert System is an interactive and reliable computer-based decision-making system which uses
both facts and heuristics to solve complex decision-making problems. It is considered at the highest level
of human intelligence and expertise. The purpose of an expert system is to solve the most complex
issues in a specific domain.
MYCIN: It was based on backward chaining and could identify various bacteria that could cause acute
infections. It could also recommend drugs based on the patient’s weight. It is regarded as one of the
best expert systems.
DENDRAL: Expert system used for chemical analysis to predict molecular structure.
PXDES: An Example of Expert System used to predict the degree and type of lung cancer
CaDet: One of the best Expert System Example that can identify cancer at early stages
Ans: The system called MYCIN was developed using the expertise of the best diagnosticians of
bacterial infections, whose performance was found to be better than the average.
MYCIN was invented in 1972 when Edward Shortliffe developed the system with a team from
Stanford University.
MYCIN was designed to help identify bacteria that cause blood infections and other severe
infections like meningitis.
The MYCIN System was a computer-based system physicians used to identify blood infections
and the most appropriate treatments.
The MYCIN Expert System used backward chaining technology to diagnose infections based on
symptoms and medical history and recommend treatment based on the data received.
MYCIN refers to a backward chaining expert system that helped diagnose infections and suggest
treatments, named after a typical class of antibiotics in use.
MYCIN was an expert system using backward chaining, a form of artificial intelligence. In this
context, backward chaining helped determine that the patient had an infection and worked
back through several steps to determine the type of bacteria and which antibiotics to use.
Advantages included making it easier to find out the causes because of the known endpoint.
Ans) The knowledge base contains the specific domain knowledge that is used by an expert to derive
conclusions from facts.
In the case of a rule-based expert system, this domain knowledge is expressed in the form of a series of
rules.
The explanation system provides information to the user about how the inference engine arrived at its
conclusions. This can often be essential, particularly if the advice being given is of a critical nature, such
as with a medical diagnosis system
If the system has used faulty reasoning to arrive at its conclusions, then the user may be able to see this by
examining the data given by the explanation system.
The fact database contains the case-specific data that are to be used in a particular case to derive a
conclusion
In the case of a medical expert system, this would contain information that had been obtained about the
patient’s condition.
The user of the expert system interfaces with it through a user interface, which provides access to the
inference engine, the explanation system, and the knowledge-base editor.
The inference engine is the part of the system that uses the rules and facts to derive conclusions. The
inference engine will use forward chaining, backward chaining, or a combination of the two to make
inferences from the data that are available to it.
The knowledge-base editor allows the user to edit the information that is contained in the knowledge base.
The knowledge-base editor is not usually made available to the end user of the system but is used by the
knowledge engineer or the expert to provide and update the knowledge that is contained within the system.
13. Write a short note on Forward and Backward chaining.
Ans) Forward chaining
Forward chaining is a method of reasoning in artificial intelligence in which inference rules are applied to
In this type of chaining, the inference engine starts by evaluating existing facts, derivations, and conditions
before deducing new information. An endpoint (goal) is achieved through the manipulation of knowledge
Backward chaining
Backward chaining is a concept in artificial intelligence that involves backtracking from the endpoint or goal
to steps that led to the endpoint. This type of chaining starts from the goal and moves backward to
The backtracking process can also enable a person to establish logical steps that can be used to find other
important solutions.
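The two reasoning directions can be contrasted with a small sketch. The toy rules in `RULES`, the fact names, and the function names `forward_chain` and `backward_chain` are hypothetical and only loosely inspired by the medical examples in these notes; a real inference engine would be considerably richer.

```python
# Each rule is (set of premises, conclusion). Hypothetical toy rules.
RULES = [
    ({"fever", "stiff_neck"}, "suspect_meningitis"),
    ({"suspect_meningitis", "positive_culture"}, "bacterial_infection"),
    ({"bacterial_infection"}, "prescribe_antibiotic"),
]

def forward_chain(facts, rules):
    """Start from known facts and keep firing rules until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

def backward_chain(goal, facts, rules, seen=None):
    """Start from the goal and work back to facts that would support it."""
    seen = seen or set()
    if goal in facts:
        return True
    if goal in seen:                    # already being proved higher up: avoid loops
        return False
    seen = seen | {goal}
    return any(
        all(backward_chain(p, facts, rules, seen) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )

facts = {"fever", "stiff_neck", "positive_culture"}
print(forward_chain(facts, RULES))
print(backward_chain("prescribe_antibiotic", facts, RULES))
```

Forward chaining derives everything that follows from the facts, whether or not it is needed. Backward chaining only examines the rules that could lead to the stated goal.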
• Knowledge Base
• Inference Engine
• User Interface
• Explanation module
• Knowledge base: The knowledge base in an expert system represents facts and rules. It
contains knowledge in specific domains along with rules in order to solve problems and form procedures
that are relevant to the domain.
• Inference engine: The most basic function of the inference engine is to acquire relevant data
from the knowledge base, interpret it, and find a solution to the user’s problem. Inference
engines also have explanatory and debugging abilities.
• Knowledge acquisition and learning module: This component functions to allow the expert
systems to acquire more data from various sources and store it in the knowledge base.
• User interface: This component is essential for a non-expert user to interact with the expert
system and find solutions.
• Explanation module: As the name suggests, this module helps in providing the user with an
explanation of the achieved conclusion.
• Expert systems can work steadily without getting emotional, tense or fatigued.
Ans. An expert system is a computer application developed in AI to solve complex problems in a
particular domain at the level of extraordinary human intelligence and expertise. It emulates the
decision-making ability of a human expert.
LIMITATIONS
Ans: If-then rules are one of the most common forms of knowledge representation used in
expert systems. Systems employing such rules as the major representation paradigm are called
rule-based systems. Some people refer to them as production systems. There are some
differences between rule-based systems and production systems, but we will ignore these
differences and use the terms interchangeably. In computer science, a rule-based system is
used to store and manipulate knowledge to interpret information in a useful way. It is often
used in artificial intelligence applications and research. Rule-based systems constructed using
automatic rule inference, such as rule-based machine learning, are normally excluded from this
system type.
The rule-based expert systems consist of three important elements:
Set of Facts: These are assertions or anything relevant to the beginning state of the
system.
Set of Rules: It contains all actions that should be taken within the scope of a problem
and specify how to act on the assertion set. Here, facts are represented in an IF-THEN
form.
Termination Criteria or Interpreter: Determines whether a solution exists or not, as well
as when to terminate the process.
Ans: The blackboard is the common data structure of the knowledge sources. The blackboard holds all the
states of the given problem space. It usually contains several levels of description with respect to the
problem space. These levels may have several relationships with each other and are part of the same
data structure. In case more than one data structure is needed, the representation is broken into panels,
and each panel can hold multiple levels.
The knowledge source is a component that adds to the solution of the problem. It is anything that reads
from the blackboard and suggests some changes to parts of the blackboard. Usually, the knowledge
sources are independent of one another.
Scheduler controls and decides which knowledge source will get an opportunity to change the
blackboard. For every execution cycle, the scheduler observes the changes made to the blackboard and
activates the knowledge source to execute the next change.
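The interplay of blackboard, knowledge sources, and scheduler can be sketched in a few lines. Everything here is a hypothetical toy: the `Blackboard` class, the two knowledge sources `tokenizer` and `counter`, and the very simple `scheduler` policy (activate the first source that can contribute in each cycle) are assumptions for illustration only.

```python
class Blackboard:
    """Shared data structure that all knowledge sources read from and write to."""
    def __init__(self, data):
        self.data = dict(data)

def tokenizer(bb):
    """Knowledge source: turns raw text into a list of words."""
    if "text" in bb.data and "words" not in bb.data:
        bb.data["words"] = bb.data["text"].split()
        return True
    return False

def counter(bb):
    """Knowledge source: counts the words once they are available."""
    if "words" in bb.data and "count" not in bb.data:
        bb.data["count"] = len(bb.data["words"])
        return True
    return False

def scheduler(bb, sources, max_cycles=10):
    """In each cycle, activate the first source that changes the blackboard."""
    for _ in range(max_cycles):
        progressed = any(source(bb) for source in sources)
        if not progressed:
            break                       # no source can contribute any more
    return bb.data

result = scheduler(Blackboard({"text": "a b c"}), [counter, tokenizer])
print(result)
```

The knowledge sources never call each other. They cooperate only through the blackboard, so listing `counter` before `tokenizer` still produces the word count.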
Ans :- The area of expert systems has been of interest to AI researchers. The major purpose of building
an ES for an organization is to preserve the know-how, experience, and expertise of the experts,
which is a valuable asset to the organization. The purpose of an ES is to provide this knowledge to other
members of the organization for problems in different types of domains. The appropriate problem-solving
technique depends generally on the type of problem and the domain. Applications may be
categorized into the following major classes:
DIAGNOSIS :- The expert systems belonging to this class perform the task of inferring malfunctioning of
system from observations. Such expert systems use situation descriptions, behaviour characteristics, or
knowledge about component design to determine the probable cause of system malfunction. These
systems may also be used for diagnosis of faulty modules in large signal switching networks and for
finding faults in computer hardware system. Diagnosis can refer to inferring a possible disease from a
given set of symptoms in the field of medicine.
PLANNING & SCHEDULING :- The expert systems of this class help in designing actions and plans before
actually solving a given problem. They analyse a set of one or more potentially complex and interacting
goals in order to determine a set of actions that are needed to achieve these goals.
DESIGN & MANUFACTURING:- This is one of the most important areas for ES applications. Here, a
solution to a problem is configured from a given set of objects under a set of constraints. Configuration
applications were pioneered by computer companies to facilitate the manufacturing of semi-custom
minicomputers.
PREDICTION :- The expert systems of this class perform the task of inferring the likely consequences of
a given situation.
INTERPRETATION :- The expert systems of this class perform the task of interpreting and inferring
situation description data of any domain such as geological data, census data, medical data, etc.
FINANCIAL DECISION MAKING :- The financial services industry has been a prominent user of ES
techniques. Such systems assist insurance companies to assess the risk presented by the customer and
to determine a price for the insurance.
PROCESS MONITORING AND CONTROL :- Expert systems belonging to this class analyse real-time data
from physical devices by comparing observations to expected outcomes, predicting trends,
and controlling for both optimality and failure correction.
DEBUGGING:- The systems of this class prescribe remedies for malfunctioning devices.
22. What are the different Shells and Tools for Expert system?
•Similarly the probability of head H on tossing one or both coins can be calculated. It is
called Union of the probabilities P(A) and P(B) , and is denoted by P(A U B) , it is also
written as P(A or B) . It can be calculated for above example as follows
P (A or B) = P(A) + P(B) – P(A)* P(B)
= 0.5 + 0.5 – 0.25
= 0.75
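The union rule used above can be checked by enumerating the four equally likely outcomes of tossing two coins. This is a small sketch; the variable names are hypothetical, and the formula P(A) + P(B) - P(A)*P(B) relies on the two tosses being independent.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

p_a = Fraction(sum(o[0] == "H" for o in outcomes), len(outcomes))   # head on coin 1
p_b = Fraction(sum(o[1] == "H" for o in outcomes), len(outcomes))   # head on coin 2
p_union = Fraction(sum("H" in o for o in outcomes), len(outcomes))  # head on either coin

# For independent events: P(A or B) = P(A) + P(B) - P(A) * P(B)
p_formula = p_a + p_b - p_a * p_b

print(p_union, p_formula)
```

Both the direct count and the formula give 3/4, matching the 0.75 computed above.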
= (0.9*0.25) / (0.9*0.25+0.2*0.75)
= (0.225 / 0.375)
= 0.6
1. Classical set: defines the value as either 0 or 1. Fuzzy set: defines the value between 0 and 1,
including both 0 and 1.
2. Classical set: it is also called a crisp set. Fuzzy set: it specifies the degree to which something is true.
9. What are the different Fuzzy set operations? ( Aaranta vijay waykar21306A1027)
There are three operations performed on fuzzy sets: fuzzy complement, fuzzy
intersection, and fuzzy union.
1. Complement of a fuzzy set Ā(x):
The complement is the opposite of the set. The complement of a fuzzy set is
denoted by Ā(x) and is defined with respect to the universal set X as follows:
Ā(x) = 1 − A(x) for all x ∈ X
$$\mathrm{bell}(x; a, b, c) = \frac{1}{1 + \left|\dfrac{x - c}{a}\right|^{2b}}$$
where the parameter b is usually positive. (If b is negative, the shape of this MF becomes an
upside-down bell.) Note that this MF is a direct generalization of the Cauchy distribution used in
probability theory, so it is also referred to as the Cauchy MF.
E) Sigmoid membership function:
A sigmoid MF is defined by
$$\mathrm{sig}(x; a, c) = \frac{1}{1 + \exp[-a(x - c)]}$$
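The generalized bell and sigmoid membership functions can be evaluated with a short sketch, assuming their usual forms: 1 / (1 + |(x - c)/a|^(2b)) for the bell MF and 1 / (1 + exp(-a(x - c))) for the sigmoid MF. The function names `bell_mf` and `sigmoid_mf` and the parameter values below are hypothetical; `a` is assumed non-zero and `b` positive.

```python
import math

def bell_mf(x, a, b, c):
    """Generalized bell MF: centre c, width a, slope parameter b (a != 0, b > 0)."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def sigmoid_mf(x, a, c):
    """Sigmoid MF: slope a, crossover point c."""
    return 1.0 / (1.0 + math.exp(-a * (x - c)))

# Hypothetical parameter choices for illustration.
for x in (3, 5, 7):
    print(x, bell_mf(x, a=2, b=4, c=5), sigmoid_mf(x, a=1, c=5))
```

The bell MF equals 1 at the centre `c` and 0.5 at `c ± a`; the sigmoid MF equals 0.5 at `c` and rises towards 1 for positive `a`.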
The second group targets the design of electronic circuits that employ more than two
discrete levels of signals, such as many-valued memories, arithmetic circuits, and field
programmable gate arrays (FPGAs). Many-valued circuits have a number of theoretical
advantages over standard binary circuits. For example, the interconnect on and off chip
can be reduced if signals in the circuit assume four or more levels rather than only two. In
memory design, storing two instead of one bit of information per memory cell doubles
the density of the memory in the same die size. Applications using arithmetic circuits
often benefit from using alternatives to binary number systems. For example, residue and
redundant number systems[15] can reduce or eliminate the ripple-through carries that are
involved in normal binary addition or subtraction, resulting in high-speed arithmetic
operations. These number systems have a natural implementation using many-valued
circuits. However, the practicality of these potential advantages heavily depends on the
availability of circuit realizations, which must be compatible or competitive with present-
day standard technologies. In addition to aiding in the design of electronic circuits, many-
valued logic is used extensively to test circuits for faults and defects. Basically all known
automatic test pattern generation (ATG) algorithms used for digital circuit testing require
a simulator that can resolve 5-valued logic (0, 1, x, D, D’).[16] The additional values (x,
D, and D’) represent (1) unknown/uninitialized, (2) a 0 instead of a 1, and (3) a 1 instead
of a 0.
For example, speed may be fast, very fast, medium, slow, and very slow. In fuzzy logic, the truth value of
a fuzzy proposition also depends on an additional factor known as the degree of truth, whose value
varies between 0 and 1.
For example
p: Speed is Slow
T(p) = 0.8, if p is partly true
T(p) = 1, if p is absolutely true
T(p) = 0, if p is totally false
So, we can say that a fuzzy proposition is a statement p which acquires a fuzzy truth value
T(p) ranging from 0 to 1.
Different types of Fuzzy Propositions:
1. Unconditional and unqualified propositions
The canonical form of this type of fuzzy proposition is p:V is F
Where, V is a variable which takes value v from a universal set U. F is a fuzzy set on U that represents a
given inaccurate predicate such as fast, low, tall etc.
For example:
16. What are the different applications of fuzzy systems? (21306A1042 – Sushma
Singh)
Ans:- Fuzzy System Applications
“If all motion vectors are almost parallel and their time differential is small, then the hand
jittering is detected and the direction of the hand movement is in the direction of the moving
vectors”. Image Stabilization via Fuzzy Logic
1)Aerospace: Altitude control of spacecraft, satellite altitude control, flow and mixture
regulation in aircraft de-
control, shift scheduling method for automatic transmission, intelligent highway systems, traffic
control, improving efficiency of automatic transmissions
2)Business Decision-making support systems: personnel evaluation in a large company,
18. In a class, 70% of the students like English and 40% of the students
like both English and mathematics. What percentage of the students who like
English also like mathematics? (Gorima(21306A1054))
Ans. Let A be the event that a student likes mathematics.
B be the event that a student likes English.
P(A|B) = P(A ∩ B)/P(B) = 0.4/0.7 ≈ 0.57 = 57%
Hence, 57% of the students who like English also like mathematics.
19. Two dice are thrown simultaneously, and the sum of the numbers obtained is found
to be 7. What is the probability that the number 3 has appeared at least once?
(21306A1023 – Komal Gupta)
Ans:- The sample space S consists of all the combinations of numbers possible
with two dice. Therefore S consists of 6 × 6, i.e. 36 outcomes.
Event A indicates the combinations in which 3 has appeared at least once.
Event B indicates the combinations of numbers which sum up to 7.
A = {(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (1, 3), (2, 3), (4, 3), (5, 3), (6, 3)}
B = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
P(A) = 11/36
P(B) = 6/36
A ∩ B = {(3, 4), (4, 3)}, so n(A ∩ B) = 2
P(A ∩ B) = 2/36
Applying the conditional probability formula we get,
P(A|B) = P(A ∩ B)/P(B) = (2/36)/(6/36) = 1/3
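The same result can be cross-checked by brute-force enumeration of the sample space. This is only a sketch and the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of throwing two dice.
space = list(product(range(1, 7), repeat=2))

sum_is_7 = [o for o in space if sum(o) == 7]         # event B
three_and_7 = [o for o in sum_is_7 if 3 in o]        # event A ∩ B

# P(A|B) = n(A ∩ B) / n(B) when all outcomes are equally likely.
p_three_given_seven = Fraction(len(three_and_7), len(sum_is_7))

print(three_and_7, p_three_given_seven)
```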
20. In a batch, there are 80% C programmers, and 40% are Java and C programmers.
What is the probability that a C programmer is also Java programmer?
(21306A1011 – Shraddha Kasar)
Ans:- Let A --> Event that a student is Java programmer
B --> Event that a student is C programmer
P(A|B) = P(A ∩ B) / P(B)
= (0.4) / (0.8)
= 0.5
So there is a 50% chance that a student who knows C also knows Java.
21. Write the difference between conditional probability and Bayes Theorem (Sejal
Shingre -21306A1040)
Conditional Probability | Bayes Theorem
Conditional probability is the probability of occurrence of a certain event, say A, based on whether some other event B is true or not. | Bayes Theorem includes two conditional probabilities for the events, say A and B.
The equation of conditional probability is: P(A|B) = P(A ∩ B) / P(B) | The equation of Bayes Theorem is: P(A|B) = P(B|A) × P(A) / P(B)
It is used for relatively simple problems. | It gives a structured formula for solving more complex problems.
22. Suppose the probability that Mike has a cold is 0.25, the probability that
Mike was observed sneezing when he had a cold in the past is 0.9, and the
probability that Mike was observed sneezing when he did not have a cold is 0.20. Find
the probability of Mike having a cold given that he sneezes. Madhushree Parab
(21306A1026)
SOLUTION:
P(H)=0.25, so P(~H)=1-P(H)=1-0.25=0.75
P(E|H)=0.9
P(E|~H)=0.20
P(H|E)=?
P(H|E) = P(E|H)*P(H) / (P(E|H)*P(H) + P(E|~H)*P(~H))
= (0.9*0.25) / (0.9*0.25+0.20*0.75)
=0.225/0.375
=0.6
=60%
Hence, we can conclude that Mike’s probability of having a cold given that he sneezes is equal to 0.6.
Similarly, we can determine his probability of having a cold if he was not sneezing in the following
manner.
P(H|~E) = P(~E|H)*P(H) / (P(~E|H)*P(H) + P(~E|~H)*P(~H))
= (0.1*0.25) / (0.1*0.25+0.80*0.75)
=0.025 / 0.625
=0.04
Hence, Mike’s probability of having a cold if he was not sneezing is obtained to be equal to 0.04.
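Both posteriors in this solution follow the same pattern, so they can be reproduced with one small helper. The function name `posterior` and its parameter names are hypothetical; the numbers are the ones given in the question.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Bayes' theorem for a hypothesis H and evidence E.

    prior                = P(H)
    likelihood           = P(E | H)
    likelihood_given_not = P(E | not H)
    """
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# P(cold | sneezing)
p_cold_given_sneeze = posterior(0.25, 0.9, 0.20)

# P(cold | not sneezing): use the complementary likelihoods 1 - 0.9 and 1 - 0.20.
p_cold_given_no_sneeze = posterior(0.25, 1 - 0.9, 1 - 0.20)

print(round(p_cold_given_sneeze, 2), round(p_cold_given_no_sneeze, 2))
```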
23. Dangerous fires are rare (1%), but smoke is fairly common (10%) due to barbecues,
and 90% of dangerous fires make smoke. Find the probability of a dangerous fire
when there is smoke. (21306A1027 aaranta waykar )
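One way to work this out is a direct application of Bayes' theorem, assuming that the 10% figure is the overall probability of smoke, P(Smoke), and that the 90% figure is P(Smoke | Fire):

```latex
P(\text{Fire} \mid \text{Smoke})
  = \frac{P(\text{Smoke} \mid \text{Fire})\, P(\text{Fire})}{P(\text{Smoke})}
  = \frac{0.9 \times 0.01}{0.10}
  = 0.09 = 9\%
```

Under these assumptions, about 9% of smoke observations would correspond to a dangerous fire.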
24. What are the advantages of Fuzzy Logic? 21306a1061 Mitali Jadhav
Ans The methodology of fuzzy logic works similarly to human reasoning. Any user can
easily understand the structure of fuzzy logic. It does not need a large memory, because the
algorithms can be described with little data. It is widely used in many fields and
provides effective solutions to problems of high complexity. The concept is
based on the set theory of mathematics, which keeps it simple. It allows users to control
machines and consumer products. The development time of fuzzy logic is short
compared to conventional methods. Due to its flexibility, any user can easily add and delete rules
in the FLS system.
27. Y={(5,1),(10,0.5), (20, 08), (30,0.4)} Apply CON and DIL operators on Y.
Solution:
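A possible way to compute this, assuming the usual definitions CON(Y) = μ² (concentration) and DIL(Y) = √μ (dilation), and assuming that the membership written as "08" in the question is meant to be 0.8. The dictionary `Y` and the function names are hypothetical.

```python
import math

# Assumption: the membership "08" in the question is read as 0.8.
Y = {5: 1.0, 10: 0.5, 20: 0.8, 30: 0.4}

def concentration(fuzzy_set):
    """CON: square every membership value (sharpens the set)."""
    return {x: mu ** 2 for x, mu in fuzzy_set.items()}

def dilation(fuzzy_set):
    """DIL: take the square root of every membership value (widens the set)."""
    return {x: math.sqrt(mu) for x, mu in fuzzy_set.items()}

con_y = {x: round(mu, 4) for x, mu in concentration(Y).items()}
dil_y = {x: round(mu, 4) for x, mu in dilation(Y).items()}

print("CON(Y) =", con_y)
print("DIL(Y) =", dil_y)
```

Concentration lowers every membership value below 1, while dilation raises it; elements with membership 1 are unchanged by both.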
28. Y={(5,1),(10,0.5), (20, 08), (30,0.4)} Write height, cardinality and norms for Y.
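A sketch for the height and scalar cardinality, assuming the common definitions (height is the largest membership value, scalar cardinality is the sum of the membership values) and again reading "08" as 0.8. The question's "norms" is not defined in these notes, so the sketch only checks whether the set is normal, that is, whether its height equals 1.

```python
# Assumption: the membership "08" in the question is read as 0.8.
Y = {5: 1.0, 10: 0.5, 20: 0.8, 30: 0.4}

height = max(Y.values())                 # largest membership value
cardinality = sum(Y.values())            # scalar cardinality (sigma count)
is_normal = height == 1.0                # a fuzzy set is normal when its height is 1

print(height, round(cardinality, 2), is_normal)
```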
Types of learning:-
1. Supervised learning
The model used in supervised learning describes the effect of one set of observations (called inputs) on another set of
observations (called outputs). Here both the sets are given and the purpose is to find a function f that transforms a given input x into
a given output y. In this type of learning, inputs are assumed to be provided at the beginning, while outputs are obtained at the end
of the causal chain. In other words, in supervised learning the user tries to find the connection between two given sets of
observations, namely inputs and outputs.
2. Unsupervised Learning
In unsupervised learning, all the input observations are given and no output observations are available. Unsupervised learning
enables users to learn larger and more complex models as compared to supervised learning. Supervised learning cannot be used
to learn models with deep hierarchies, as the difficulty of the learning task increases exponentially between the two sets.
However, in unsupervised learning, the learning can proceed hierarchically from the observations into more abstract levels of
representation.
3. Reinforcement Learning
In reinforcement learning, the decision-making system (also known as agent) receives rewards or feedback (positive or negative)
for its action at the end of a sequence of steps. It is required to assign reward to steps while solving the credit assignment
problem; this problem determines which steps should receive credit or blame for the final result. As opposed to supervised
learning. reinforcement learning takes place in an environment where the agent cannot directly compare the results of its action to
a desired result. Instead, it is given a positive and negative feedback directly on the basis of its actions. Reinforcement learning
may cause a system to win or lose a game, or inform a system that it has made a good move or a poor one. Therefore, the primary
task of reinforcement learning is to obtain a successful function using these rewards.
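One common way to approach the credit assignment problem is to pass a final reward backwards through the sequence with a discount factor, so that later steps receive more credit than earlier ones. The sketch below shows this idea; the discount factor `gamma` and the reward sequence are assumptions for illustration and do not come from these notes.

```python
def discounted_credit(rewards, gamma=0.9):
    """Return, for every step, the discounted sum of rewards from that step onward."""
    credit = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step sees the (discounted) credit of the steps after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        credit[t] = running
    return credit

# Hypothetical episode: no feedback until the final step, which is rewarded with 1.
rewards = [0.0, 0.0, 0.0, 1.0]
credit = discounted_credit(rewards)
print(credit)
```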
What is Machine learning? What is the need for it? ( Raj Mishra )
Ans. Machine learning can be defined as adaptive changes in a system that enable the system to do the same task, or tasks drawn
from the same population, with greater efficiency whenever the task (or tasks) has to be repeated.
There is an increased need for machine learning, as it helps in understanding and improving the efficiency of human learning. Rapid
advancement in computer technology has enabled users across the world to store and process large amounts of distributed data.
This stored data proves to be of greater use if it can be analysed and transformed into useful information that can further be used for
drawing inferences, making future predictions, helping in intelligent decision making, and other such applications. Because of
this, machine learning has become an area in AI. One of the main goals of AI is to enable the development of computers that can
be taught rather than programmed. This will help in discovering structures that are still unknown to humans. It is not possible to
derive complicated AI systems by hand. A process of dynamic updating should be in place for continuous incorporation of new
information. If a system is capable of learning new characteristics automatically, then there is tremendous scope for expanding its
domain and thereby reducing its brittleness. Simulating learning requires features such as knowledge acquisition, inference, updating or
refinement of the knowledge
base, acquisition of heuristics, application of faster searches, etc. Thus, we can sum
up by saying that learning is an important aspect of intelligence. The two types of learning are inductive and deductive.
In inductive machine learning methodology, required rules and patterns are extracted from massive
data sets. Hence, the major focus of machine learning research in this case is to extract information
from data automatically by computational and statistical methods. On the other hand, deductive
machine learning methodology involves deducing new knowledge from already existing knowledge. The
utility of machine learning in day-to-day life can be gauged from the wide range of applications such as
medical diagnosis, detection of monetary frauds (e.g., involving credit cards), classification of DNA
sequences, bioinformatics, brain-machine interfaces, stock market analysis, natural language processing,
syntactic pattern recognition, object recognition, game playing, and so on. Computational analysis of
machine learning algorithms and their performance forms a branch of theoretical computer science
known as computational learning theory.
Explain Components of Learning System. (Bhupendra Yadav)
i. Learning Components
ii. Performance Element
iii. Critic
iv. Problem Generator
v. Sensor
vi. Effectors
i. Learning Components
The basic purpose of the learning components is to make changes or improvements to the system depending on its performance.
ii. Performance Element
In a learning system, the performance element performs the task of choosing the actions that need to be taken.
iii. Critic
The job of the critic is to inform the learning components about the system's performance with respect to a fixed standard. Note that the
critic could be either a human or an automated component.
iv. Problem Generator
The problem generator is imperative to a learning system since it suggests problems or actions that would lead to the generation of new
examples or experiences, which will aid in further training of the system.
v. Sensor and vi. Effectors
Both these components are external to the system. The system receives information or data from the sensor, while the output is
transmitted through the effectors.
What are the Basic Learning Methods (mandar sawant )
Ans-
Following are the basic learning methods
Rote learning
Learning by taking advice
Learning by parameter adjustment
Learning by Macro-Operators
Learning by Analogy
1. Rote learning
Rote learning basically refers to the process of memorization. Hence, it requires saving knowledge so
that it can be utilized again whenever needed. Rote learning involves one-to-one mapping from inputs
to stored representation and is also known as learning by memorization; it uses association-based
storage and retrieval. Moreover, there is no repeated computation required; only an inference or a
query is necessary. Although memorization is a key requirement in learning and development of an
intelligent program, it can be a complicated subject. In spite of being a basic and simple process, rote
learning highlights some relevant issues pertaining to more complex learning concepts as described
below.
Organization: Knowledge should be stored in such a manner that accessing this stored knowledge is
faster than resorting to re-computation. Organization of knowledge is achieved by employing
techniques such as hashing, indexing, sorting, and so on.
Generalization: Since the number of stored objects can be quite large, we need to generalize some
information to make the problem manageable.
Stability of environment: The method of rote learning does not work very effectively in a rapidly
changing environment. In case there is a change in environment, the change has to be detected and
recorded exactly.
Rote learning should not become a cause of decrease in the efficiency of a system; therefore, we must
be able to decide whether it is worth storing a particular value in the first place.
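Rote learning as described above behaves much like a cache: a result is computed once, stored, and later retrieved by a lookup instead of being recomputed. The following is a minimal sketch of that idea; the class name `RoteLearner` and the function `slow_square` are hypothetical and stand in for any expensive computation.

```python
class RoteLearner:
    """Stores input -> result pairs so that repeated queries need no re-computation."""

    def __init__(self, compute):
        self.compute = compute      # the (possibly expensive) function being memorized
        self.memory = {}            # hash table: fast association-based storage and retrieval
        self.computations = 0       # how many times the function actually ran

    def query(self, x):
        if x not in self.memory:    # unseen input: compute once and memorize
            self.memory[x] = self.compute(x)
            self.computations += 1
        return self.memory[x]       # seen input: plain lookup

def slow_square(x):
    # Stand-in for an expensive computation.
    return x * x

learner = RoteLearner(slow_square)
first = learner.query(12)    # computed and stored
second = learner.query(12)   # retrieved from memory
print(first, second, learner.computations)
```

The dictionary addresses the organization issue (hashing makes retrieval faster than re-computation); the sketch does nothing about generalization or a changing environment.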
2. Learning by taking advice
Although the idea of learning in AI by taking advice was proposed by John McCarthy in the 1950s, very few
attempts were made to create such systems until the late 1970s (McCarthy, 1959). Expert systems are
examples of this concept. The following are two basic approaches to advice taking:
Taking high-level and abstract advice and then converting it into rules.
Developing sophisticated modules
The first approach involves taking abstract advice and converting it into rules that can be used to guide
performance elements of the system. All aspects of advice taking are automated. The following steps
are required in this method:
4. Learning by Macro-Operators
Although the basic idea in learning by macro-operators is similar to rote learning, here we avoid
expensive re-computation by using macro-operators that are learnt for subsequent use. These operators
consist of a series of stereotyped actions. The STRIPS problem solver employed macro-operators in its
learning phase.
5. Learning by Analogy
Learning by analogy is based on the understanding that if a system can recognize similarities in
information that is already stored in it then it may be able to transfer some knowledge from this
previous information to improve the solution of the task in hand. Analogy involves a complicated
mapping between two dissimilar concepts or correspondence between two different representations. It
is easy for human beings to quickly recognize the abstractions involved and understand the meaning.
There are two types of analogical problem methods studied in AI
Unsupervised Learning -
•Unsupervised learning, also known as unsupervised machine learning, uses machine learning
algorithms to analyze and cluster unlabeled datasets. These algorithms discover hidden
patterns or data groupings without the need for human intervention.
•Its ability to discover similarities and differences in information makes it the ideal solution for
exploratory data analysis, cross-selling strategies, customer segmentation, and image
recognition.
•Unsupervised learning refers to the use of artificial intelligence (AI) algorithms to identify
patterns in data sets containing data points that are neither classified nor labeled.
•The algorithms are thus allowed to classify, label and/or group the data points contained within
the data sets without having any external guidance in performing that task.
•Unsupervised learning allows the system to identify patterns within data sets on its own.
•In unsupervised learning, an AI system will group unsorted information according to similarities
and differences even though there are no categories provided.
•Unsupervised learning algorithms can perform more complex processing tasks than supervised
learning systems.
•Unsupervised learning can be more unpredictable than a supervised learning model. While an
unsupervised learning AI system might, for example, figure out on its own how to sort cats
from dogs, it might also add unforeseen and undesired categories to deal with unusual breeds,
creating clutter instead of order.
•Unsupervised learning models are utilized for three main tasks—clustering, association, and
dimensionality reduction. Below we’ll define each learning method and highlight common
algorithms and approaches to conduct them effectively.
•Clustering is a data mining technique which groups unlabeled data based on their similarities or
differences. Clustering algorithms are used to process raw, unclassified data objects into
groups represented by structures or patterns in the information. Clustering algorithms can be
categorized into a few types, specifically exclusive, overlapping, hierarchical, and
probabilistic.
•Exclusive and Overlapping Clustering - Exclusive clustering is a form of grouping that
stipulates a data point can exist only in one cluster. This can also be referred to as “hard”
clustering. The K-means clustering algorithm is an example of exclusive clustering.
•K-means clustering is a common example of an exclusive clustering method where data points
are assigned into K groups, where K represents the number of clusters based on the distance
from each group’s centroid. The data points closest to a given centroid will be clustered
under the same category. A larger K value will be indicative of smaller groupings with more
granularity whereas a smaller K value will have larger groupings and less granularity. K-
means clustering is commonly used in market segmentation, document clustering, image
segmentation, and image compression.
•Overlapping clustering differs from exclusive clustering in that it allows data points to belong to
multiple clusters with separate degrees of membership. “Soft” or fuzzy k-means clustering is
an example of overlapping clustering.
•Euclidean distance is the most common metric used to calculate these distances; however, other
metrics, such as Manhattan distance, are also cited in clustering literature.
•Divisive clustering can be defined as the opposite of agglomerative clustering; instead it takes a
“top-down” approach. In this case, a single data cluster is divided based on the differences
between data points. Divisive clustering is not commonly used, but it is still worth noting in
the context of hierarchical clustering. These clustering processes are usually visualized using
a dendrogram, a tree-like diagram that documents the merging or splitting of data points at
each iteration.
Diagram of a Dendrogram; reading the chart "bottom-up" demonstrates agglomerative clustering
while "top-down" is indicative of divisive clustering
•Probabilistic clustering - A probabilistic model is an unsupervised technique that helps us solve
density estimation or “soft” clustering problems. In probabilistic clustering, data points are
clustered based on the likelihood that they belong to a particular distribution. The Gaussian
Mixture Model (GMM) is one of the most commonly used probabilistic clustering
methods.
•Gaussian Mixture Models are classified as mixture models, which means that they are made
up of an unspecified number of probability distribution functions. GMMs are primarily
leveraged to determine which Gaussian, or normal, probability distribution a given data point
belongs to. If the mean or variance are known, then we can determine which distribution a
given data point belongs to. However, in GMMs, these variables are not known, so we
assume that a latent, or hidden, variable exists to cluster data points appropriately. While it is
not required to use the Expectation-Maximization (EM) algorithm, it is commonly used to
estimate the assignment probabilities for a given data point to a particular data cluster.
•Autoencoders leverage neural networks to compress data and then recreate a new representation
of the original data’s input. In the usual diagram of an autoencoder, the hidden layer
specifically acts as a bottleneck to compress the input layer prior to reconstructing within the
output layer. The stage from the input layer to the hidden layer is referred to as “encoding”
while the stage from the hidden layer to the output layer is known as “decoding.”
ANS:- Reinforcement learning (RL) is an area of machine learning concerned with how intelligent
agents ought to take actions in an environment in order to maximize the notion of cumulative
reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised
learning and unsupervised learning.
Reinforcement learning differs from supervised learning in not needing labelled input/output pairs to
be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead, the focus is on
finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
List the examples in the form of a table ‘T’ where each row corresponds to an example and each column
contains an attribute value.
Create a set of m training examples, each example composed of k attributes and a class attribute with n
possible decisions.
Create a rule set, R, having the initial value false.
Initially all rows in the table are unmarked.
Steps in the algorithm:-
Step 1: divide the table ‘T’ containing m examples into n sub-tables (t1, t2,…..tn). One table for each
possible value of the class attribute. (repeat steps 2-8 for each sub-table)
Step 2: Initialize the attribute combination count ‘ j ‘ = 1.
Step 3: For the sub-table on which work is going on, divide the attribute list into distinct combinations, each
combination with ‘j ‘ distinct attributes.
Step 4: For each combination of attributes, count the number of occurrences of attribute values that appear
under the same combination of attributes in unmarked rows of the sub-table under consideration and, at the
same time, do not appear under the same combination of attributes in other sub-tables. Call the first
combination with the maximum number of occurrences the max-combination ‘MAX’.
Step 5: If ‘MAX’ == null, increase ‘j’ by 1 and go to Step 3.
Step 6: Mark all rows of the sub-table being processed in which the values of ‘MAX’ appear as classified.
Step 7: Add a rule (IF attribute = “XYZ” –> THEN decision is YES/NO) to R whose left-hand side has the
attribute names of ‘MAX’ with their values, separated by AND, and whose right-hand side contains the
decision attribute value associated with the sub-table.
Step 8: If all rows are marked as classified, move on to process another sub-table and go to Step 2. Otherwise,
go to Step 4. If no sub-tables are available, exit with the set of rules obtained so far.
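Steps 1 to 8 can be sketched in code as follows. This is one possible reading of the procedure, not a reference implementation: the function name `induce_rules`, the rule representation, and the small table of examples are all hypothetical, and the loop also stops if `j` exceeds the number of attributes (a guard the steps do not mention).

```python
from collections import Counter
from itertools import combinations

def induce_rules(examples, attributes, class_attr):
    """Return a list of (conditions, decision) rules, following Steps 1-8."""
    rules = []
    class_values = []
    for row in examples:                       # class values in order of first appearance
        if row[class_attr] not in class_values:
            class_values.append(row[class_attr])

    for decision in class_values:              # Step 1: one sub-table per class value
        sub_table = [r for r in examples if r[class_attr] == decision]
        others = [r for r in examples if r[class_attr] != decision]
        unmarked = list(range(len(sub_table)))
        j = 1                                  # Step 2
        while unmarked and j <= len(attributes):
            best, best_count = None, 0
            for combo in combinations(attributes, j):                  # Step 3
                seen_elsewhere = {tuple(r[a] for a in combo) for r in others}
                counts = Counter(tuple(sub_table[i][a] for a in combo)
                                 for i in unmarked)
                for values, count in counts.items():                   # Step 4
                    if values not in seen_elsewhere and count > best_count:
                        best, best_count = (combo, values), count
            if best is None:                   # Step 5: no usable combination of size j
                j += 1
                continue
            combo, values = best
            unmarked = [i for i in unmarked                            # Step 6
                        if tuple(sub_table[i][a] for a in combo) != values]
            rules.append((tuple(zip(combo, values)), decision))        # Step 7
        # Step 8: the while loop repeats Step 4 until every row is marked.
    return rules

# Hypothetical training table with k = 3 attributes and a yes/no decision.
attributes = ["size", "color", "shape"]
rows = [
    ("medium", "blue", "brick", "yes"),
    ("small", "red", "wedge", "no"),
    ("small", "red", "sphere", "yes"),
    ("large", "red", "wedge", "no"),
    ("large", "green", "pillar", "yes"),
    ("large", "red", "pillar", "no"),
    ("large", "green", "sphere", "yes"),
]
examples = [dict(zip(attributes + ["decision"], row)) for row in rows]

rules = induce_rules(examples, attributes, "decision")
for conditions, decision in rules:
    lhs = " AND ".join(f"{a} = {v}" for a, v in conditions)
    print(f"IF {lhs} THEN decision is {decision}")
```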
2. Inductive Bias:
-The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that the
learner uses to predict outputs of given inputs that it has not encountered.
-Approaches to a more formal definition of inductive bias are based on mathematical logic. Here, the
inductive bias is a logical formula that, together with the training data, logically entails the hypothesis
generated by the learner. However, this strict formalism fails in many practical cases, where the inductive
bias can only be given as a rough description (e.g. in the case of artificial neural networks), or not at all.
What are the Techniques for Selecting Best Attribute? (Shivam Singh)
Ans :-
Deductive reasoning is deducing new information from logically related known
information. It is the form of valid reasoning, which means the argument's conclusion must be
true when the premises are true.
Deductive reasoning in AI is often expressed in propositional logic, and it requires various rules and facts.
It is sometimes referred to as top-down reasoning, in contrast to inductive reasoning.
Deductive reasoning mostly starts from general premises and moves to a specific conclusion. The example
below, by contrast, generalizes from specific observations, so it illustrates inductive reasoning rather than
deductive reasoning.
Example (inductive reasoning):-
Premise: All of the pigeons we have seen in the zoo are white.
Conclusion: Therefore, we can expect all the pigeons to be white.
Ans :-
Clustering: no predefined classification is required. The task is to learn a classification from
the data. Clustering algorithms divide a data set into natural groups (clusters). Instances in the
same cluster are similar to each other; they share certain properties.
Clustering algorithms can have different properties:-
Hierarchical or flat: hierarchical algorithms induce a hierarchy of clusters of decreasing
generality; for flat algorithms, all clusters are the same.
Iterative: the algorithm starts with initial set of clusters and improves them by reassigning
instances to clusters.
Hard and soft: hard clustering assigns each instance to exactly one cluster. Soft clustering assigns
each instance a probability of belonging to a cluster.
Disjunctive: instances can be part of more than one cluster.
2) Overlapping clustering: In overlapping clustering, fuzzy sets are used to cluster objects, so that each point may belong to
two or more clusters with different degrees of membership. In this case, an object is associated with an appropriate membership
value.
3) Hierarchical clustering: This type of clustering is based on the union of the two nearest clusters. The beginning condition
is realized by setting every object as a cluster. After a few iterations it reaches the final clusters.
Explain K-means, Fuzzy C-means and Hierarchical Clustering. ( Mohd Sameer Khan)
Ans
K-means Clustering :
K-means is one of the simplest unsupervised learning algorithms that can be used to solve the well-
known clustering problem (MacQueen, 1967). The procedure follows a fairly simple approach of
classifying a given object set through a certain number of clusters, say K (fixed a priori). If we assume
that there are n data points, then the K-means algorithm is broadly stated as follows:
• Place K points into the space represented by the objects. These points represent initial group
centroids.
• Repeat until the centroids do not move:
• Assign each object to the group that has the closest centroid.
• When all objects have been assigned, recalculate the positions of the K centroids.
This algorithm causes a separation of objects into groups from which the metric to be minimized can be
calculated. Finally, the aim of this algorithm is to minimize an objective function. In this case, the
objective function F is the squared error function and it may be defined as follows: Let $x_i^{(j)}$ be the ith data
point in cluster j and $c_j$ be the cluster centre; then we can write
$$F = \sum_{j=1}^{K} \sum_{i} \lVert x_i^{(j)} - c_j \rVert^2$$
where the inner sum runs over the data points assigned to cluster j.
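The K-means loop described above can be sketched in plain Python as below. Two choices are assumptions made only to keep the example deterministic: the first K points are used as the initial centroids, and an empty cluster keeps its old centroid. The data points are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    # Assumption: the first k points serve as the initial group centroids.
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        # Assign each object to the group that has the closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [squared_distance(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Recalculate the positions of the K centroids.
        new_centroids = []
        for j, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(coord) / len(cluster) for coord in zip(*cluster)))
            else:
                new_centroids.append(centroids[j])   # empty cluster: keep old centroid
        if new_centroids == centroids:               # the centroids did not move
            break
        centroids = new_centroids
    # Squared-error objective: sum of squared distances to the own cluster centre.
    objective = sum(squared_distance(p, centroids[j])
                    for j, cluster in enumerate(clusters) for p in cluster)
    return centroids, clusters, objective

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters, objective = kmeans(points, k=2)
print(centroids, objective)
```

With a different initialization the algorithm may settle in a different (possibly worse) grouping, which is why practical implementations usually try several starting points.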
Fuzzy C-means :
Fuzzy C-means (FCM) is a method of clustering in which an object is allowed to belong to
two or more clusters. This method was initially developed by Dunn in the year 1973 and is frequently
used in applications such as pattern recognition. Let us assume u_ij to represent the degree of
membership of x_i (the ith data point) in the cluster j, and c_j to be the center of the cluster j. The FCM algorithm is
broadly described as follows: Here n represents the number of input data points and c represents the number of clusters.
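The steps themselves are not reproduced in these notes. As a hedged reminder, the formulation usually given for FCM minimizes a weighted squared-error objective and alternates between two updates until the memberships stop changing. The fuzzifier m > 1 is a parameter of that standard formulation and is an assumption here, since the notes do not mention it.

```latex
% Objective minimized by FCM (m > 1 is the fuzzifier):
J_m = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{\,m} \, \lVert x_i - c_j \rVert^2

% Membership update for data point x_i and cluster j:
u_{ij} = \frac{1}{\displaystyle\sum_{k=1}^{c}
         \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}

% Centre update for cluster j:
c_j = \frac{\sum_{i=1}^{n} u_{ij}^{\,m} \, x_i}{\sum_{i=1}^{n} u_{ij}^{\,m}}
```

For every data point the memberships sum to 1 over the c clusters, which is what distinguishes this "soft" assignment from the hard assignment of K-means.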
Hierarchical Clustering
As already defined earlier, hierarchical clustering is a type of clustering that is based on the union of two
nearest clusters. In this clustering, the starting condition is realized by setting every object as a cluster.
After a few iterations, it reaches the final clusters. Given a set of n objects to be clustered, and an n*n
distance or similarity matrix, the basic process of hierarchical clustering defined by S.C. Johnson
(Johnson, 1967) is as follows:
What are the different Methods To Find The Closest Pair Of Clusters? ( Devendra Katpara)
ANS – There are a number of methods that can be used for finding the closest pair of clusters, such as
Single-linkage clustering
Complete-linkage clustering
Average-linkage clustering
These are described briefly as follows:
1. In single-linkage clustering, we consider the distance between one cluster and another
cluster to be the shortest distance from any member of one cluster to any member of the other
cluster. If the data consist of similarities, we consider the similarity between one cluster and
another cluster to be equal to the greatest similarity from any member of one cluster to any
member of the other cluster.
2. In complete-linkage clustering, we consider the distance between one cluster and another
cluster to be the greatest distance from any member of one cluster to any member of the other
cluster.
3. In average-linkage clustering, we consider the distance between one cluster and another
cluster to be the average distance from any member of one cluster to any member of the other
cluster.
Clustering algorithms find great applications in a large number of fields, such as
marketing (which involves finding groups of customers who possess similar marketing behaviour
from a given database containing customer information and past buying patterns), biology
(which involves classifying a given database of plants and animals into genus, families, orders,
classes, kingdoms, and so on), insurance (which involves identifying groups of motor insurance policy
holders with a high average claim cost and also identifying frauds), residential development
(which involves classifying groups of houses in terms of their type, value, and location), and so
on.
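The three linkage criteria (single, complete and average) differ only in how the pairwise distances between two clusters are summarized. A minimal sketch, assuming Euclidean distance and two small hypothetical clusters A and B:

```python
import math

def pairwise_distances(cluster_a, cluster_b):
    """All Euclidean distances between a member of one cluster and a member of the other."""
    return [math.dist(p, q) for p in cluster_a for q in cluster_b]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))    # shortest distance

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))    # greatest distance

def average_linkage(cluster_a, cluster_b):
    dists = pairwise_distances(cluster_a, cluster_b)
    return sum(dists) / len(dists)                          # average distance

# Hypothetical clusters on a line: pairwise distances are 4, 6, 3 and 5.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(4.0, 0.0), (6.0, 0.0)]

print(single_linkage(A, B), complete_linkage(A, B), average_linkage(A, B))
```

In hierarchical clustering the pair of clusters with the smallest value under the chosen criterion is the pair that gets merged next.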
Write Application Of Clustering Algorithm (Omkar Jadhav)
It is memory efficient, as it uses a subset of training points in the decision function, called
support vectors.
Different kernel functions can be specified for the decision function, and it is possible to
specify custom kernels.
-Linearly separable:
5. Let’s consider two independent variables x1, x2 and one dependent variable which is either a blue circle or a
red circle.
From the figure above it is very clear that there are multiple lines (our hyperplane here is a line because we
are considering only two input features x1, x2) that segregate our data points, that is, classify
the red and blue circles.
Selecting the best hyper-plane:
6. One reasonable choice as the best hyperplane is the one that represents the largest separation or margin
between the two classes.
So we choose the hyperplane whose distance from it to the nearest data point on
each side is maximized. If such a hyperplane exists it is known as the maximum-
margin hyperplane/hard margin. So from the above figure, we choose L2.
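To make the "largest margin" criterion concrete, the sketch below compares two candidate separating lines by their distance to the nearest data point and picks the one with the larger margin. The points and the two lines are hypothetical; they are not read off the original figure, and the labels L1 and L2 are reused only for illustration. A real SVM solves an optimization problem instead of comparing a fixed list of candidates.

```python
import math

def distance_to_line(point, w, b):
    """Distance from a 2-D point to the line w[0]*x1 + w[1]*x2 + b = 0."""
    return abs(w[0] * point[0] + w[1] * point[1] + b) / math.hypot(w[0], w[1])

def margin(points, w, b):
    """Margin of a separating line: distance to the nearest data point."""
    return min(distance_to_line(p, w, b) for p in points)

# Hypothetical data: two blue points and two red points.
blue = [(1.0, 1.0), (2.0, 1.0)]
red = [(4.0, 4.0), (5.0, 4.0)]

# Hypothetical candidate lines, both of which separate blue from red.
candidates = {
    "L1": ((1.0, 0.0), -2.5),   # the vertical line x1 = 2.5
    "L2": ((1.0, 1.0), -5.5),   # the line x1 + x2 = 5.5
}

margins = {name: margin(blue + red, w, b) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, best)
```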
7. Let’s consider a scenario like shown below
The blue ball within the boundary of the red ones is an outlier of the blue balls.
The SVM algorithm is able to ignore the outlier and
find the best hyperplane that maximizes the margin. In this sense, SVM is robust to
outliers.
It finds the maximum margin as done with the previous data sets and, along with that,
adds a penalty each time a point crosses the margin. The margins
in these types of cases are called soft margins.
When the data are not linearly separable, SVM solves this by creating a new variable using a kernel. We take a point xi on the
line and create a new variable yi as a function of its distance from the origin.
For the right-hand side image given above: the new variable y is created as a function of
distance from the origin. A non-linear function that creates a new variable is
referred to as a kernel.
9. The SVM kernel is a function that takes a low-dimensional input space and
transforms it into a higher-dimensional space, i.e. it converts a non-separable
problem into a separable problem.
10. It is mostly useful in non-linear separation problems. Simply put, the
kernel performs some extremely complex data transformations and then finds
out the process to separate the data based on the labels or outputs
defined.
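The idea of creating a new variable from the distance to the origin can be shown with one-dimensional points. In the sketch below the points and labels are hypothetical. Strictly speaking, the function `feature_map` is a feature map rather than a kernel: a kernel computes inner products in the mapped space, but the effect on separability is the same.

```python
# Hypothetical points on a line: the inner points are blue, the outer points are red.
points = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
labels = ["red", "red", "blue", "blue", "red", "red"]

def feature_map(x):
    """Add a new variable y = x^2, a function of the distance from the origin."""
    return (x, x * x)

def find_threshold(values, labels):
    """Return a threshold that puts one class entirely below it and the other above, or None."""
    ordered = sorted(set(values))
    for low, high in zip(ordered, ordered[1:]):
        t = (low + high) / 2
        below = {lab for v, lab in zip(values, labels) if v < t}
        above = {lab for v, lab in zip(values, labels) if v > t}
        if len(below) == 1 and len(above) == 1 and below != above:
            return t
    return None

ys = [feature_map(x)[1] for x in points]

threshold_on_x = find_threshold(points, labels)   # no single cut on x separates the classes
threshold_on_y = find_threshold(ys, labels)       # a single cut on y = x^2 does
print(threshold_on_x, threshold_on_y)
```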
What is regression? Explain in short Support Vector Regression. (Harsh Shukla)
Ans)
Case-based reasoning (CBR) is a recent and very useful approach to problem solving and learning
paradigm in AI. It is different from other AI approaches as CBR can utilize specific knowledge from
previously experienced, concrete problem situations. It does not rely solely on general knowledge of a
problem domain or make associations along generalized relationships between problem descriptors and
conclusions.
There is incremental and sustained learning in CBR: a new experience is saved every time a
problem is solved, so that this solution becomes immediately available for future use. The idea behind CBR is
that whenever we are faced with a new problem, our first instinct is to look at solutions that have worked
for similar problems in the past. Thus, the case-based reasoner solves new problems by adapting solutions
that were used to solve old problems or by remembering a previous similar situation and by reusing
information and knowledge of that situation.
For example, suppose while driving along a particular road, we come across a traffic jam; if we have
faced a similar situation in the past and remember a route that we may have taken to get out of this jam,
the same route may be used again. Or else, we can consider an experimental route and if we are successful
in our attempt, we may remember this route for similar circumstances in the future.
Another example of case-based reasoning approach is in the case of medical situations where physicians
use their past experiences for diagnosis and treatment of patients.
Case-based reasoning is a sub-field of machine learning since the basic feature of CBR is learning. It
represents a machine learning paradigm that enables continual learning by updating the case base after
obtaining a solution to each problem. Therefore, learning naturally follows problem solving in CBR.
After a particular problem has been solved successfully, the solution is retained for future reference. If a
problem cannot be solved, then the reason for the failure is identified and recorded so that the same
mistake can be avoided in the future.
What are Steps for Learning in CBR? ( Sanket Patil )
Ans : Case-based reasoning (CBR) is an experience-based approach to solving new problems by adapting previously
successful solutions to similar problems. Addressing memory, learning, planning and problem solving, CBR provides a
foundation for a new technology of intelligent computer systems that can solve problems and adapt to new situations. In
CBR, the “intelligent” reuse of knowledge from already-solved problems, or cases, relies on the premise that the more
similar two problems are, the more similar their solutions will be.
Case-based reasoning is considered to be a sub-field of machine learning since the basic underlying feature of CBR is
learning. Moreover, CBR does not just depict a particular reasoning method. Rather, regardless of how the cases are
acquired, it represents a machine learning paradigm that enables continual learning by updating the case base after
obtaining a solution to each problem. Therefore, learning naturally follows problem solving in CBR. After a particular
problem has been solved successfully, the solution is retained for future reference. If a problem cannot be solved, then
the reason for the failure is identified and recorded so that the same mistake can be avoided in the future. The
following steps need to be followed for effective learning using the CBR approach:
The central tasks that all case-based reasoning methods have to deal with
are to
I. identify the current problem situation
II. find a past case similar to the new one,
III. use that case to suggest a solution to the current problem,
IV. evaluate the proposed solution, and
V. update the system by learning from this experience.
CBR methods and systems have two main models:
A process model of the CBR cycle
A task-method structure for case-based reasoning
Both the models mentioned above are complementary to each other and
depict two different views of case-based reasoning. The process
model of the CBR cycle is a dynamic model concerned with the main
sub-processes of a CBR cycle and their interdependencies, whereas the
task-method structure offers a task-oriented view where task
decomposition and related problem-solving methods are dealt with. In
general, the CBR cycle may be regarded as a cyclical process
comprising the four REs as mentioned below:
RETRIEVE the most similar case(s);
REUSE the case(s) to attempt to solve the problem;
REVISE the proposed solution if necessary; and
RETAIN the new solution as a part of a new case.
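The four REs can be sketched as a single function. Everything in this toy example is an assumption made for illustration: a case is a trip distance (the problem) with a travel time (the solution), similarity is the difference in distance, reuse scales the old solution proportionally, and revision simply accepts an observed value when one is available.

```python
def solve(case_base, problem, observed=None):
    """One pass through the CBR cycle for a numeric toy problem."""
    # RETRIEVE the most similar case (smallest difference in the problem value).
    nearest = min(case_base, key=lambda case: abs(case["problem"] - problem))
    # REUSE the case: adapt its solution proportionally to the new problem.
    proposed = nearest["solution"] * problem / nearest["problem"]
    # REVISE the proposed solution if feedback (an observed outcome) is available.
    final = observed if observed is not None else proposed
    # RETAIN the new solution as part of a new case.
    case_base.append({"problem": problem, "solution": final})
    return proposed, final

# Hypothetical case base: trip distance in km -> travel time in minutes.
case_base = [
    {"problem": 10.0, "solution": 20.0},
    {"problem": 50.0, "solution": 60.0},
]

proposed, final = solve(case_base, 12.0, observed=26.0)
proposed_next, final_next = solve(case_base, 12.5)   # now retrieves the retained 12 km case
print(proposed, final, proposed_next)
```

The second call shows the learning effect: the case retained by the first call is the one retrieved for the next, similar problem.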
Ans:
In this type of network, we have only two layers, the input layer and the output layer, but the
input layer does not count because no computation is performed in this layer.
The output layer is formed when different weights are applied to the input nodes and the
cumulative effect per node is taken.
After this, the neurons of the output layer collectively compute the output signals.
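A minimal sketch of such a single-layer network: each output neuron takes the weighted sum of the inputs and turns it into an output signal. The bias terms, the step activation and the hand-picked weights (which make the two neurons behave like AND and OR) are assumptions for illustration; the notes mention only the weights.

```python
def step(net):
    """Threshold activation: fire (1) when the net input is non-negative."""
    return 1 if net >= 0 else 0

def single_layer(inputs, weights, biases):
    """Compute the output signals of a single-layer feed-forward network."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Cumulative (weighted) effect of all input nodes on this output neuron.
        net = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(step(net))
    return outputs

# Hypothetical weights: output neuron 0 acts as AND, output neuron 1 acts as OR.
weights = [[1.0, 1.0], [1.0, 1.0]]
biases = [-1.5, -0.5]

for inputs in ([0, 0], [1, 0], [1, 1]):
    print(inputs, single_layer(inputs, weights, biases))
```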
Explain MULTI-LAYER FEED-FORWARD NETWORK. (Mohd Owais patni)
Ans:
Multilayer Feed-Forward Neural Network (MFFNN) is an interconnected Artificial Neural
Network with multiple layers that has neurons with weights associated with them, and they
compute the result using activation functions. It is one of the types of Neural Networks in which
the flow of the network is from input to output units, and it does not have any loops or feedback, and
no signal moves in the backward direction, that is, from the output to the hidden and input layers.
The ANN is a self-learning network that learns from sample data sets and signals; it is based on
the function of the biological nervous system. The type of activation function depends on the
desired output. It is a part of machine learning and AI, which are the fastest growing fields, and
lots of research is going on to make it more effective.
The Architecture of the Multilayer Feed-Forward Neural Network:
This Neural Network or Artificial Neural Network has multiple hidden layers that make it a
multilayer neural Network, and it is feed-forward because signals move in one direction only,
from the input layer towards the output layer. In this network there are the following layers:
Input Layer: It is the starting layer of the network, which receives the input signals; a weight is associated with each signal.
Hidden Layer: This layer lies after the input layer and contains multiple neurons that perform
all computations and pass the result to the output unit. If there are further hidden layers connected to it, then it passes the
weighted result to the connected hidden layer for further processing to get the desired result.
Output Layer: It is a layer that contains output units or neurons and receives processed data
from the hidden layer.
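The forward pass through the layers described above can be sketched as follows. The weights are set by hand (no training is shown) so that one hidden layer plus one output neuron computes XOR; the step activation, the bias terms and the weight values are assumptions made for this example.

```python
def step(net):
    """Threshold activation: 1 when the net input is non-negative, else 0."""
    return 1 if net >= 0 else 0

def layer(inputs, weights, biases):
    """Output of one layer: weighted sum per neuron, passed through the activation."""
    return [step(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, network):
    """Feed the signal forward, layer by layer, from the input to the output."""
    signal = inputs
    for weights, biases in network:
        signal = layer(signal, weights, biases)
    return signal

# Hypothetical hand-set network:
#   hidden layer: neuron 0 acts as OR, neuron 1 acts as AND
#   output layer: fires when OR is on and AND is off, i.e. XOR
network = [
    ([[1.0, 1.0], [1.0, 1.0]], [-0.5, -1.5]),
    ([[1.0, -1.0]], [-0.5]),
]

for inputs in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(inputs, forward(inputs, network))
```

XOR is a standard example of a function that a single layer of threshold units cannot represent, which is why the hidden layer is needed here.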
Ans:-
A feed-forward network represents an acyclic (with no cycles) network since data can pass from input to the output nodes but not vice versa.
Once the FFNN is trained, its state gets fixed and is not modified when new data is presented to it, and it has no memory. These shortcomings of
feed-forward networks are resolved by another type of network called a recurrent network. These networks can have connections going back from
output to input nodes and, in fact, can have arbitrary connections between any nodes. In addition, the internal state of a recurrent network can be
modified as new sets of input data are presented. It also possesses a memory, which proves to be useful while solving problems where the
solution depends on all previous inputs and not just on the current inputs. For example, prediction of stock market prices, weather
forecasting, etc., are all problems that require a network with the features described for a recurrent network. Learning in a recurrent network
involves feeding inputs through the network, which includes feeding data back from outputs to inputs. The process of feeding back is repeated
until the values of the outputs stop changing. This state is called equilibrium or stability. Figure 12.11 shows a recurrent network with a hidden
neuron that models a dynamic system using a unit delay operator d.
Recurrent networks can be trained by using the back-propagation algorithm. In this method, at each
step, the activation of the output is compared with the desired activation and errors are
propagated backward through the network. Once this training process is completed, the network
becomes capable of performing a sequence of actions.
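The idea of feeding the output back until it stops changing can be sketched with a single recurrent unit. The weights w_in and w_back, the tanh activation, and the tolerance are assumptions made for illustration; a small feedback weight is chosen so that the loop is guaranteed to settle.

```python
import math

def settle(x, w_in=0.8, w_back=0.5, tol=1e-9, max_steps=1000):
    """Feed the unit's output back into itself until it stops changing."""
    y = 0.0  # initial internal state
    for step in range(1, max_steps + 1):
        y_next = math.tanh(w_in * x + w_back * y)
        if abs(y_next - y) < tol:  # equilibrium (stability) reached
            return y_next, step
        y = y_next
    raise RuntimeError("no equilibrium reached")

y_eq, steps = settle(1.0)
print(y_eq, steps)
```

At equilibrium the output reproduces itself when fed back, which is what the check inside the loop tests for.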
What are the Design Issues of Artificial Neural Networks? (Shlok shivkar)
provides quick and cost effective solution to very complex problems for
you can consider an example where you can see the evolution changes for
a specific species like the human nervous system and behavior of an Ant’s,
It does not require any mathematical modeling for solving any given
problem
Adaptive in nature.
There are three types of soft computing techniques which include the
following:
neural system. The main advantage is that they solve the problems in
-Fuzzy Logic: The fuzzy logic algorithm is used to solve the models which
Fuzzy logic assigns each statement a truth value in the closed interval [0,1].
-Genetic algorithm: They are usually used for optimization problems like
Selection
The idea of the selection phase is to select the fittest individuals and let them pass their genes to the next generation.
Pairs of individuals (parents) are selected based on their fitness scores. Individuals with high fitness have a greater chance of being selected for reproduction.
Crossover
Crossover is the most significant phase in a genetic algorithm. For each pair of parents to be mated, a crossover point is chosen at random from within the genes. For example, consider the crossover point to be 3.
Offspring are created by exchanging the genes of the parents among themselves until the crossover point is reached.
Mutation
In some of the new offspring, genes can be subjected to mutation with a low random probability. This implies that some of the bits in the bit string can be flipped. Mutation maintains diversity within the population and prevents premature convergence.
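The three phases above can be put together in a small sketch. The fitness function (count of 1 bits), the tournament-style selection, the population size, the mutation rate, and the elitism step are all assumptions chosen to keep the example short; they are not prescribed by the description above.

```python
import random

GENES = 12  # hypothetical chromosome length

def fitness(individual):
    """Assumed toy fitness: the number of 1 bits."""
    return sum(individual)

def select_parent(population, rng):
    """Pick two individuals at random and keep the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2, rng):
    """Exchange the parents' genes up to a random crossover point."""
    point = rng.randint(1, GENES - 1)
    return p2[:point] + p1[point:], p1[:point] + p2[point:]

def mutate(individual, rng, rate=0.02):
    """Flip each bit with a low probability."""
    return [1 - g if rng.random() < rate else g for g in individual]

def evolve(generations=40, size=20, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(size)]
    for _ in range(generations):
        next_gen = [max(population, key=fitness)]  # keep the current best
        while len(next_gen) < size:
            c1, c2 = crossover(select_parent(population, rng),
                               select_parent(population, rng), rng)
            next_gen += [mutate(c1, rng), mutate(c2, rng)]
        population = next_gen[:size]
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```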
Ans: Evolutionary programming is a more complex form of genetic programming in which the individuals are structures with greater degrees of complexity. It constitutes one of the major evolutionary algorithm paradigms. Evolutionary programming was invented by Dr. Lawrence J. Fogel in 1960 to use simulated evolution as a learning process with the aim of generating intelligent behaviour. Although he did not model the end product of evolution, he did try to model the process of evolution itself as a mechanism for producing intelligent behaviour. Fogel used finite state machines (FSMs) as predictors and evolved them. He described this process as evolutionary programming, in contrast to heuristic programming.
Evolutionary programming is now considered a broad evolutionary computing approach with no fixed structure or representation. It is difficult to distinguish between evolutionary programming and evolutionary strategies. Some of its original variants closely resemble the later genetic programming, with the difference that the program structure is fixed and numerical parameters are allowed to evolve.
The crucial point that distinguishes evolutionary programming from a genetic algorithm is the manner in which new solutions (offspring) are generated. In a GA, a new solution is formed by the crossover of two solutions, whereas in evolutionary programming each member of the population generates an offspring by mutation. Evolutionary programming is considered better for obtaining the global optimum since it relies on mutation rather than crossover. Because of the inherent flexibility in the fitness function, the evolutionary programming method can lead to the best solution in fewer generations.
The basic steps involved in using an evolutionary programming method to determine a globally optimal solution are outlined in the following subsection. Like the other computational procedures discussed so far, such as genetic algorithms, evolutionary programming is a methodology and not an algorithm. Many parameters need to be taken care of before using this method.
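The mutation-only reproduction that separates evolutionary programming from a GA can be sketched as follows. The objective (a sphere function to be minimized), Gaussian mutation with a fixed sigma, and keeping the best half of parents plus offspring are simplifying assumptions for this example.

```python
import random

def sphere(x):
    """Assumed toy objective to minimize: sum of squares."""
    return sum(v * v for v in x)

def evolutionary_programming(dim=3, size=10, generations=100, sigma=0.3, seed=1):
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(size)]
    for _ in range(generations):
        # Every member produces exactly one offspring by mutation; no crossover.
        offspring = [[v + rng.gauss(0, sigma) for v in parent]
                     for parent in population]
        # Parents and offspring compete; the best `size` survive.
        population = sorted(population + offspring, key=sphere)[:size]
    return min(population, key=sphere)

best = evolutionary_programming()
print(best, sphere(best))
```

Because parents compete with their offspring, the best solution found so far is never lost between generations.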
5. What is swarm intelligence? Name two swarm intelligence systems. [Shraddha Kasar][21306A1011]
Ans:- A swarm is defined as a set of (mobile) agents that are capable of communicating with each other directly or indirectly (by acting on their local environment). They can carry out distributed problem solving in a collective manner with the help of extremely simple rules. Therefore, swarm intelligence (SI) can be said to be based on the collective behaviour of self-organized and decentralized systems. Even though there is no centralized control structure that defines the behaviour of individual agents, interactions between such agents lead to the emergence of intelligent global behaviour, which is unknown to the individual agent.
Swarms are more powerful than single individuals since they can achieve goals that individuals may not be able to achieve. Swarm intelligence rests on four basic elements, namely, positive feedback, negative feedback, amplification of fluctuations, and multiple interactions.
Ant Colony Optimization: - Ant colony optimization (ACO) is a class of optimization algorithms modelled on the actions of the members of an ant colony. Artificial ants (simulation agents) locate optimal solutions by moving through a search space representing all possible solutions. Real ants lay down pheromones that direct each other to food and other resources while exploring their environment. In a similar manner, the artificial ants record their positions and the quality of the solutions they have located so that other ants can locate better solutions in later simulation iterations.
Particle Swarm Optimization: - Particle swarm optimization (PSO) is a global optimization algorithm used for solving problems in which the best solution can be represented as a point or surface in an n-dimensional space. All hypotheses are plotted in this space and provided with an initial velocity, along with a communication channel between particles. As the particles move through this solution space, they are evaluated according to some fitness criterion after each time step. Over time, particles accelerate towards those particles within their communication grouping that have better fitness values.
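The particle movement described above can be sketched as below. The objective (sphere function), the inertia weight w, the acceleration constants c1 and c2, and the use of the whole swarm as one communication group are assumptions for illustration.

```python
import random

def sphere(x):
    """Assumed fitness criterion to minimize: sum of squares."""
    return sum(v * v for v in x)

def pso(dim=2, particles=15, steps=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(particles)]
    pbest = [p[:] for p in pos]           # best point seen by each particle
    gbest = min(pbest, key=sphere)[:]     # best point seen by the group
    start = sphere(gbest)
    for _ in range(steps):
        for i in range(particles):
            for d in range(dim):
                # Accelerate towards the particle's own best and the group's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if sphere(pos[i]) < sphere(pbest[i]):
                pbest[i] = pos[i][:]
                if sphere(pbest[i]) < sphere(gbest):
                    gbest = pbest[i][:]
    return gbest, start

best, start_fitness = pso()
print(best, sphere(best), start_fitness)
```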
Evolutionary algorithms in real-life problems. Similar to swarm intelligence algorithms [6], a major reason is a growing
demand for smart optimization methods in many business and engineering activities. EAs are suitable mainly for
also inspired models that can be used to solve particularly difficult optimization problems. One of the most important aspects of ant behavior is their ability to find the shortest paths. This has motivated computer scientists to develop algorithms for solving shortest path and optimization problems, known as the field of ant colony optimization (ACO). This is the most successful and widely recognized algorithmic technique based on ant behavior. The paradigm emerged from research on real ant behavior. As mentioned earlier, ants can find the shortest path from their colony to the source of food by leaving traces of chemical substances called pheromones as a trail for other ants to follow. From the many trails left by its predecessors, an ant chooses the trail that has the maximum amount of pheromone deposit. The ant then traverses the chosen path and leaves its own pheromone as a trail for the ants behind it. This is an autocatalytic process that favors paths along which more ants have previously travelled.
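The autocatalytic effect can be sketched with two competing trails. Note the assumptions: instead of always taking the trail with the maximum pheromone, each ant here picks a trail with probability proportional to its pheromone (a common ACO simplification), pheromone evaporates at a fixed rate, and an ant deposits 1/length on the trail it used. The path names and lengths are hypothetical.

```python
import random

def simulate(lengths, ants=200, evaporation=0.1, seed=0):
    """Let ants choose among trails one after another and update pheromone."""
    rng = random.Random(seed)
    pheromone = {path: 1.0 for path in lengths}
    for _ in range(ants):
        paths = list(pheromone)
        # Trails with more pheromone are more likely to be chosen.
        chosen = rng.choices(paths, weights=[pheromone[p] for p in paths])[0]
        for p in pheromone:
            pheromone[p] *= (1.0 - evaporation)   # old trails fade
        pheromone[chosen] += 1.0 / lengths[chosen]  # shorter path, stronger deposit
    return pheromone

trails = simulate({"short": 2.0, "long": 5.0})
print(trails)
```

Since a shorter trail receives a larger deposit per ant, it tends to attract more ants, which in turn reinforces it further.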
9. What is an Agent? (Sushma - 21306A1042)
Ans :
Artificial intelligence is defined as the study of rational agents. A rational agent
could be anything that makes decisions, such as a person, firm, machine, or software. It
carries out the action with the best outcome after considering past and current
percepts (the agent's perceptual inputs at a given instant). An AI system is composed
of an agent and its environment. The agents act in their environment. The
environment may contain other agents.
An agent is anything that can be viewed as:
perceiving its environment through sensors and
acting upon that environment through actuators
Examples of Agent:
A software agent has keystrokes, file contents, and received network packets
which act as sensors, and displays on the screen, files, and sent network packets
acting as actuators.
A human agent has eyes, ears, and other organs which act as sensors, and
hands, legs, mouth, and other body parts acting as actuators.
A robotic agent has cameras and infrared range finders which act as
sensors, and various motors acting as actuators.
Goal-based agents
These agents take decisions based on how far they currently are from
their goal (a description of desirable situations). Every action is intended to
reduce the distance from the goal. This gives the agent a way to choose among
multiple possibilities, selecting the one that reaches a goal state. The knowledge
that supports its decisions is represented explicitly and can be modified, which
makes these agents more flexible. They usually require search and planning. A
goal-based agent's behavior can easily be changed.
Utility-based agents
Agents that are developed with their end uses (utilities) as building blocks are
called utility-based agents. They are used when there are multiple possible
alternatives and the agent must decide which one is best. They choose actions
based on a preference (utility) for each state. Sometimes achieving the desired
goal is not enough: we may look for a quicker, safer, or cheaper trip to reach a
destination. Agent happiness should be taken into consideration; utility describes
how "happy" the agent is. Because of the uncertainty in the world, a utility agent
chooses the action that maximizes the expected utility. A utility function maps a
state onto a real number that describes the associated degree of happiness.
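Choosing the action with the maximum expected utility can be sketched as follows. The two routes, their outcome probabilities, and the happiness values are hypothetical numbers made up for this example.

```python
def expected_utility(outcomes, utility):
    """Sum of probability * utility over the possible resulting states."""
    return sum(p * utility(state) for state, p in outcomes)

def choose_action(actions, utility):
    """Pick the action whose expected utility is highest."""
    return max(actions, key=lambda a: expected_utility(actions[a], utility))

# Hypothetical trip: each action leads to states with some probability.
TRIPS = {
    "highway": [("fast", 0.7), ("stuck", 0.3)],
    "back_roads": [("steady", 1.0)],
}
# Utility function: maps a state to a real number (degree of happiness).
HAPPINESS = {"fast": 10.0, "steady": 6.0, "stuck": 0.0}

best = choose_action(TRIPS, HAPPINESS.get)
print(best)
```

Here both routes reach the destination, so a goal-based agent could not tell them apart; the utility values are what break the tie.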
Learning Agent:
A learning agent in AI is an agent that can learn from its past experiences.
It starts to act with basic knowledge and then adapts automatically through
learning.
A learning agent has four main conceptual components:
Learning element: It is responsible for making improvements by learning
from the environment.
Critic: The learning element takes feedback from the critic, which describes
how well the agent is doing with respect to a fixed performance standard.
Performance element: It is responsible for selecting external actions.
Problem generator: This component is responsible for suggesting actions
that will lead to new and informative experiences.
12.Explain agent environment in detail. (Pooja Sakrulla 21306A1031)
Answer:- Environments for an Agent
Environments can be classified into the following categories:-
Deterministic and non-deterministic: A deterministic environment is one in
which every action has a single guaranteed effect, and there is no uncertainty
about the state that will result from performing an action. On the other hand, in
a non-deterministic environment, there may be many effects corresponding to a
single action. For all intents and purposes, the physical world can be regarded as
non-deterministic. These environments present greater problems for the agent
designer than deterministic ones. If an environment is sufficiently complex, then
the fact that it is deterministic is not of much help.
Static and dynamic: A static environment is assumed to remain unchanged except
by the performance of actions by the agent. On the other hand, in a dynamic
environment, other processes operate on it and hence change it beyond the
agent's control. The physical world is a highly dynamic environment.
Discrete and continuous: An environment is said to be discrete if there is a
fixed, finite number of actions and percepts in it. Russell and Norvig gave a chess
game as an example of a discrete environment, and taxi driving as an example of
a continuous one.
13.Write a short note on the working of an agent. Hiral Patel (21306A1072)
Ans.
1. An agent architecture is generally a map of the agent's internals: its data
structures, the operations which may be performed on these data structures,
and the control flow between these data structures.
2. One of the challenging goals in designing an agent program is to implement
the mapping from percepts to actions.
3. The agent takes sensor input from the environment and produces, as output,
actions that affect it.
4. The agent starts in some initial internal state, observes its environment
state, and generates a percept.
5. Based on the percept, an action is then performed, and the agent enters
another cycle of perceiving the world via perception, updating its state, and
choosing an action to perform.
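The cycle in points 3 to 5 can be sketched as a loop. The vacuum-style environment and the function names see, next_state, and action are assumptions introduced for this example.

```python
def see(environment):
    """Sensor: turn the raw environment state into a percept."""
    return "dirty" if environment["dirt"] > 0 else "clean"

def next_state(internal, percept):
    """Update the internal state with the latest percept."""
    return {"history": internal["history"] + [percept]}

def action(internal):
    """Choose an action from the internal state."""
    return "suck" if internal["history"][-1] == "dirty" else "wait"

def run(environment, cycles):
    internal = {"history": []}  # initial internal state
    performed = []
    for _ in range(cycles):
        percept = see(environment)
        internal = next_state(internal, percept)
        act = action(internal)
        if act == "suck":       # the action affects the environment
            environment["dirt"] -= 1
        performed.append(act)
    return performed

log = run({"dirt": 2}, cycles=4)
print(log)  # ['suck', 'suck', 'wait', 'wait']
```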
Logic-based Architecture
It was initially believed that systems with intelligent behaviour could be
generated by giving the system a symbolic/logical representation of its
environment and its desired behaviour, and manipulating this representation
through logical deduction or theorem proving.
Logic-based approaches view an agent's decision making as deduction. The
decision-making strategy of the agent is encoded as a logical theory, and the
process of selecting an action reduces to a problem of proof in logic. Logic-based
approaches are elegant and have clean semantics.
Let us now discuss how to develop a simple model of logic-based agents, which
we call deliberate agents. In such agents, the internal state is assumed to be a
database of first-order predicate logic formulae.
Reactive Architecture
Researchers started investigating alternatives to logic-based approaches in
the mid to late 1980s. The subsumption architecture is the best-known reactive
agent architecture. It was developed by Brooks, one of the strongest critics of
the symbolic approach (Brooks R., 1991).
Subsumption has been widely influential in autonomous robotics and in real-time
AI systems.
Subsumption architecture is a way of decomposing complicated intelligent
behaviour into many simple self-contained components, which are in turn
organized into layers. Each layer implements a particular goal of the agent, and
higher layers are increasingly more abstract. The goal of each layer subsumes that
of the underlying layers.
The subsumption architecture has two main characteristics. The first is that the
decision making of an agent is realized through a set of task-accomplishing
behaviours. The second is that the behaviours are arranged into layers, with
higher layers representing more abstract behaviours.
However, this model has certain disadvantages as well. Since the goals
might begin interfering with each other, it is difficult to design action
selection through a highly distributed system of inhibition and suppression.
Further, this architecture has low flexibility at runtime.
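A layered set of behaviours can be sketched as an ordered list of condition-action pairs. This sketch assumes the common formulation in which a lower layer, when it fires, inhibits the more abstract layers above it; the percept keys and action names are hypothetical.

```python
# Behaviours ordered from the lowest layer to the highest (most abstract).
# Each behaviour is a (condition on the percept, action) pair.
BEHAVIOURS = [
    (lambda p: p["obstacle"], "turn_away"),
    (lambda p: p["carrying_sample"], "return_to_base"),
    (lambda p: p["sample_seen"], "pick_up_sample"),
    (lambda p: True, "explore"),
]

def select_action(percept):
    """Return the action of the first (lowest) layer whose condition fires."""
    for fires, act in BEHAVIOURS:
        if fires(percept):
            return act

print(select_action({"obstacle": True, "carrying_sample": True, "sample_seen": False}))
print(select_action({"obstacle": False, "carrying_sample": False, "sample_seen": False}))
```

There is no symbolic model or deduction here: the percept is mapped directly to an action, which is what makes the agent reactive.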
Belief-Desire-Intention Architecture
In belief-desire-intention (BDI) theory, the behaviour of an agent is described in
terms of a processing cycle. The processing cycle is a control mechanism that may
be achieved by a software feedback mechanism for performing functions without
direct external intervention. A feedback mechanism can continuously monitor the
output of the system, compare the result against preset values, and feed the
difference back to adjust the behaviour of the target system in a processing cycle.
BDI architectures have a reasoning process that helps in deciding an appropriate
action to be performed for achieving goals based on beliefs and intentions. These
are practical reasoning architectures, in which the reasoning process resembles
human reasoning. Belief is about understanding the problem and the environment in
that context. The decision process typically involves understanding and
generating the various options available to the agent on the basis of its beliefs,
choosing between them, and committing to some. These chosen options become
intentions, which determine the agent's actions. Intentions are fed back into the
agent's future practical reasoning.
An agent should review and reconsider its intentions from time to time, as it might
have to drop certain intentions. For example, the beliefs of the agent may have
changed such that a particular intention is no longer relevant, or the intention can
never be achieved, or it has already been achieved. But reconsideration has a cost
in terms of both time and computational resources.
So, different types of environment require different types of decision strategies.
In static environments, purely pro-active, goal-directed behaviour of an agent is
adequate. On the other hand, in more dynamic environments, the ability of an
agent to react to changes by modifying intentions is more important.
The basic components of BDI architecture are data structures representing the
beliefs, desires, and intentions of the agent, and functions that represent its
deliberation for deciding what to do. The main components of a BDI agent are as
follows:
o A set of current beliefs (denoted by B) of the agent, representing information
about the current environment.
o A set of current intentions (denoted by I), representing the current focus of
the agent.
o A set of current desires (options or goals, denoted by D) of the agent.
Generally, beliefs, desires, and intentions are represented as logical formulae. The
sets B, D, and I should be consistent. For example, an intention to achieve X
should be consistent with the belief Y. The state of a BDI agent at any given
moment is represented as a triple (B, D, I). In addition, the following functions
are used:
o An action selection function (asf) determines an action to be performed on
the basis of current intentions, i.e., asf(I) -> Action.
o A belief revision function (brf) takes a percept and the current beliefs of
the agent as input and produces a new set of beliefs, i.e., brf(P, B) -> B.
o A filter function (ff) takes the agent's current beliefs, desires, and intentions
and determines the intentions of the agent, i.e., ff(B, D, I) -> I.
o An option generation function (ogf) determines the options available to
the agent on the basis of its current beliefs about its environment and its
current intentions, i.e., ogf(B, I) -> D.
The deliberation process of a BDI agent is represented in the filter function.
It updates the agent's intentions on the basis of its previously held
intentions and current beliefs and desires. This function should
o drop any intentions that are no longer achievable, or that have become too
costly to pursue
o retain intentions that are not yet achieved and that are still expected to have a
positive overall benefit
o adopt new intentions, either to achieve existing intentions or to exploit
new opportunities
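The four functions can be chained into one processing cycle. The sketch below keeps the signatures brf(P, B), ogf(B, I), ff(B, D, I), and asf(I) from the text; the cleaning-robot domain, the belief keys, and the rule that a low battery makes cleaning unachievable are assumptions made for illustration.

```python
def brf(percept, beliefs):
    """Belief revision function: brf(P, B) -> B."""
    revised = dict(beliefs)
    revised.update(percept)
    return revised

def ogf(beliefs, intentions):
    """Option generation function: ogf(B, I) -> D (this toy version ignores I)."""
    desires = set()
    if beliefs["battery_low"]:
        desires.add("recharge")
    if beliefs["room_dirty"]:
        desires.add("clean")
    return desires

def ff(beliefs, desires, intentions):
    """Filter function: ff(B, D, I) -> I."""
    kept = {i for i in intentions if i in desires}  # drop achieved or irrelevant ones
    if beliefs["battery_low"]:
        return {"recharge"}   # cleaning is assumed unachievable on a low battery
    return kept | desires     # retain the rest and adopt new options

def asf(intentions):
    """Action selection function: asf(I) -> Action."""
    if "recharge" in intentions:
        return "go_to_charger"
    if "clean" in intentions:
        return "vacuum"
    return "idle"

def bdi_step(percept, beliefs, intentions):
    """One processing cycle over the state (B, D, I)."""
    beliefs = brf(percept, beliefs)
    desires = ogf(beliefs, intentions)
    intentions = ff(beliefs, desires, intentions)
    return beliefs, desires, intentions, asf(intentions)

B, I = {"battery_low": False, "room_dirty": False}, set()
B, D, I, act = bdi_step({"room_dirty": True}, B, I)
print(act)  # vacuum
```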
Layered Architecture
In layered architectures, the various sub-systems of an agent are arranged into a
hierarchy of interacting layers. There will be at least two layers, to deal with the
reactive and pro-active behaviors of the agent, respectively. A useful typology for
such architectures is by the information and control flows within them. Broadly,
there are two types of control flow within layered architectures (Wooldridge and
Jennings, 1995), namely horizontal layering and vertical layering.
Horizontal layering: Suppose there are n layers and each layer is capable of
suggesting m possible actions; then there are at most m^n interactions between
layers to be considered. To ensure that a horizontally layered architecture is
consistent, a mediator is generally included, which decides which layer has
control of the agent at any given time. The introduction of a central control or
mediator system introduces a bottleneck into the agent's decision making, and it
is also problematic because the designer must foresee all possible interactions
between layers.
Vertical layering: In a vertically layered architecture, sensory input and action
output are each dealt with by at most one layer. In this form of architecture, the
problems of the horizontal architecture are partly solved. Vertically layered
architectures can be divided into one-pass architectures and two-pass
architectures. These are shown in Figure 14.6.
o In one-pass architecture, control flows sequentially through each layer,
until the final layer generates action output.
o In two-pass architecture, information flows from the percept through one layer
to another in the first pass, and control flows back from the last layer to the
first layer until action output in the second pass.
The complexity of interactions between layers in both one-pass and two-pass
vertically layered architectures is reduced, since there are only n - 1 interfaces
between n layers. This is clearly much simpler than the horizontally
layered case. However, this simplicity comes at the cost of some flexibility. In a
vertically layered architecture, control must pass through every layer for a
decision to be made. Failures in any one layer are therefore likely to have serious
consequences for agent performance.
The relationships between concepts are called dependencies. The main conceptualization of a
clause is a two-way dependency between a PP (the actor) and an action. It is important to note
that actions are broken down into sequences of primitive ACTs.
A set of rules describes the syntax of the conceptual level; these rules specify which types of
concepts can depend on which other types, as well as the different kinds of dependency
relationships between concepts. Which specific concepts depend on which other concepts, based
on their particular meanings, is determined by the semantics of the conceptual level.
There exists a dictionary of ACTs which specifies the different meanings of each verb with its
conceptual structure.
Example: The CD representation for sentences such as "I took a book from the man", "The man
gave me a book", and "The book was given by the man to me", which have the same intended
meaning, is given as follows:
Here we notice that some special notations and symbols are used. These will be explained as we
move ahead, but the conventions used throughout this chapter are as follows:
d — destination
p — Past
f — Future
t — Transition
ts — Start transition
tf — Finished transition
k — Continuing
c —Conditional
Rule 2: The relationship between an ACT and the PP (object) of the ACT is shown by an arrow
directed toward the ACT, since the context of the specific ACT determines the meaning of the
object relation.
Rule 3: This rule shows a two-way relationship between two PPs. One PP belongs to the set
defined by the other PP. For example, John belongs to the set of doctors in the sentence "John is
a doctor".
Rule 4: It shows a relationship between two PPs. One of the PPs provides a particular kind of
information about the other PP. The most common types of information encoded in this way are
possession (shown as 'poss-by') and location (shown as 'loc'). The direction of the arrow is,
again, toward the concept being described.
Rule 5: It shows a relationship between a PP and a PA that is asserted to describe it. In this case,
the PA represents states of the PP such as height, weight, health, etc. on numeric scales, and the
arrow points both ways.
Rule 6: It shows a relationship between a PP and an attribute that has already been predicated
of it. The direction of the arrow is toward the PP being described.
Rule 7: It shows a relationship between an ACT and the physical source and destination locations
of the ACT. Here 'd' indicates the source and destination case relation; the same representation
is also used later for the recipient case relation. We can use 'v' for a vehicle, which can be a
single object or a full conceptualization.
Rule 8: It shows a relationship between an ACT and the source and recipient of the ACT. The
letter 'r' indicates the source and recipient case relation.
Rule 9: It shows a relationship between an ACT and the instrument with which it is performed.
In the simplest form, the instrument can be represented by a single physical object; otherwise it
must be expanded to a full conceptualization. Here DO represents some act done by the actor
using the instrument. The letter 'i' represents the instrument.
Rule 10: It shows a relationship that describes the change in state of a PP between the state in
which it started and the state in which it ended. Here the states of an object are described using
numerical values. For example, for health, the range could be -10 to 10, describing various
health conditions {(dead, -10), (seriously ill, -9), (sick, [-8, -1]), (alright, 0), (fine, 5),
(perfect health, 10)}.
Rule 11: It shows a relationship between one conceptualization and another conceptualization
that causes it. Here {x} and {y} represent two full conceptualizations where {y} is caused by {x}.
It is a conditional representation of the form 'if x then y'. Alternatively, we can say that 'y' is a
consequence of 'x'.
Rule 12: It shows a relationship between one conceptualization and another that is happening
at the time of the first. Here event 'y' happens while event 'x' is happening.
Rule 13: It shows a relationship between a conceptualization and the place at which it happened.
For ex : The verb ‘give’ is represented by the act ATRANS, which transfers an object
from an actor to the recipient.
The verb ‘take’ is represented by the act ATRANS, which transfers an object from
someone to an actor.
2. PTRANS : transfer of the physical location of an object (or actor); it requires an actor,
an object, and a direction
For ex : the verbs ‘go’ and ‘walk’ are types of PTRANS in which an actor moves himself to
some location
For ex : the verb ‘push’ is a PROPEL act in which an actor pushes an object in a direction.
The verb ‘throw’ is represented by a PROPEL act in which an object is moved
in a direction by an actor using a MOVE action whose instrument is the hand.
6. INGEST
7. MTRANS
8. MBUILD
9. EXPEL
10. SPEAK
11. ATTEND
It looks similar to frames, except that the values of the slots must be ordered and have more
specialized roles. The script structure allows an individual to make the inferences needed for
understanding by filling in missing information.
In real-world situations, events tend to occur in known patterns because of causal
relationships between the occurrences of events. So scripts are a useful structure in such
situations. A number of computer programs have been developed to demonstrate the theory.
Schank applied his theoretical framework to story telling and the development of intelligent
tutors. The classic example of Schank's theory is the restaurant script.
Entry Conditions : Must be satisfied before events in the script can occur.
Track : A specific variation on a more general pattern in the script. Different tracks may share
many, but not all, components of the same script.
Scenes: The sequence of events that occur. Events are represented in conceptual dependency
form.
Going to theater
Buying ticket
Going inside hall and sitting on a seat
Watching play
Exiting from theater
Entry Conditions:-
P wants to see a play
P has money
Results:
P saw play
P has less money
Props:
Tickets; Seat; Play
Roles:
Person
Ticket distributor
Ticket checker
Roles:
Person (who wants to see a play) - P
Ticket distributor - TD
Ticket checker - TC
Conditions:
P wants to see a play
P has money
Results:
P saw a play
P has less money
P is happy (optional, if he liked the play)
Various Scenes
Scene 1: Going to theater
P PTRANS P into theater
P ATTEND eyes to ticket counter
Scene 2: Buying ticket
P PTRANS P to ticket counter
P MTRANS (need a ticket) to TD
TD ATRANS ticket to P
Scene 5: Exiting
P PTRANS P out of hall and theater
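The theater script can be held in a small data structure, which also shows how a script lets a program fill in events that a story leaves out. The class name Script and its methods are assumptions for this sketch; only the scenes whose events are listed above are included.

```python
from dataclasses import dataclass, field

@dataclass
class Script:
    name: str
    roles: dict
    props: list
    entry_conditions: list
    results: list
    scenes: dict = field(default_factory=dict)  # ordered: scene name -> CD events

    def can_start(self, facts):
        """All entry conditions must hold before the script applies."""
        return all(c in facts for c in self.entry_conditions)

    def events_up_to(self, scene):
        """Infer every event that must have occurred up to the given scene."""
        events = []
        for name, cd_events in self.scenes.items():
            events.extend(cd_events)
            if name == scene:
                return events
        raise KeyError(scene)

THEATER = Script(
    name="Going to theater",
    roles={"P": "Person", "TD": "Ticket distributor", "TC": "Ticket checker"},
    props=["Tickets", "Seat", "Play"],
    entry_conditions=["P wants to see a play", "P has money"],
    results=["P saw a play", "P has less money"],
    scenes={
        "Going to theater": ["P PTRANS P into theater",
                             "P ATTEND eyes to ticket counter"],
        "Buying ticket": ["P PTRANS P to ticket counter",
                          "P MTRANS (need a ticket) to TD",
                          "TD ATRANS ticket to P"],
        "Exiting": ["P PTRANS P out of hall and theater"],
    },
)

# A story that only says "P bought a ticket" still implies the earlier events.
print(THEATER.events_up_to("Buying ticket"))
```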
For example, if we have to represent "a circular from head to faculty for a meeting" in XML, it
will look like:
The circular is quite self-descriptive. It has sender and receiver information; it also has a
heading and a message body. But still, this XML document does not do anything: it is just
information wrapped in user-defined tags. A piece of software must be written to send,
receive, or display it.
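As an illustration of such a piece of software, a circular of the kind described (sender, receiver, heading, body) could be built and read back with Python's standard xml.etree.ElementTree module. The tag names and the message text are assumptions for this sketch, not the document's own markup.

```python
import xml.etree.ElementTree as ET

# Build the circular; every tag name here is user-defined (hypothetical).
circular = ET.Element("circular")
ET.SubElement(circular, "to").text = "Faculty"
ET.SubElement(circular, "from").text = "Head"
ET.SubElement(circular, "heading").text = "Meeting"
ET.SubElement(circular, "body").text = "All faculty members are requested to attend the meeting."

xml_text = ET.tostring(circular, encoding="unicode")
print(xml_text)

# The receiving side parses the text and picks out the fields it needs.
parsed = ET.fromstring(xml_text)
print(parsed.find("from").text, "->", parsed.find("to").text)
```

The tags carry no meaning by themselves; it is the reading program that decides what "from" or "heading" is used for.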
XML provides a surface syntax for structured documents, but imposes no semantic constraints
on the meaning of these documents.
XML Schema :- It is a language for describing the structure of XML documents, typically
expressed in terms of constraints on the structure and content of documents. XML Schemas
express shared vocabularies and allow machines to carry out rules made by people. They
provide a means for defining the structure, content, and semantics of XML documents in more
detail. XML Schema was approved as a W3C Recommendation in May 2001, and its second
edition was published in October 2004. The mechanism for associating an XML document with a
schema varies according to the schema language. The association may be achieved via markup
within the XML document itself, or via some external means. The purpose of an XML Schema is
to define the valid building blocks of an XML document, and it consists of the following things:
• types for elements and attributes with default and fixed values for elements and attributes
Let us define a simple XML document which describes a book, and the corresponding XML
schema. Here dotted lines are to be replaced by actual data values such as the name, title of the
book, etc.
To write a schema for this document, we could simply follow its structure and define each
element. Every XML schema starts with the schema element, and then each term is defined one
by one. The following XML schema is just to give an idea.
XML schema is basically similar to a database schema, where we define a structure consisting of
attributes with their corresponding types.
In the graphical representation given in Fig. 15.1, the class nodes are represented by
oval-shaped boxes, whereas the instances are shown as rectangular boxes at the leaf nodes.
Each rectangular box contains the list of all instances of that class. An ontology is represented
using an ontology language such as RDF Schema, as explained earlier, or OWL (Web Ontology
Language). Let us briefly describe OWL to give you a feel for an ontology encoding language.
Consider the Educational Institution ontology example given earlier, where a class 'Faculty' with
a property 'Instructor of' with value 'Student' is defined. We would add to the RDF document a
specification that
All the properties of class 'Faculty' will be directly inherited by the given resources. For
example, if every 'Faculty' has a Designation, then the object identified by Cris must also
have a designation. Furthermore, suppose that the ontology defines a property called
"Student of" as the inverse of the "Instructor of" property. Then, without any extra effort, a
Semantic Web engine would be able to infer from the inverse property that "Mike is a Student of
Cris", even though we have only stated that "Cris is the Instructor of Mike".
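The inference from an inverse property can be sketched as follows. This is a minimal
illustration and not how any particular Semantic Web engine works: facts are stored as plain
(subject, property, object) tuples, and the property names InstructorOf and StudentOf as well
as the function infer_inverses() are assumptions made for this example.

```python
# Stated facts as (subject, property, object) triples; names are hypothetical.
facts = {
    ("Cris", "type", "Faculty"),
    ("Cris", "InstructorOf", "Mike"),
}

# Ontology-level statement: StudentOf is the inverse of InstructorOf.
inverse_of = {"InstructorOf": "StudentOf"}


def infer_inverses(triples, inverse_map):
    """Return the triples plus one inverse triple for each invertible fact."""
    # An inverse relation holds in both directions, so index it both ways.
    both = dict(inverse_map)
    both.update({v: k for k, v in inverse_map.items()})
    inferred = set(triples)
    for subj, pred, obj in triples:
        if pred in both:
            inferred.add((obj, both[pred], subj))
    return inferred


closed = infer_inverses(facts, inverse_of)
print(("Mike", "StudentOf", "Cris") in closed)
```

Only "Cris is the Instructor of Mike" was stated, yet the closed set also contains the triple
for "Mike is a Student of Cris".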
There are two ways of developing an ontology: either hand code the graphical structure of the
ontology in a language such as OWL, or use some standard tool, if available. The PROTÉGÉ
tool helps in creating the OWL format of the ontology structure. We will call such an
ontology an OWL ontology. The sample code generated by the PROTÉGÉ tool for the Educational
Institution ontology given in Fig. 15.1
Target representation varies from one application to another. For example, in the case of
translation, target representation refers to the target language; on the other hand, in the case
of paragraph comprehension, target representation may refer to some form of semantic
representation with which one can answer various queries regarding the source text.
Basically, NLP involves processing of written text using computer models at the lexical,
syntactic, and semantic levels. It also includes processing of spoken language, which uses all
the processing techniques required for written text plus knowledge about phonology, with added
information to handle ambiguities that arise in speech.
Morphological Analysis: The morphological analysis process (MAP) is carried out first on the
natural language sentences; this method tries to extract the root word from a declined or
inflected form of a word after removing suffixes and prefixes. For example, getting the root
'push' from declined forms such as pushed, pushing, pushes, etc. In addition to this, it also
assigns appropriate syntactic categories such as noun, verb, adjective, etc., to all words in the
sentence.
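A deliberately naive sketch of root extraction by suffix removal is given below. It handles
the 'push' example from the text but is not a real morphological analyser: the suffix list and
the function root_form() are assumptions, and words that double a consonant or change spelling
(for example 'running' or 'studies') would need extra rules.

```python
# Suffixes are tried in this order, longest first; the list is illustrative only.
SUFFIXES = ("ing", "ed", "es", "s")


def root_form(word):
    """Strip one known suffix, keeping at least three letters of the root."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for form in ("pushed", "pushing", "pushes", "push"):
    print(form, "->", root_form(form))
```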
Syntactic Analysis: This method of analysis uses the result of MAP to build a structural
description of the sentence based on grammatical rules. This process is called parsing. It
requires a declarative representation (called a grammar) of syntactic facts about the language
and a procedure (called a parser) that compares the grammar against the input sentence to
produce parse structures. Creating a parse tree is the first step towards understanding a sentence.
Semantic Analysis: It creates a semantic structure by ascribing the literal meaning to a sentence
using the parse structure obtained in the syntactic phase. It maps individual words into
corresponding objects in the knowledge base and combines the words with each other using
semantic rules. Our aim here is to produce meaning in some suitable representation scheme by
using any of the KR methods described in the earlier chapter. The main purpose of semantic
processing is the creation of a target representation of the meaning of the sentence.
Pragmatic Analysis: It refers to the intended meaning of a sentence used in different contexts.
The context affects the interpretation of the sentence. For example, the sentence "John saw Mike
in the garden with a cat" has two interpretations. The first is that John, who had a cat with
him, saw Mike in the garden. The other is that John saw Mike, who had a cat with him, in the
garden. Given the context, one can settle on a unique interpretation. If we know that John
keeps pets, then the first interpretation is more suitable.
Discourse Analysis: It refers to conversation between two or more individuals. Here the
interpretation of a spoken sentence is based on the belief set of the people involved at the
time of the conversation.
A large number of computational models for syntactic and semantic analysis have been
developed, but very little work has been done on developing pragmatic and discourse analysis
models, since various complexities are involved in understanding them. Linguists have carried
out research and developed theories, but these cannot be applied directly for computational
purposes.
Production rules are defined using non-terminal symbols (symbols to be further expanded) and
terminal symbols (direct symbols found in the language). Statistical parsing methods require
large corpora, and linguistic knowledge is represented as statistical parameters or probabilities,
which may be used to parse a given sentence (Jurafsky D. & Martin J.H., 2000).
In this chapter, we will concentrate on rule-based parsers. Once the grammar rules are defined,
a sentence is parsed using the grammar and a tree-like structure is built, if the sentence is
syntactically correct. This tree is called a parse tree. Parsing can be done using two methods:
top-down parsing and bottom-up parsing. A parser can use either of these parsing methods.
Each one has its merits and demerits (Allen J., 1994).
Bottom-up parsing: In bottom-up parsing, we start with the words in the sentence and apply
grammar rules in the backward direction until a single tree is produced whose root matches
the start symbol.
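The bottom-up idea can be sketched with a small shift-reduce loop. The grammar rules, the
lexicon and the function bottom_up_parse() below are assumptions made for illustration and are
not the grammar used in this chapter. The sketch reduces greedily whenever the top of the stack
matches the right-hand side of a rule, which is enough for this toy grammar but would need
backtracking for an ambiguous one.

```python
# Hypothetical toy grammar: (left-hand side, right-hand side).
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
]
# Hypothetical lexicon mapping words to syntactic categories.
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N", "chased": "V"}


def bottom_up_parse(sentence):
    """Return a parse tree rooted at S, or None if the sentence is rejected."""
    stack = []
    for word in sentence.lower().split():
        if word not in LEXICON:
            return None
        stack.append((LEXICON[word], word))  # shift the word with its category
        reduced = True
        while reduced:  # apply rules backwards while the stack top matches
            reduced = False
            for lhs, rhs in RULES:
                n = len(rhs)
                top = tuple(node[0] for node in stack[-n:])
                if len(stack) >= n and top == rhs:
                    children = stack[-n:]
                    del stack[-n:]
                    stack.append((lhs, *children))  # reduce to the left side
                    reduced = True
                    break
    if len(stack) == 1 and stack[0][0] == "S":
        return stack[0]
    return None


print(bottom_up_parse("the dog chased a cat"))
```

Each tree node is a tuple whose first item is the category, followed by its children; a leaf
holds the category and the word itself.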
Top-down parsing: In top-down parsing, we start with the start symbol and apply grammar
rules in the forward direction until the terminal symbols of the parse tree correspond to the
words in the sentence.
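A top-down parser can be sketched as a recursive expansion from the start symbol. As before,
the grammar, the lexicon and the function names are assumptions made for illustration only.
The sketch tries each alternative of a rule in turn and backtracks through Python generators;
it assumes that the grammar has no left-recursive rules, otherwise the expansion would not
terminate.

```python
# Hypothetical toy grammar: non-terminal -> list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
# Hypothetical lexicon mapping words to syntactic categories.
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N",
           "chased": "V", "slept": "V"}


def expand(symbol, words, pos):
    """Yield (tree, next_pos) for each way symbol can cover words[pos:next_pos]."""
    if symbol not in GRAMMAR:  # a word category: must match the next word
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # apply each rule in the forward direction
        yield from expand_sequence(rhs, words, pos, (symbol,))


def expand_sequence(symbols, words, pos, partial):
    """Expand the symbols left to right, extending the partial tree."""
    if not symbols:
        yield partial, pos
        return
    for child, next_pos in expand(symbols[0], words, pos):
        yield from expand_sequence(symbols[1:], words, next_pos, partial + (child,))


def top_down_parse(sentence):
    """Return the first parse tree that covers the whole sentence, else None."""
    words = sentence.lower().split()
    for tree, end in expand("S", words, 0):
        if end == len(words):
            return tree
    return None


print(top_down_parse("the dog slept"))
```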
Consider a simple context-free-like grammar for the English language, as given in Table 16.1.
The conventions used in this chapter are as follows: