
Module 4.3: Knowledge and Reasoning
Motivation:
The motivation of this module is to provide students with knowledge of uncertainty and
belief networks.
Syllabus:

Lecture 1 (Duration: 1 Hr; Self-Study: 1 Hr): Uncertain knowledge and reasoning:
Uncertainty; Representing knowledge in an uncertain domain
Lecture 2 (Duration: 1 Hr): The semantics of belief network; Inference in belief network;
Markov Decision Process

Learning Objective:
Learners should understand the representation of knowledge and the reasoning processes
used in uncertain domains and belief networks.

Theoretical Background:

Key Definitions:
Probability: Probability is the chance that an uncertain event will occur; it is a numerical
measure of the likelihood that an event will occur. The value of a probability always lies
between 0 and 1, where 0 represents complete impossibility and 1 represents complete
certainty. For example, a fair coin lands heads with probability 0.5.

Conditional probability: The probability that an event occurs given that another event has
already happened.
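
In standard notation (a textbook formulation, not spelled out in the module), for events A
and B with P(B) > 0:

    P(A | B) = P(A and B) / P(B)

For instance, if 40% of days are cloudy and 10% of days are both cloudy and rainy, then
P(rain | cloudy) = 0.10 / 0.40 = 0.25.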

Course Content:
Lecture 1: Uncertain knowledge and reasoning
Let’s check the take away from this lecture

Exercise

Q.1 Which of the following is not a cause of uncertainty in the real world?
● Information from unreliable sources
● Experimental errors
● Equipment faults
● A stable environment

Learning from this lecture: Learners will be able to understand the need for probabilistic reasoning.

Lecture 2: The semantics and inference of belief network
Reinforcement Learning
o Reinforcement learning is a feedback-based machine learning technique in which an
agent learns to behave in an environment by performing actions and observing the
results of those actions. For each good action the agent receives positive feedback,
and for each bad action it receives negative feedback or a penalty.
o In reinforcement learning, the agent learns automatically from this feedback, without
any labeled data, unlike supervised learning.
o Since there is no labeled data, the agent is bound to learn from its experience only.
o RL solves a specific type of problem where decision making is sequential, and the
goal is long-term, such as game-playing, robotics, etc.
o The agent interacts with the environment and explores it by itself. The primary goal
of an agent in reinforcement learning is to improve its performance by collecting the
maximum positive reward.
o The agent learns by trial and error and, based on this experience, learns to perform
the task in a better way. Hence, we can say that "Reinforcement learning is a type
of machine learning method where an intelligent agent (computer program)
interacts with the environment and learns to act within it." How a robotic dog
learns the movements of its limbs is an example of reinforcement learning.
o It is a core part of artificial intelligence, and all AI agents work on the concept of
reinforcement learning. Here we do not need to pre-program the agent, as it learns
from its own experience without any human intervention.
o Example: Suppose an AI agent is present within a maze environment, and its goal is
to find the diamond. The agent interacts with the environment by performing actions;
based on those actions, the state of the agent changes, and it also receives a reward
or penalty as feedback.

o The agent continues doing these three things (take an action, change state or remain
in the same state, and receive feedback), and by repeating them it learns and explores
the environment.
o The agent learns which actions lead to positive feedback or rewards and which
actions lead to negative feedback or penalties. As a positive reward the agent gets a
positive point, and as a penalty it gets a negative point. A minimal code sketch of this
loop follows below.
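
To make the loop concrete, here is a minimal sketch in Python, assuming a toy
one-dimensional corridor whose last cell holds the "diamond". The state space, rewards,
and learning constants are invented for illustration, and tabular Q-learning stands in for
whatever update rule a real agent might use:

    import random

    # Toy corridor: cells 0..4, the "diamond" is in cell 4 (all values
    # here are illustrative assumptions, not part of the module text).
    N_STATES = 5
    ACTIONS = (-1, +1)                     # step left or step right
    ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    for episode in range(200):
        s = 0                              # start of the maze
        while s != N_STATES - 1:
            if random.random() < EPSILON:  # explore (trial and error)
                a = random.choice(ACTIONS)
            else:                          # exploit what was learned so far
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s_next = min(max(s + a, 0), N_STATES - 1)  # environment responds
            reward = 1.0 if s_next == N_STATES - 1 else -0.1
            best_next = max(Q[(s_next, act)] for act in ACTIONS)
            # update the value estimate from the observed feedback
            Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])
            s = s_next

    # after training, the greedy action in every non-goal cell is +1 (right)
    print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)])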

Markov Decision Process

Markov Decision Process, or MDP, is used to formalize reinforcement learning problems.
If the environment is completely observable, then its dynamics can be modeled as a Markov
process. In an MDP, the agent constantly interacts with the environment and performs
actions; at each action, the environment responds and generates a new state.

MDP is used to describe the environment for RL, and almost all RL problems can be
formalized using an MDP.

MDP contains a tuple of four elements (S, A, Pa, Ra):

● A set of finite states S
● A set of finite actions A
● Pa: the probability that action a taken in state S leads to state S'
● Ra: the reward received after transitioning from state S to state S' due to action a
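
To make the four elements concrete, here is a small sketch of such a tuple as plain Python
data structures; the particular states, actions, probabilities, and rewards are invented for
illustration:

    # The four elements (S, A, Pa, Ra) as plain Python data; the states,
    # actions, probabilities, and rewards are invented for illustration.
    S = ["s0", "s1"]                       # finite set of states
    A = ["stay", "go"]                     # finite set of actions

    # Pa: probability that action a taken in state s leads to state s'
    P = {
        ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
        ("s0", "stay"): {"s0": 1.0},
        ("s1", "go"):   {"s0": 1.0},
        ("s1", "stay"): {"s1": 1.0},
    }

    # Ra: reward received after moving from s to s' due to action a
    R = {("s0", "go", "s1"): 1.0}          # every other transition rewards 0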
MDP relies on the Markov property, so to better understand MDPs we first need to understand that property.

Markov Property:

It says: "If the agent is present in the current state s1, performs an action a1, and moves to
the state s2, then the state transition from s1 to s2 depends only on the current state and
action; it does not depend on past actions, rewards, or states."

Or, in other words, as per the Markov property, the current state transition does not depend
on any past action or state. Hence, an MDP is an RL problem that satisfies the Markov
property. For example, in a chess game the players only focus on the current board position
and do not need to remember past moves.
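
Written as an equation (a standard formulation of the property described above):

    P(S_{t+1} | S_t, A_t, S_{t-1}, A_{t-1}, ..., S_0, A_0) = P(S_{t+1} | S_t, A_t)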

Finite MDP:

A finite MDP is one in which the sets of states, actions, and rewards are all finite. In RL, we
consider only finite MDPs.

Markov Process:

A Markov process is a memoryless process with a sequence of random states S1, S2, ...,
St that satisfies the Markov property. A Markov process is also known as a Markov chain,
which is a tuple (S, P) of a state space S and a transition function P. These two components
(S and P) define the dynamics of the system.
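
As a small illustration of the (S, P) tuple, the sketch below simulates a two-state chain in
Python; the states and transition probabilities are invented for the example:

    import random

    # A two-state weather chain; states and probabilities are invented
    # purely to illustrate the (S, P) tuple.
    P = {
        "sunny": {"sunny": 0.9, "rainy": 0.1},
        "rainy": {"sunny": 0.5, "rainy": 0.5},
    }

    state, path = "sunny", []
    for _ in range(10):
        # the next state is sampled from the current state alone
        nxt, weights = zip(*P[state].items())
        state = random.choices(nxt, weights=weights)[0]
        path.append(state)
    print(path)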

Let’s check the take away from this lecture

Exercise
Q.2 A Bayesian network is a probabilistic graphical model which represents a set of variables and
their conditional dependencies using -----------------------------.
(1) A directed acyclic graph
(2) An undirected acyclic graph
(3) A directed cyclic graph
(4) An undirected cyclic graph

Learning from this lecture: Learners will be able to understand the semantics of and inference in
belief networks, and the Markov Decision Process.

Conclusion
This module introduced reasoning under uncertainty, the semantics of and inference in belief
networks, and the Markov Decision Process.

Short Answer Questions:

Q.1 Describe the joint probability distribution. (U)

Q.2 Discuss the ways to understand the semantics of a Bayesian network. (U)

Long Answer Questions:

Q.1 Read the scenario (A)


Harry installed a new burglar alarm at his home to detect burglary. The alarm responds
reliably to a burglary, but it also responds to minor earthquakes. Harry has two neighbors,
David and Sophia, who have taken responsibility for informing Harry at work when they hear
the alarm. David always calls Harry when he hears the alarm, but sometimes he confuses the
telephone ringing with the alarm and calls then as well. Sophia, on the other hand, likes to
listen to loud music, so she sometimes fails to hear the alarm. We would like to compute the
probability of the burglary alarm event.
Calculate the probability that the alarm has sounded, but neither a burglary nor an
earthquake has occurred, and both David and Sophia called Harry.
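
A solution sketch follows. The question does not supply the conditional probability tables,
so the numbers used below are illustrative values commonly paired with this textbook
example; the essential point is the factorization given by the network structure. Writing
B, E, A, D, S for Burglary, Earthquake, Alarm, David calls, and Sophia calls:

    P(D, S, A, not B, not E) = P(D | A) * P(S | A) * P(A | not B, not E) * P(not B) * P(not E)

In Python, with the assumed values:

    # Illustrative CPT values (assumed; not given in the question text)
    p_b = 0.001        # P(Burglary)
    p_e = 0.002        # P(Earthquake)
    p_a_nb_ne = 0.001  # P(Alarm | no burglary, no earthquake)
    p_d_a = 0.91       # P(David calls | Alarm)
    p_s_a = 0.75       # P(Sophia calls | Alarm)

    p = p_d_a * p_s_a * p_a_nb_ne * (1 - p_b) * (1 - p_e)
    print(p)  # about 0.00068 with these assumed values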

References:
Books:
1. Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig, McGraw Hill, 3rd Edition, 2009.
2. A First Course in Artificial Intelligence, Deepak Khemani, McGraw Hill Education (India), 1st Edition, 2013.

Online Resources:
● https://nptel.ac.in/courses/106/102/106102220/
● https://www.javatpoint.com/history-of-artificial-intelligence
● http://people.eecs.berkeley.edu/~russell/slides/
