
Artificial Intelligence

CS-451

Instructor : Syed Musharaf Ali

ROOM G-104-DSE IIUI


Ph# 051-9019724 Ext-2724
A Markov decision process (MDP) is a model for sequential decision making. The
model predicts the outcome of an action using only the information provided by the
current state (the Markov property). At each step of the process, the decision maker
chooses an action available in the current state; the model then moves to the next
state and gives the decision maker a reward.
Q-learning finds an optimal policy, that is, one that maximizes the expected value of
the total reward over all successive steps, starting from the current state and
continuing to the goal state. Given enough exploration, Q-learning can identify an
optimal action-selection policy for any finite MDP.
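To make the idea concrete, here is a minimal sketch of tabular Q-learning. The environment is a hypothetical five-state corridor invented for illustration (it is not from the slides): the agent starts at state 0, the goal is the last state, and reaching the goal gives a reward of 1. The names `step` and `q_learning` and the values of `alpha`, `gamma`, and `episodes` are assumptions. Because Q-learning is off-policy, this sketch simply acts at random while learning.

```python
import random

def step(state, action, n_states):
    """Hypothetical corridor: action 0 moves left, action 1 moves right."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(n_states - 1, state + 1)
    # Reward of 1 only when the goal (last state) is reached.
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn a Q-table with the standard Q-learning update rule."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        while state != goal:
            action = rng.randrange(2)  # random behavior policy (pure exploration)
            next_state, reward = step(state, action, n_states)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

if __name__ == "__main__":
    table = q_learning()
    for s, (left, right) in enumerate(table):
        print(s, round(left, 3), round(right, 3))
```

In this corridor the learned value of moving right ends up higher than the value of moving left in every non-goal state, so the greedy policy read from the table walks straight to the goal.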
Exploitation
Exploration
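One common way to balance exploitation (taking the action that currently looks best) against exploration (trying other actions to learn more) is an epsilon-greedy rule. The sketch below is an illustration, not necessarily the scheme used in the lecture; the function name `epsilon_greedy` and its parameters are assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Choose an action index given the Q-values of one state.

    With probability epsilon, explore: pick any action uniformly at random.
    Otherwise, exploit: pick an action with the highest Q-value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # exploration
    best = max(q_values)
    best_actions = [a for a, v in enumerate(q_values) if v == best]
    return rng.choice(best_actions)  # exploitation, ties broken at random

if __name__ == "__main__":
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1))
```

Setting `epsilon` to 0 gives pure exploitation, and setting it to 1 gives pure exploration.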

The best policy is found when the discount factor gamma is close to one.
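A small sketch of why gamma matters: gamma is the discount factor that weights future rewards, so a value near one lets a distant goal reward still influence the current choice. The function name `discounted_return` and the reward sequence are assumptions made for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total

if __name__ == "__main__":
    rewards = [0.0, 0.0, 1.0]  # hypothetical: the goal reward arrives two steps ahead
    print(discounted_return(rewards, gamma=0.9))
    print(discounted_return(rewards, gamma=0.1))
```

Here a reward of 1 arriving two steps ahead is worth about 0.81 with gamma = 0.9 but only about 0.01 with gamma = 0.1, so a small gamma makes the agent nearly blind to a goal that is more than a few steps away.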
