
Assessment 1

Artificial Intelligence and Machine Learning

Question 1:

Reinforcement Learning (RL) is a machine learning paradigm in which an agent learns to make decisions by interacting with its environment. As the agent takes actions, it receives feedback in the form of rewards or penalties, and its goal is to learn a policy (strategy) that maximizes the cumulative reward over time. In RL, the agent learns through trial and error, adjusting its strategy based on the outcomes of its actions.
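Trial-and-error learning can be illustrated with a minimal sketch of the simplest RL setting, a multi-armed bandit with a single state. An epsilon-greedy agent mostly picks the action that looks best so far, occasionally tries a random one, and refines its estimate of each action's average reward from what it observes. The arm means, the exploration rate `epsilon`, and the function name `epsilon_greedy_bandit` are illustrative assumptions, not part of the original text.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Hypothetical example: learn which arm pays best purely from sampled rewards."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # running average reward observed for each arm
    counts = [0] * n_arms        # how many times each arm has been tried
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best current estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # The (hidden) environment returns a noisy reward for the chosen arm.
        reward = rng.gauss(true_means[arm], 1.0)
        # Incremental sample-average update: move the estimate toward the observed outcome.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 1.0])
best = max(range(3), key=lambda a: estimates[a])
print("best arm:", best, "pulls per arm:", counts)
```

The agent is never told which arm is best; it discovers this only from the rewards its own choices produce, which is the trial-and-error loop described above in its simplest form.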

Here is a simplified explanation of the Reinforcement Learning problem, along with a diagram:

Reinforcement Learning Problem:

1. Agent (Decision-Maker): The entity that takes actions in an environment.

2. Environment: The external system with which the agent interacts.

3. State (S): Represents the current situation or configuration of the environment.

4. Action (A): The decision or move made by the agent.

5. Reward (R): Immediate feedback from the environment after the agent takes an action. It indicates how desirable the outcome was.

6. Policy (π): The strategy or mapping from states to actions that the agent follows.

7. Value Function (V or Q): The expected cumulative future reward obtainable from a given state (V) or state-action pair (Q) when the agent follows its policy.
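The reward, policy, and value function defined above are commonly written as follows. This is a sketch in standard notation; the discount factor γ (a number between 0 and 1 that weights near-term rewards more heavily than distant ones) is an assumption not introduced in the text.

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

V^{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right]

Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s,\ A_t = a \right]

V^{\pi}(s) = \sum_{a} \pi(a \mid s)\, Q^{\pi}(s, a)
```

Here G_t is the cumulative (discounted) reward collected from time t onward, and the last line shows that V averages Q over the actions the policy π would choose in state s.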

Reinforcement Learning Diagram:

```
+-------------------+            +--------------------------+
|                   |            |                          |
|       Agent       |            |       Environment        |
|                   |     A      |                          |
|     +-------+     | ---------> |                          |
|     | State |     |            |                          |
|     +-------+     |     R      |                          |
|                   | <--------- |                          |
|                   |            |                          |
+-------------------+            +--------------------------+
```

1. The agent observes the current state of the environment.
2. The agent selects an action based on its current policy.
3. The environment transitions to a new state, and the agent receives a reward.
4. The agent updates its policy and value function based on the received reward.
5. This process repeats over multiple iterations, with the agent learning to maximize
cumulative rewards.
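The five steps above can be sketched with tabular Q-learning, one specific RL algorithm chosen here for illustration (the text does not prescribe a particular method). The environment is a hypothetical five-state corridor in which moving right into the last state earns a reward of +1; the names `step` and `q_learning` and the hyperparameters `alpha` (learning rate), `gamma` (discount factor), and `epsilon` (exploration rate) are assumptions.

```python
import random

# Hypothetical toy environment (assumption): a 1-D corridor of states 0..4.
# Action 0 moves left, action 1 moves right; entering state 4 ends the episode with reward +1.
N_STATES, GOAL = 5, 4

def step(state, action):
    """Environment transition: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action] value estimates
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Steps 1-2: observe the state and choose an action (epsilon-greedy policy,
            # with random tie-breaking so the agent does not get stuck early on).
            if rng.random() < epsilon or Q[state][0] == Q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if Q[state][0] > Q[state][1] else 1
            # Step 3: the environment transitions to a new state and returns a reward.
            next_state, reward, done = step(state, action)
            # Step 4: update the value estimate toward reward + discounted best next value.
            target = reward if done else reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            # Step 5: repeat from the new state.
            state = next_state
    return Q

Q = q_learning()
policy = [0 if q[0] > q[1] else 1 for q in Q[:GOAL]]
print("greedy action per non-terminal state:", policy)  # 1 means "move right"
```

After training, the greedy policy is expected to choose "move right" in every non-terminal state, since that reaches the reward in the fewest steps and therefore has the highest discounted return.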

Solving a reinforcement learning problem usually means finding the optimal policy, or the optimal value function from which such a policy can be derived, that leads to the best long-term outcome for the agent in a given environment. The iterative learning process allows the agent to adapt and improve its decision-making strategy over time.
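When the environment's transitions and rewards are fully known, the optimal value function can be computed directly with value iteration (a dynamic-programming method, named here as one option rather than the text's own), and an optimal policy read off by acting greedily with respect to it. This is a minimal sketch on a hypothetical five-state corridor with an assumed discount factor `GAMMA = 0.9`; all names are illustrative.

```python
# Hypothetical corridor model (assumption): states 0..4, state 4 is the terminal goal.
N_STATES, GOAL, GAMMA = 5, 4, 0.9
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def step(state, action):
    """Known model: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_value(V, state, action):
    """One-step lookahead: immediate reward plus discounted value of the next state."""
    next_state, reward, done = step(state, action)
    return reward if done else reward + GAMMA * V[next_state]

def value_iteration(tol=1e-9):
    V = [0.0] * N_STATES  # V[GOAL] stays 0 because the episode ends there
    while True:
        delta = 0.0
        for s in range(GOAL):
            best = max(q_value(V, s, a) for a in ACTIONS)  # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # stop once a full sweep no longer changes the values
            return V

V = value_iteration()
# The optimal policy acts greedily with respect to the optimal value function.
policy = [max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in range(GOAL)]
print([round(v, 3) for v in V])  # → [0.729, 0.81, 0.9, 1.0, 0.0]
print(policy)                    # → [1, 1, 1, 1]
```

The value shrinks by a factor of 0.9 for each additional step between a state and the goal, so the greedy policy that follows the highest values is "always move right".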
