AI and ML Assessment Week 8
Question 1:
5. Reward (R): Immediate feedback from the environment after taking an action. It
indicates the desirability of the outcome.
6. Policy (π): The strategy or mapping from states to actions that the agent follows.
```
+-------------------+            +--------------------------+
|                   |            |                          |
|       Agent       |            |       Environment        |
|                   |            |                          |
|                   |     A      |                          |
|     +-------+     +----------->|                          |
|     | State |     |            |                          |
|     +-------+     |     R      |                          |
|                   |<-----------+                          |
|                   |            |                          |
+-------------------+            +--------------------------+
```
Reinforcement learning problems typically involve finding an optimal policy or value
function, that is, one that maximizes the agent's long-term cumulative reward in a
given environment. By interacting with the environment repeatedly, the agent adapts
and improves its decision-making strategy over time.
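
The loop described above can be made concrete with a small sketch. The following is a minimal illustration, not a method taken from the assessment itself. It assumes a hypothetical five-state corridor environment and uses tabular Q-learning with an epsilon-greedy policy. The names `step`, `train` and `greedy_policy`, and the values of `alpha`, `gamma` and `epsilon`, are all illustrative assumptions.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# states 0..4 laid out in a line, state 4 is the goal.
N_STATES = 5
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Environment: apply an action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only on reaching the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimate the value of each (state, action) pair."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy policy: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # Break ties at random so early episodes still explore.
                action = rng.choice(
                    [a for a in ACTIONS if q[(state, a)] == best]
                )
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best future value.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Policy: map each non-terminal state to its highest-valued action."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

In this sketch the reward is the immediate feedback returned by `step`, while the policy is the state-to-action mapping read off the learned table. With these settings the learned policy should move right (`+1`) from every non-terminal state, since that is the shortest route to the only rewarded state.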