Module 5
ii. Assume the immediate reward values are bounded; that is, there exists some
positive constant c such that, for all states s and actions a, |r(s, a)| < c.
iii. Assume the agent selects actions in such a way that it visits every
possible state-action pair infinitely often.
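The two assumptions above can be illustrated with a minimal sketch. It assumes, since the surrounding text is not shown here, that they belong to a convergence argument for tabular Q-learning in a deterministic environment. The two-state environment, the names `REWARD`, `next_state`, and `run`, the discount `GAMMA`, and the update rule are all hypothetical choices made for illustration. A finite run cannot show "infinitely often"; it only shows that uniform random action selection keeps revisiting every pair.

```python
import random

# Hypothetical toy environment (an assumption, not from the source):
# two states, two actions, deterministic transitions and rewards.
STATES = (0, 1)
ACTIONS = (0, 1)
REWARD = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}
GAMMA = 0.5  # assumed discount factor


def next_state(s, a):
    """In this toy environment the chosen action is the next state."""
    return a


def run(steps=2000, seed=0):
    """Explore with uniform random actions and apply a deterministic
    Q-learning style update; return the Q table and visit counts."""
    rng = random.Random(seed)
    q = {sa: 0.0 for sa in REWARD}
    visits = {sa: 0 for sa in REWARD}
    s = 0
    for _ in range(steps):
        a = rng.choice(ACTIONS)  # random selection keeps every pair reachable
        s2 = next_state(s, a)
        # Assumed update: Q(s, a) <- r(s, a) + gamma * max_a' Q(s', a')
        q[(s, a)] = REWARD[(s, a)] + GAMMA * max(q[(s2, b)] for b in ACTIONS)
        visits[(s, a)] += 1
        s = s2
    return q, visits


# Assumption ii: some positive constant c satisfies |r(s, a)| < c everywhere.
c = 1.0 + max(abs(r) for r in REWARD.values())
reward_bounded = all(abs(r) < c for r in REWARD.values())

# Assumption iii (finite proxy): every state-action pair is visited repeatedly.
q, visits = run()
print("rewards bounded by c =", c, ":", reward_bounded)
print("visit counts:", visits)
print("Q estimates:", q)
```

If action selection stopped trying some pair, its table entry would stop being updated, which is why the visiting assumption matters in this sketch.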
Table 5.1. Values of z_N for two-sided N% confidence intervals
5.3.6. Two-Sided and One-Sided Bounds
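As a rough illustration of how two-sided and one-sided bounds relate, the sketch below computes the standard Normal constant z_N for each case. It assumes the intervals are based on a Normal approximation; the function names `z_two_sided` and `z_one_sided` are hypothetical and are not taken from the source. It is not meant to reproduce Table 5.1.

```python
from statistics import NormalDist  # standard library, Python 3.8+


def z_two_sided(n_percent):
    """z such that N% of a standard Normal lies within [-z, +z]."""
    return NormalDist().inv_cdf(0.5 + n_percent / 200.0)


def z_one_sided(n_percent):
    """z such that N% of a standard Normal lies below z."""
    return NormalDist().inv_cdf(n_percent / 100.0)


for n in (50, 68, 80, 90, 95, 98, 99):
    print(f"N = {n}%: two-sided z = {z_two_sided(n):.2f}")

# A two-sided 90% interval leaves 5% in each tail, so its z matches
# the z of a one-sided 95% bound.
print(z_two_sided(90), z_one_sided(95))
```

In general, under this assumption, a two-sided interval with confidence 100(1 - alpha)% yields a one-sided bound with confidence 100(1 - alpha/2)% when the same z is reused.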
Expectation maximization algorithm