This is a monograph at the forefront of research on reinforcement learning, also referred to by other
names such as approximate dynamic programming and neuro-dynamic programming. It focuses on
the fundamental idea of policy iteration, i.e., start from some policy, and successively generate one or
more improved policies. If just one improved policy is generated, this is called rollout, which, based
on broad and consistent computational experience, appears to be one of the most versatile and reliable
of all reinforcement learning methods.
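The rollout idea described above can be sketched in a few lines. This is a hypothetical toy illustration, not code from the book: the rollout policy performs one-step lookahead, completing each candidate future by simulating the given base policy (all names below are mine).

```python
# Minimal sketch of rollout (toy problem, illustrative only): at each state, try every
# action, then let the base policy finish the trajectory, and pick the action whose
# simulated total reward is best.

def rollout_action(state, actions, step, base_policy, horizon):
    """One-step lookahead with base-policy completion over a finite horizon."""
    def simulate(s, first_action):
        total, s = step(s, first_action)
        for _ in range(horizon - 1):
            r, s = step(s, base_policy(s))
            total += r
        return total
    return max(actions(state), key=lambda a: simulate(state, a))

# Toy example: walk on the integers 0..10; the reward of a move is the new position,
# so moving right is better.
def step(s, a):                       # a in {-1, +1}; returns (reward, next state)
    s2 = max(0, min(10, s + a))
    return s2, s2

def actions(s):
    return [-1, +1]

base = lambda s: -1                   # deliberately poor base policy: always move left

a = rollout_action(5, actions, step, base, horizon=3)
```

Even with a base policy that always moves left, the rollout policy selects the rightward move at state 5, illustrating the policy improvement that rollout delivers over its base policy.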
Approximate policy iteration is more ambitious than rollout, but it is a strictly off-line method, and it
is generally far more computationally intensive. This motivates the use of parallel and distributed
computation. One of the purposes of the monograph is to discuss distributed (possibly asynchronous)
methods that relate to rollout and policy iteration, both in the context of an exact and an approximate
implementation involving neural networks or other approximation architectures.
Among its special features, the book:
* Presents new research relating to distributed asynchronous computation, partitioned architectures,
and multiagent systems, with application to challenging large-scale optimization problems, such as
combinatorial/discrete optimization, as well as partially observed Markov decision problems.
* Describes variants of rollout and policy iteration for problems with a multiagent structure, which
allow the dramatic reduction of the computational requirements for lookahead minimization.
* Establishes a connection of rollout with model predictive control, one of the most prominent
control system design methodologies.
* Expands the coverage of some research areas discussed in the author's 2019 textbook Reinforcement
Learning and Optimal Control.
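The multiagent variant mentioned above can be illustrated with a small sketch (hypothetical example, with names of my own choosing): instead of minimizing over all joint action combinations, which grows exponentially in the number of agents, the agents optimize one at a time with the others' actions held fixed, reducing the search from |A|^m to roughly m·|A| candidates.

```python
# Hypothetical illustration of the lookahead reduction in multiagent rollout:
# exhaustive joint lookahead vs. agent-by-agent optimization.

from itertools import product

def joint_lookahead(agent_actions, q_value):
    """Exhaustive lookahead: score every joint action tuple (exponential cost)."""
    return max(product(*agent_actions), key=q_value)

def agent_by_agent_lookahead(agent_actions, q_value, base_joint):
    """Each agent optimizes its own component in turn: earlier agents use their
    already-chosen actions, later agents fall back on the base policy's actions."""
    chosen = list(base_joint)
    for i, acts in enumerate(agent_actions):
        chosen[i] = max(acts, key=lambda a: q_value(tuple(chosen[:i] + [a] + chosen[i + 1:])))
    return tuple(chosen)

# Toy Q-factor with a separable reward, so both methods agree here.
q = lambda joint: sum(a * (i + 1) for i, a in enumerate(joint))
acts = [[0, 1], [0, 1], [0, 1]]
best = agent_by_agent_lookahead(acts, q, base_joint=(0, 0, 0))
```

In this toy case both methods return the same joint action, but the agent-by-agent version evaluates only 6 candidates instead of 8, a gap that widens dramatically with more agents; the book's results concern when such sequential optimization preserves the policy improvement property.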
About the author
DIMITRI P. BERTSEKAS, a member of the U.S. National Academy of
Engineering, is Fulton Professor of Computational Decision Making
at Arizona State University, and McAfee Professor of Engineering at
Massachusetts Institute of Technology. Among other honors, he has received
the 2001 AACC John R. Ragazzini Education Award, the 2009 INFORMS
Expository Writing Award, the 2014 AACC Richard E. Bellman Control
Heritage Award, the 2014 Khachiyan Prize, the 2015 SIAM/MOS
George B. Dantzig Prize, and the 2018 INFORMS John von Neumann Theory Prize.
Visit Athena Scientific online at: www.athenasc.com

Related Athena Scientific books of interest:
Reinforcement Learning and Optimal Control
Dimitri P. Bertsekas, 2019
Abstract Dynamic Programming, 2nd Edition
Dimitri P. Bertsekas, 2018
Dynamic Programming and Optimal Control, 4th
Edition, Dimitri P. Bertsekas, 2017
Stochastic Optimal Control: The Discrete-Time Case
Dimitri P. Bertsekas and Steven E. Shreve, 1996
Neuro-Dynamic Programming
Dimitri P. Bertsekas and John N. Tsitsiklis, 1996
ISBN-13: 978-1-886529-07-6