The Mountain Car Problem - Dymos

The Mountain Car Problem - Dymos https://openmdao.org/dymos/docs/latest/examples/mountain_car/mountain_car.html
By The Dymos Development Team

© Copyright 2022.
Contents
• State and control variables
• Problem Definition
• Defining the ODE
• Solving the minimum-time mountain car problem with Dymos
• Plotting the solution
• Animating the Solution
• References
The mountain car problem proposes a vehicle stuck in a “well.” It lacks the power to
directly climb out of the well, but instead must accelerate repeatedly forwards and
backwards until it has achieved the energy necessary to exit the well.
The problem is a popular machine learning test case, though the methods in Dymos
are capable of solving it. It first appeared in the PhD thesis of Andrew Moore in 1990
[Moo90]. The implementation here is based on that given by Melnikov, Makmal, and
Briegel [MMB14].
State and control variables

This system has two state variables, the position (x) and velocity (v) of the car.
This system has a single control variable (u), the effort put into moving. This control is
constrained to the range [−1, 1].
The dynamics of the system are governed by
\begin{align}
  \dot{x} &= v \nonumber \\
  \dot{v} &= 0.001 \, u - 0.0025 \cos(3 x) \tag{65}
\end{align}
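The claim that the car cannot climb out directly can be checked from these dynamics with an energy argument. The sketch below is not part of the original example: the helper names `v_dot` and `potential` are hypothetical, and it assumes a constant control u = 1 with no bounds on the states. For constant u the acceleration can be written as the negative gradient of a potential, so the car can only pass points where that potential is below its starting energy.

```python
import numpy as np


def v_dot(x, u):
    """Acceleration of the car from the dynamics above."""
    return 0.001 * u - 0.0025 * np.cos(3 * x)


def potential(x, u):
    """Potential V(x) with v_dot = -dV/dx for a constant control u (hypothetical helper)."""
    return 0.0025 * np.sin(3 * x) / 3 - 0.001 * u * x


x0 = -0.5                      # initial position, car starts at rest
energy = potential(x0, 1.0)    # total energy with v0 = 0 and full forward effort

# Highest value of the potential between the start and the exit at x = 0.5
xs = np.linspace(x0, 0.5, 2001)
barrier = potential(xs, 1.0).max()

# The car can only pass the barrier if its energy exceeds it
can_exit = energy > barrier
print(f"energy = {energy:.6f}, barrier = {barrier:.6f}, can exit: {can_exit}")
```

Under these assumptions the starting energy is below the barrier, which is why the optimizer has to find a rocking motion rather than simply applying full forward effort.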
Problem Definition
We seek to minimize the time required to exit the well in the positive direction.
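\begin{align}
  \mathrm{Minimize} \quad J &= t_f \tag{66}
\end{align}
Subject to the initial conditions
\begin{align}
  x_0 &= -0.5 \nonumber \\
  v_0 &= 0.0 \tag{67}
\end{align}
the control constraints
\begin{align}
  |u| &\le 1 \tag{68}
\end{align}
and the terminal constraints
\begin{align}
  x_f &= 0.5 \nonumber \\
  v_f &\ge 0.0 \tag{69}
\end{align}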
Defining the ODE

The following code implements the equations of motion for the mountain car
problem.
A few things to note:
1. By providing the tag dymos.state_rate_source:{name} , we’re letting Dymos
know which states need to be integrated, so there’s no need to specify a rate source
when using this ODE in our Phase.
2. Pairing the above tag with dymos.state_units:{units} means we don’t have to
specify units when setting properties for the state in our run script.
3. We only use compute_partials to override the value of $\partial \dot{v} / \partial x$,
because $\partial \dot{v} / \partial u$ and $\partial \dot{x} / \partial v$ are constant
and their values are specified during setup .
import numpy as np
import openmdao.api as om


class MountainCarODE(om.ExplicitComponent):

    def initialize(self):
        self.options.declare('num_nodes', types=int)

    def setup(self):
        nn = self.options['num_nodes']

        self.add_input('x', shape=(nn,), units='m')
        self.add_input('v', shape=(nn,), units='m/s')
        self.add_input('u', shape=(nn,), units='unitless')

        self.add_output('x_dot', shape=(nn,), units='m/s',
                        tags=['dymos.state_rate_source:x',
                              'dymos.state_units:m'])
        self.add_output('v_dot', shape=(nn,), units='m/s**2',
                        tags=['dymos.state_rate_source:v',
                              'dymos.state_units:m/s'])

        ar = np.arange(nn, dtype=int)

        self.declare_partials(of='x_dot', wrt='v', rows=ar, cols=ar,
                              val=1.0)
        self.declare_partials(of='v_dot', wrt='u', rows=ar, cols=ar,
                              val=0.001)
        self.declare_partials(of='v_dot', wrt='x', rows=ar, cols=ar)

    def compute(self, inputs, outputs):
        x = inputs['x']
        v = inputs['v']
        u = inputs['u']
        outputs['x_dot'] = v
        outputs['v_dot'] = 0.001 * u - 0.0025 * np.cos(3*x)

    def compute_partials(self, inputs, partials):
        x = inputs['x']
        partials['v_dot', 'x'] = 3 * 0.0025 * np.sin(3 * x)
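The analytic partial set in compute_partials can be sanity-checked against a central finite difference. The following is a minimal NumPy-only sketch, independent of OpenMDAO; the function names `v_dot` and `dv_dot_dx`, the sample points, and the step size are illustrative choices rather than part of the original example.

```python
import numpy as np


def v_dot(x, u):
    """Acceleration from the equations of motion."""
    return 0.001 * u - 0.0025 * np.cos(3 * x)


def dv_dot_dx(x):
    """Analytic partial of v_dot with respect to x, as used in compute_partials."""
    return 3 * 0.0025 * np.sin(3 * x)


# Sample positions across the state bounds used later in the example
x = np.linspace(-1.2, 0.5, 50)
u = np.zeros_like(x)  # the partial with respect to x does not depend on u

# Central finite difference with an illustrative step size
h = 1.0e-6
fd = (v_dot(x + h, u) - v_dot(x - h, u)) / (2 * h)

max_abs_error = np.max(np.abs(fd - dv_dot_dx(x)))
print(f"max |finite difference - analytic| = {max_abs_error:.3e}")
```

A small discrepancy here indicates that the sign and the factor of 3 from the chain rule are correct. Within a full OpenMDAO model, the check_partials method on the Problem serves the same purpose.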
Solving the minimum-time mountain car problem with Dymos

The following script solves the minimum-time mountain car problem with Dymos.
Note that this example requires the IPOPT optimizer via the pyoptsparse package.
SciPy’s SLSQP optimizer is generally not capable of solving this problem.
To begin, import the packages we require:
import dymos as dm
import matplotlib.pyplot as plt
from matplotlib import animation
Next, we set two constants. U_MAX is the maximum allowable magnitude of the
acceleration. The references show this problem being solved with −1 ≤ u ≤ 1.
Variable NUM_SEG is the number of equally spaced polynomial segments into which
time is being divided. Within each of these segments, the time-history of each state
and control is being treated as a polynomial (we’re using the default order of 3).
# The maximum absolute value of the acceleration authority of the car
U_MAX = 1.0

# The number of segments into which the problem is discretized
NUM_SEG = 30
We then instantiate an OpenMDAO problem and set the optimizer and its options.
For IPOPT, setting option nlp_scaling_method to 'gradient-based' can substantially
improve the convergence of the optimizer without the need for us to set all of the
scaling manually.
The call to declare_coloring tells the optimizer to attempt to find a sparsity pattern
that minimizes the work required to compute the derivatives across the model.
#
# Initialize the Problem and the optimization driver
#
p = om.Problem()
p.driver = om.pyOptSparseDriver(optimizer='IPOPT')
p.driver.opt_settings['print_level'] = 0
p.driver.opt_settings['max_iter'] = 500
p.driver.opt_settings['mu_strategy'] = 'adaptive'
p.driver.opt_settings['bound_mult_init_method'] = 'mu-based'
p.driver.opt_settings['tol'] = 1.0E-8
p.driver.opt_settings['nlp_scaling_method'] = 'gradient-based'  # for faster convergence
p.driver.declare_coloring()
Next, we add a Dymos Trajectory group to the problem’s model and add a phase to it.
In this case we’re using the Radau pseudospectral transcription to solve the problem.
#
# Create a trajectory and add a phase to it
#
traj = p.model.add_subsystem('traj', dm.Trajectory())
tx = dm.Radau(num_segments=NUM_SEG)
phase = traj.add_phase('phase0', dm.Phase(ode_class=MountainCarODE,
                                          transcription=tx))
At this point, we set the options on the main variables used in a Dymos phase.
In addition to time , we have two states ( x and v ) and a single control ( u ).
There are no parameters and no polynomial controls. We could have tried to use a
polynomial control here, but as we will see the solution contains large discontinuities
in the control value, which make it ill-suited for a polynomial control. Polynomial
controls are modeled as a single (typically low-order) polynomial across the entire
phase.
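To illustrate why a single low-order polynomial is a poor fit for a discontinuous control, the sketch below fits a cubic to a made-up bang-bang signal. It does not use Dymos, and the signal (three switching periods over a normalized time span) is an assumption chosen for illustration, not the actual solution of this problem.

```python
import numpy as np

# Normalized time across a hypothetical phase
t = np.linspace(0.0, 1.0, 601)

# A made-up bang-bang control that switches between +1 and -1
u_bang = np.where(np.sin(2 * np.pi * 3 * t) >= 0.0, 1.0, -1.0)

# Least-squares fit of one cubic polynomial across the whole phase
coeffs = np.polyfit(t, u_bang, deg=3)
u_poly = np.polyval(coeffs, t)

# A continuous polynomial cannot follow a jump of magnitude 2,
# so the worst-case error next to a switch stays close to 1
max_error = np.max(np.abs(u_poly - u_bang))
print(f"max error of the cubic fit: {max_error:.3f}")
```

A segment-wise control, as used in this example, can place the switches near segment boundaries instead of averaging over them.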
We’re fixing the initial time and states to whatever values we provide before executing
the problem. We will constrain the final values with nonlinear constraints in the next
step.
The scaler values ( ref ) are all set to 1 here. We’re using IPOPT’s gradient-based
scaling option and will let it work the scaling out for us.
Bounds on time duration are guesses, and the bounds on the states and controls
come from the implementation in the references.
Also, we don’t need to specify targets for any of the variables here because their
names are the targets in the top-level of the model. The rate source and units for the
states are obtained from the tags in the ODE component we previously defined.
#
# Set the variables
#
phase.set_time_options(fix_initial=True, duration_bounds=(.05, 10000),
                       duration_ref=1)
phase.add_state('x', fix_initial=True, fix_final=False, lower=-1.2,
                upper=0.5, ref=1, defect_ref=1)
phase.add_state('v', fix_initial=True, fix_final=False, lower=-0.07,
                upper=0.07, ref=1, defect_ref=1)
phase.add_control('u', lower=-U_MAX, upper=U_MAX, ref=1, continuity=True,
                  rate_continuity=False)
Next we define the optimal control problem by specifying the objective, boundary
constraints, and path constraints.
Why do we have a path constraint on the control u when we’ve already specified its bounds?
Excellent question! In the Radau transcription, the nth order control polynomial in
each segment is governed by design variables provided at n points in the segment,
none of which is the right-most endpoint. The control value at that endpoint is instead
interpolated from the values at those points. Since the endpoint value is not a design
variable, its bounds cannot be enforced directly and it is necessary to constrain it
separately. We could forgo specifying any bounds on u since it’s completely covered by
the path constraint, but specifying the bounds on the design variable values can
sometimes help by telling the optimizer, “Don’t even bother trying values outside of
this range.”
Note that sometimes the opposite is true: giving the optimizer the freedom to
explore a larger design space, only to eventually be “reined in” by the path constraint,
can be valuable.
The purpose of this interactive documentation is to let the user experiment. If you
remove the path constraint, you might notice some outlying control values in the
solution below.
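The effect described above can be reproduced with a small interpolation experiment. This is only a sketch of the idea, not Dymos’s internal implementation: it assumes three control design variables per segment located at the 3-point Legendre-Gauss-Radau nodes on [−1, 1), fits the unique quadratic through them, and evaluates it at the right endpoint. The control values are made up.

```python
import numpy as np

# 3-point Legendre-Gauss-Radau nodes in normalized segment time; the
# right endpoint tau = 1 is not one of them
lgr_nodes = np.array([-1.0,
                      (1.0 - np.sqrt(6.0)) / 5.0,
                      (1.0 + np.sqrt(6.0)) / 5.0])

# Hypothetical control values at the nodes, each within the bounds [-1, 1]
u_nodes = np.array([-1.0, -1.0, 1.0])

# Quadratic interpolating polynomial through the three node values
coeffs = np.polyfit(lgr_nodes, u_nodes, deg=2)

# Interpolated control at the right endpoint of the segment
u_end = np.polyval(coeffs, 1.0)
print(f"interpolated control at the segment end: {u_end:.3f}")  # about 2.1
```

Every design variable respects the bounds, yet the interpolated endpoint value is roughly twice the allowed magnitude. Only a path constraint, which is evaluated at all nodes, can rule this out.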
#
# Minimize time at the end of the phase
#
phase.add_objective('time', loc='final', ref=1000)
phase.add_boundary_constraint('x', loc='final', lower=0.5)
phase.add_boundary_constraint('v', loc='final', lower=0.0)
phase.add_path_constraint('u', lower=-U_MAX, upper=U_MAX)
#
# Setup the Problem
#
p.setup()
--- Constraint Report [traj] ---

--- phase0 ---
[final] 5.0000e-01 <= x [m]
[final] 0.0000e+00 <= v [m/s]
[path] -1.0000e+00 <= u <= 1.0000e+00 [unitless]
<openmdao.core.problem.Problem at 0x7fa5a068edd0>
We then set the initial guesses for the variables in the problem and solve it.
Since fix_initial=True is set for time and the states, those values are not design
variables and will remain at the values given below throughout the solution process.
We’re using the phase interp method to provide initial guesses for the states and
controls. In this case, by giving it two values, it is linearly interpolating from the first
value to the second value, and then returning the interpolated value at the input
nodes for the given variable.
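The linear interpolation described above can be mimicked with plain NumPy. The sketch below is not the Dymos implementation: `linear_guess` is a hypothetical helper, and the evenly spaced node locations in normalized phase time are an assumption made for readability (the actual nodes are not evenly spaced).

```python
import numpy as np


def linear_guess(node_ptau, ys):
    """Linearly interpolate from ys[0] to ys[1] across normalized phase time [-1, 1].

    node_ptau is a hypothetical array of node locations in normalized phase time.
    """
    return np.interp(node_ptau, [-1.0, 1.0], ys)


# Illustrative node locations only
nodes = np.linspace(-1.0, 1.0, 5)

# Same end values as the initial guess used for the position state
x_guess = linear_guess(nodes, [-0.5, 0.5])
print(x_guess)
```

The first and last entries match the two supplied values, and the interior entries lie on the straight line between them.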
Finally, we use the dymos.run_problem method to execute the problem. This interface
allows us to do some things that the standard OpenMDAO problem.run_driver
interface does not. It will automatically record the final solution achieved by the
optimizer in a case named 'final' in a file called dymos_solution.db . By specifying
simulate=True , it will automatically follow the solution with an explicit integration
using scipy.integrate.solve_ivp . The results of the simulation are stored in a case named
final in the file dymos_simulation.db . This explicit simulation demonstrates how the
system evolved with the given controls, and serves as a check that we’re using a dense
enough grid (enough segments and segments of sufficient order) to accurately
represent the solution.
If those two solutions didn’t agree reasonably well, we could rerun the problem with a
denser grid. Instead, we’re asking Dymos to automatically change the grid if
necessary by specifying refine_method='ph' . This will attempt to repeatedly solve the
problem and change the number of segments and segment orders until the solution
is in reasonable agreement.
#
# Set the initial values
#
p['traj.phase0.t_initial'] = 0.0
p['traj.phase0.t_duration'] = 500.0
p.set_val('traj.phase0.states:x', phase.interp('x', ys=[-0.5, 0.5]))
p.set_val('traj.phase0.states:v', phase.interp('v', ys=[0, 0.07]))
p.set_val('traj.phase0.controls:u', np.sin(phase.interp('u', ys=[0, 1.0])))
#
# Solve for the optimal trajectory
#
dm.run_problem(p, run_driver=True, simulate=True, refine_method='ph',
               refine_iteration_limit=5)
Plotting the solution

The recommended practice is to obtain values from the recorded cases. While the
problem object can also be queried for values, building plotting scripts that use the
case recorder files as the data source means that the problem doesn’t need to be
solved just to change a plot. Here we load values of various variables from the
solution and simulation for use in the animation to follow.
sol = om.CaseReader('dymos_solution.db').get_case('final')
sim = om.CaseReader('dymos_simulation.db').get_case('final')
t = sol.get_val('traj.phase0.timeseries.time')
x = sol.get_val('traj.phase0.timeseries.x')
v = sol.get_val('traj.phase0.timeseries.v')
u = sol.get_val('traj.phase0.timeseries.u')
h = np.sin(3 * x) / 3
t_sim = sim.get_val('traj.phase0.timeseries.time')
x_sim = sim.get_val('traj.phase0.timeseries.x')
v_sim = sim.get_val('traj.phase0.timeseries.v')
u_sim = sim.get_val('traj.phase0.timeseries.u')
h_sim = np.sin(3 * x_sim) / 3
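Once both time histories are loaded, one way to quantify how well the solution and the simulation agree is to interpolate the simulation onto the solution times and take the largest difference. The helper `max_mismatch` below is hypothetical and not part of Dymos. It is exercised on synthetic column-shaped arrays here so that it runs without the recorder files; with the real data it could be called as `max_mismatch(t, x, t_sim, x_sim)`.

```python
import numpy as np


def max_mismatch(t_sol, y_sol, t_sim, y_sim):
    """Largest absolute difference between two time histories (hypothetical helper).

    The simulated history is linearly interpolated onto the solution times.
    Inputs may be column vectors and are flattened first.
    """
    t_sol, y_sol = np.ravel(t_sol), np.ravel(y_sol)
    t_sim, y_sim = np.ravel(t_sim), np.ravel(y_sim)
    y_sim_at_sol = np.interp(t_sol, t_sim, y_sim)
    return np.max(np.abs(y_sol - y_sim_at_sol))


# Synthetic stand-ins for the solution and simulation timeseries
t_a = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
t_b = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
mismatch = max_mismatch(t_a, np.sin(t_a), t_b, np.sin(t_b))
print(f"max mismatch: {mismatch:.2e}")
```

A large value for a state would suggest that the grid is too coarse to represent the dynamics accurately.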
Animating the Solution

The collapsed code cell below contains the code used to produce an animation of the
mountain car solution using Matplotlib.
The green area represents the hilly terrain the car is traversing. The black circle is the
center of the car, and the orange arrow is the applied control.
The applied control generally has the same sign as the velocity and is ‘bang-bang’,
that is, it wants to be at its maximum possible magnitude. Interestingly, the sign of the
control flips shortly before the sign of the velocity changes.
References
[MMB14] Alexey A Melnikov, Adi Makmal, and Hans J Briegel. Projective simulation
applied to the grid-world and the mountain-car problem. arXiv preprint
arXiv:1405.5459, 2014.
[Moo90] Andrew William Moore. Efficient memory-based learning for robot control.
Technical Report UCAM-CL-TR-209, University of Cambridge, Computer
Laboratory, November 1990. URL: https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-209.pdf,
doi:10.48456/tr-209.