
The Mountain Car Problem - Dymos https://openmdao.org/dymos/docs/latest/examples/mountain_car/mountain_car.html

By The Dymos Development Team


© Copyright 2022.

The Mountain Car Problem

Contents
• State and control variables
• Problem Definition
• Defining the ODE
• Solving the minimum-time mountain car problem with Dymos
• Plotting the solution
• Animating the Solution
• References

The mountain car problem proposes a vehicle stuck in a “well.” It lacks the power to
directly climb out of the well, but instead must accelerate repeatedly forwards and
backwards until it has achieved the energy necessary to exit the well.

The problem is a popular machine learning test case, though the methods in Dymos
are capable of solving it. It first appeared in the PhD thesis of Andrew Moore in 1990
[Moo90]. The implementation here is based on that given by Melnikov, Makmal, and
Briegel [MMB14].

State and control variables


This system has two state variables, the position (x) and velocity (v) of the car.

This system has a single control variable (u), the effort put into moving. This control is
constrained to the range [−1, 1].

The dynamics of the system are governed by

$$
\begin{aligned}
\dot{x} &= v \\
\dot{v} &= 0.001\,u - 0.0025\cos(3x)
\end{aligned}
\tag{65}
$$
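
To see why a single push is not enough, the dynamics above can be integrated with the control held at its maximum. The following is a minimal sketch, not part of the Dymos example: the helper names (accel, rk4_step, max_position_full_throttle), the fixed-step Runge-Kutta integrator, the step size, and the time horizon are all illustrative assumptions. Starting from rest at x = −0.5 (the initial condition used in this example), the car oscillates inside the well and turns back long before it reaches x = 0.5.

```python
import math


def accel(x, u):
    """v_dot from the equations of motion."""
    return 0.001 * u - 0.0025 * math.cos(3.0 * x)


def rk4_step(x, v, u, dt):
    """Advance (x, v) by one classical Runge-Kutta step with u held constant."""
    k1x, k1v = v, accel(x, u)
    k2x, k2v = v + 0.5 * dt * k1v, accel(x + 0.5 * dt * k1x, u)
    k3x, k3v = v + 0.5 * dt * k2v, accel(x + 0.5 * dt * k2x, u)
    k4x, k4v = v + dt * k3v, accel(x + dt * k3x, u)
    x_new = x + dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    v_new = v + dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return x_new, v_new


def max_position_full_throttle(x0=-0.5, v0=0.0, dt=0.1, t_final=1000.0):
    """Largest x reached with u = +1 (step size and horizon are assumptions)."""
    x, v = x0, v0
    x_max = x
    for _ in range(int(round(t_final / dt))):
        x, v = rk4_step(x, v, 1.0, dt)
        x_max = max(x_max, x)
    return x_max


x_max = max_position_full_throttle()
print(f"furthest position with u = 1: {x_max:.3f}")
```

This is why the optimal solution has to rock the car back and forth to accumulate energy rather than simply applying u = 1.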

Problem Definition
We seek to minimize the time required to exit the well in the positive direction.

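$$
\text{Minimize } J = t_f \tag{66}
$$

Subject to the initial conditions

$$
\begin{aligned}
x_0 &= -0.5 \\
v_0 &= 0.0
\end{aligned}
\tag{67}
$$

the control constraints

$$
|u| \le 1 \tag{68}
$$

and the terminal constraints

$$
\begin{aligned}
x_f &= 0.5 \\
v_f &\ge 0.0
\end{aligned}
\tag{69}
$$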

Defining the ODE


The following code implements the equations of motion for the mountain car
problem.

A few things to note:

1. By providing the tag dymos.state_rate_source:{name}, we’re letting Dymos
know which states need to be integrated, so there’s no need to specify a rate source
when using this ODE in our Phase.
2. Pairing the above tag with dymos.state_units:{units} means we don’t have to
specify units when setting properties for the state in our run script.
3. We only use compute_partials to compute the values of ∂v̇/∂x, because ∂v̇/∂u
and ∂ẋ/∂v are constant and their values are specified during setup.


import numpy as np
import openmdao.api as om


class MountainCarODE(om.ExplicitComponent):

    def initialize(self):
        self.options.declare('num_nodes', types=int)

    def setup(self):
        nn = self.options['num_nodes']

        self.add_input('x', shape=(nn,), units='m')
        self.add_input('v', shape=(nn,), units='m/s')
        self.add_input('u', shape=(nn,), units='unitless')

        self.add_output('x_dot', shape=(nn,), units='m/s',
                        tags=['dymos.state_rate_source:x',
                              'dymos.state_units:m'])
        self.add_output('v_dot', shape=(nn,), units='m/s**2',
                        tags=['dymos.state_rate_source:v',
                              'dymos.state_units:m/s'])

        ar = np.arange(nn, dtype=int)

        self.declare_partials(of='x_dot', wrt='v', rows=ar, cols=ar,
                              val=1.0)
        self.declare_partials(of='v_dot', wrt='u', rows=ar, cols=ar,
                              val=0.001)
        self.declare_partials(of='v_dot', wrt='x', rows=ar, cols=ar)

    def compute(self, inputs, outputs):
        x = inputs['x']
        v = inputs['v']
        u = inputs['u']
        outputs['x_dot'] = v
        outputs['v_dot'] = 0.001 * u - 0.0025 * np.cos(3*x)

    def compute_partials(self, inputs, partials):
        x = inputs['x']
        partials['v_dot', 'x'] = 3 * 0.0025 * np.sin(3 * x)
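
Analytic partials like the one in compute_partials are easy to get wrong by a sign or a factor, so it can help to compare them against a finite-difference estimate. The snippet below is a standalone sketch using only the standard library; the function names, the sample points, and the step size are illustrative assumptions and are not part of the component above. OpenMDAO's own Problem.check_partials performs a similar comparison directly on the component.

```python
import math


def v_dot(x, u):
    """Same expression as the v_dot output of the ODE component."""
    return 0.001 * u - 0.0025 * math.cos(3.0 * x)


def dvdot_dx(x):
    """Analytic partial, matching the expression used in compute_partials."""
    return 3.0 * 0.0025 * math.sin(3.0 * x)


def central_difference(f, x, h=1.0e-6):
    """Second-order finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Arbitrary sample positions inside the state bounds used later on.
sample_points = [-1.2, -0.5, 0.0, 0.3, 0.5]

# The control value does not affect d(v_dot)/dx; 0.7 is an arbitrary choice.
errors = [abs(central_difference(lambda s: v_dot(s, 0.7), x) - dvdot_dx(x))
          for x in sample_points]
max_error = max(errors)
print(f"largest finite-difference mismatch: {max_error:.2e}")
```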

Solving the minimum-time mountain car problem with Dymos

The following script solves the minimum-time mountain car problem with Dymos.
Note that this example requires the IPOPT optimizer via the pyoptsparse package.
SciPy’s SLSQP optimizer is generally not capable of solving this problem.

To begin, import the packages we require:

import dymos as dm
import matplotlib.pyplot as plt
from matplotlib import animation

Next, we set two constants. U_MAX is the maximum allowable magnitude of the
control u, which sets the car’s acceleration authority. The references show this
problem being solved with −1 ≤ u ≤ 1.

The variable NUM_SEG is the number of equally spaced polynomial segments into which
time is divided. Within each of these segments, the time history of each state
and control is treated as a polynomial (we’re using the default order of 3).

# The maximum absolute value of the acceleration authority of the car
U_MAX = 1.0

# The number of segments into which the problem is discretized
NUM_SEG = 30
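
As a rough picture of what "equally spaced segments" means, the sketch below lists the segment boundaries in time for a given phase duration. The helper segment_boundaries is hypothetical and is not a Dymos function; the duration of 500 is only the initial guess used later in this example, and the optimizer is free to change it.

```python
NUM_SEG = 30


def segment_boundaries(t_initial, t_duration, num_seg):
    """Return the num_seg + 1 equally spaced segment end times."""
    width = t_duration / num_seg
    return [t_initial + k * width for k in range(num_seg + 1)]


# 500.0 is an assumed duration taken from the initial guess, not the solution.
bounds = segment_boundaries(0.0, 500.0, NUM_SEG)
print(len(bounds), bounds[1] - bounds[0])
```

Because the boundaries scale with the duration, the segments stretch or shrink together as the optimizer adjusts the final time.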

We then instantiate an OpenMDAO problem and set the optimizer and its options.

For IPOPT, setting option nlp_scaling_method to 'gradient-based' can substantially
improve the convergence of the optimizer without the need for us to set all of the
scaling manually.

The call to declare_coloring tells the driver to attempt to find a sparsity pattern
that minimizes the work required to compute the derivatives across the model.

#
# Initialize the Problem and the optimization driver
#
p = om.Problem()

p.driver = om.pyOptSparseDriver(optimizer='IPOPT')
p.driver.opt_settings['print_level'] = 0
p.driver.opt_settings['max_iter'] = 500
p.driver.opt_settings['mu_strategy'] = 'adaptive'
p.driver.opt_settings['bound_mult_init_method'] = 'mu-based'
p.driver.opt_settings['tol'] = 1.0E-8
# Gradient-based scaling, for faster convergence
p.driver.opt_settings['nlp_scaling_method'] = 'gradient-based'

p.driver.declare_coloring()

Next, we add a Dymos Trajectory group to the problem’s model and add a phase to it.

In this case we’re using the Radau pseudospectral transcription to solve the problem.

#
# Create a trajectory and add a phase to it
#
traj = p.model.add_subsystem('traj', dm.Trajectory())
tx = dm.Radau(num_segments=NUM_SEG)
phase = traj.add_phase('phase0', dm.Phase(ode_class=MountainCarODE,
                                          transcription=tx))

At this point, we set the options on the main variables used in a Dymos phase.

In addition to time , we have two states ( x and v ) and a single control ( u ).

There are no parameters and no polynomial controls. We could have tried to use a
polynomial control here, but as we will see the solution contains large discontinuities
in the control value, which make it ill-suited for a polynomial control. Polynomial
controls are modeled as a single (typically low-order) polynomial across the entire
phase.


We’re fixing the initial time and states to whatever values we provide before executing
the problem. We will constrain the final values with nonlinear constraints in the next
step.

The scaler values ( ref ) are all set to 1 here. We’re using IPOPT’s gradient-based
scaling option and will let it work the scaling out for us.

Bounds on time duration are guesses, and the bounds on the states and controls
come from the implementation in the references.

Also, we don’t need to specify targets for any of the variables here because their
names are the targets in the top-level of the model. The rate source and units for the
states are obtained from the tags in the ODE component we previously defined.

#
# Set the variables
#
phase.set_time_options(fix_initial=True, duration_bounds=(.05, 10000),
                       duration_ref=1)

phase.add_state('x', fix_initial=True, fix_final=False, lower=-1.2,
                upper=0.5, ref=1, defect_ref=1)
phase.add_state('v', fix_initial=True, fix_final=False, lower=-0.07,
                upper=0.07, ref=1, defect_ref=1)
phase.add_control('u', lower=-U_MAX, upper=U_MAX, ref=1, continuity=True,
                  rate_continuity=False)

Next we define the optimal control problem by specifying the objective, boundary
constraints, and path constraints.

Why do we have a path constraint on the control u when we’ve already
specified its bounds?

Excellent question! In the Radau transcription, the nth order control polynomial is
governed by design variables provided at n points in the segment that do not
include the right-most endpoint. Instead, the value at that endpoint is interpolated
based on the values of the first (n − 1). Since this value is not a design variable, it
is necessary to constrain it separately. We could forgo specifying any bounds on u
since it’s completely covered by the path constraint, but specifying the bounds on the
design variable values can sometimes help by telling the optimizer, “Don’t even bother
trying values outside of this range.”

Note that sometimes the opposite is true: giving the optimizer the freedom to
explore a larger design space, only to eventually be “reined in” by the path constraint,
can be valuable.

The purpose of this interactive documentation is to let the user experiment. If you
remove the path constraint, you might notice some outlying control values in the
solution below.
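
The effect described above can be reproduced in isolation. The sketch below is an illustration rather than Dymos code: it assumes three Legendre-Gauss-Radau nodes in a segment (normalized to the interval from −1 to 1), uses a hypothetical helper lagrange_eval, and picks nodal control values by hand. Every nodal value respects the bounds, yet the polynomial through them lands well above 1 at the segment's right endpoint, which is exactly the kind of outlier the path constraint rules out.

```python
import math

# Three Legendre-Gauss-Radau nodes on [-1, 1); the right endpoint tau = 1
# is not among them. Using three nodes is an assumption for illustration.
LGR_NODES = [-1.0, (1.0 - math.sqrt(6.0)) / 5.0, (1.0 + math.sqrt(6.0)) / 5.0]


def lagrange_eval(nodes, values, tau):
    """Evaluate the interpolating polynomial through (nodes, values) at tau."""
    total = 0.0
    for i, (tau_i, u_i) in enumerate(zip(nodes, values)):
        weight = 1.0
        for j, tau_j in enumerate(nodes):
            if j != i:
                weight *= (tau - tau_j) / (tau_i - tau_j)
        total += weight * u_i
    return total


# Hand-picked nodal values: each one satisfies -1 <= u <= 1 ...
u_nodes = [1.0, -1.0, 1.0]

# ... but the value interpolated at the right endpoint does not.
u_end = lagrange_eval(LGR_NODES, u_nodes, 1.0)
print(f"u at the right endpoint: {u_end:.3f}")
```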

#
# Minimize time at the end of the phase
#
phase.add_objective('time', loc='final', ref=1000)

phase.add_boundary_constraint('x', loc='final', lower=0.5)
phase.add_boundary_constraint('v', loc='final', lower=0.0)
phase.add_path_constraint('u', lower=-U_MAX, upper=U_MAX)

#
# Setup the Problem
#
p.setup()

--- Constraint Report [traj] ---
    --- phase0 ---
        [final]   5.0000e-01 <= x [m]
        [final]   0.0000e+00 <= v [m/s]
        [path]   -1.0000e+00 <= u <= 1.0000e+00 [unitless]

We then set the initial guesses for the variables in the problem and solve it.

Since fix_initial=True is set for time and the states, those values are not design
variables and will remain at the values given below throughout the solution process.

We’re using the phase interp method to provide initial guesses for the states and
controls. In this case, by giving it two values, it is linearly interpolating from the first
value to the second value, and then returning the interpolated value at the input
nodes for the given variable.
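
Conceptually, the two-value form of the initial guess is just a straight line sampled at the nodes. The sketch below is a simplified stand-in, not the Dymos implementation: linear_guess is a hypothetical helper, and the node locations are made-up fractions of the phase duration, whereas phase.interp evaluates at the phase's own nodes and returns an array.

```python
def linear_guess(taus, ys):
    """Linearly interpolate from ys[0] to ys[1] at normalized times in [0, 1]."""
    y_start, y_end = ys
    return [y_start + tau * (y_end - y_start) for tau in taus]


# Hypothetical node locations, expressed as fractions of the phase duration.
taus = [0.0, 0.25, 0.5, 0.75, 1.0]

x_guess = linear_guess(taus, ys=[-0.5, 0.5])
v_guess = linear_guess(taus, ys=[0.0, 0.07])
print(x_guess)
print(v_guess)
```

The guess does not need to satisfy the dynamics; it only gives the optimizer a reasonable starting point.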

Finally, we use the dymos.run_problem method to execute the problem. This interface
allows us to do some things that the standard OpenMDAO problem.run_driver
interface does not. It will automatically record the final solution achieved by the
optimizer in a case named 'final' in a file called dymos_solution.db. By specifying
simulate=True, it will automatically follow the solution with an explicit integration
using scipy.integrate.solve_ivp. The results of the simulation are stored in a case
named 'final' in the file dymos_simulation.db. This explicit simulation demonstrates
how the system evolves with the given controls, and serves as a check that we’re using
a dense enough grid (enough segments and segments of sufficient order) to accurately
represent the solution.

If those two solutions didn’t agree reasonably well, we could rerun the problem with a
denser grid. Instead, we’re asking Dymos to automatically change the grid if
necessary by specifying refine_method='ph'. This will attempt to repeatedly solve the
problem and change the number of segments and segment orders until the solution
is in reasonable agreement.


#
# Set the initial values
#
p['traj.phase0.t_initial'] = 0.0
p['traj.phase0.t_duration'] = 500.0

p.set_val('traj.phase0.states:x', phase.interp('x', ys=[-0.5, 0.5]))
p.set_val('traj.phase0.states:v', phase.interp('v', ys=[0, 0.07]))
p.set_val('traj.phase0.controls:u', np.sin(phase.interp('u', ys=[0, 1.0])))

#
# Solve for the optimal trajectory
#
dm.run_problem(p, run_driver=True, simulate=True, refine_method='ph',
               refine_iteration_limit=5)


Plotting the solution


The recommended practice is to obtain values from the recorded cases. While the
problem object can also be queried for values, building plotting scripts that use the
case recorder files as the data source means that the problem doesn’t need to be
solved just to change a plot. Here we load values of various variables from the
solution and simulation for use in the animation to follow.

sol = om.CaseReader('dymos_solution.db').get_case('final')
sim = om.CaseReader('dymos_simulation.db').get_case('final')

t = sol.get_val('traj.phase0.timeseries.time')
x = sol.get_val('traj.phase0.timeseries.x')
v = sol.get_val('traj.phase0.timeseries.v')
u = sol.get_val('traj.phase0.timeseries.u')
h = np.sin(3 * x) / 3

t_sim = sim.get_val('traj.phase0.timeseries.time')
x_sim = sim.get_val('traj.phase0.timeseries.x')
v_sim = sim.get_val('traj.phase0.timeseries.v')
u_sim = sim.get_val('traj.phase0.timeseries.u')
h_sim = np.sin(3 * x_sim) / 3

Animating the Solution


An animation of the mountain car solution is produced with Matplotlib; in the
interactive documentation, its source is in a collapsed code cell.

The green area represents the hilly terrain the car is traversing. The black circle is the
center of the car, and the orange arrow is the applied control.

The applied control generally has the same sign as the velocity and is ‘bang-bang’,
that is, it wants to be at its maximum possible magnitude. Interestingly, the sign of the
control flips shortly before the sign of the velocity changes.
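
The observation that the control tends to share the sign of the velocity suggests a simple feedback rule: always push in the direction the car is already moving. The sketch below simulates that rule with a fixed-step Runge-Kutta integrator. It is a heuristic for illustration only, not the optimal control found by Dymos (which, as noted above, switches slightly before the velocity does), and the helper names, step size, and time limit are assumptions.

```python
import math


def accel(x, u):
    """v_dot from the equations of motion."""
    return 0.001 * u - 0.0025 * math.cos(3.0 * x)


def rk4_step(x, v, u, dt):
    """Advance (x, v) by one classical Runge-Kutta step with u held constant."""
    k1x, k1v = v, accel(x, u)
    k2x, k2v = v + 0.5 * dt * k1v, accel(x + 0.5 * dt * k1x, u)
    k3x, k3v = v + 0.5 * dt * k2v, accel(x + 0.5 * dt * k2x, u)
    k4x, k4v = v + dt * k3v, accel(x + dt * k3x, u)
    x_new = x + dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    v_new = v + dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return x_new, v_new


def time_to_exit(dt=0.05, t_max=1000.0):
    """Apply u = +1 when v >= 0, else u = -1, until x reaches 0.5."""
    x, v, t = -0.5, 0.0, 0.0
    while t < t_max:
        u = 1.0 if v >= 0.0 else -1.0
        x, v = rk4_step(x, v, u, dt)
        t += dt
        if x >= 0.5:
            return t, v
    return None, v  # did not exit within t_max


t_exit, v_exit = time_to_exit()
print(f"heuristic exit time: {t_exit}, exit velocity: {v_exit}")
```

Comparing the exit time of this rule with the optimized final time gives a feel for how much the earlier switching in the optimal solution buys.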


References
[MMB14] Alexey A. Melnikov, Adi Makmal, and Hans J. Briegel. Projective simulation
applied to the grid-world and the mountain-car problem. arXiv preprint
arXiv:1405.5459, 2014.
[Moo90] Andrew William Moore. Efficient memory-based learning for robot control.
Technical Report UCAM-CL-TR-209, University of Cambridge, Computer
Laboratory, November 1990.
URL: https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-209.pdf, doi:10.48456/tr-209.
