
Copyright Notice

This document is copyrighted by 北京绿洲星辰教育科技有限公司. It may not be reproduced or used without the original author's permission.
Hyperparameter Dictionary

Variables that affect the training process are called training hyperparameters. Hyperparameters are typically defined before the learning process begins, rather than learned during training. They matter in machine-learning practice because they often define higher-level properties of a model, such as its complexity or learning capacity. There is, however, no universal standard for tuning hyperparameters; values that suit a particular model can only be found through repeated practice and experimentation.

Several definitions need to be made clear for Amazon DeepRacer training:

1. Data point
A data point, also known as an experience, is a tuple (s, a, r, s'), where s stands for an observation (or state) captured by the vehicle's camera, a for an action taken by the vehicle, r for the expected reward incurred by that action, and s' for the new observation after the action is taken. A minimal sketch of this tuple follows.
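As an illustration, an experience can be modeled as a simple tuple type. This is a minimal Python sketch; the Experience name and its fields are mine, not part of any DeepRacer API:

    from typing import NamedTuple
    import numpy as np

    class Experience(NamedTuple):
        """One (s, a, r, s') data point collected during training."""
        state: np.ndarray       # s: camera observation (or state)
        action: int             # a: action taken by the vehicle
        reward: float           # r: expected reward for taking a in s
        next_state: np.ndarray  # s': observation after the action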

2. Episode
An episode is a period in which the vehicle starts from a given starting point and ends up completing the track or going off the track. It embodies a sequence of experiences, and because runs end differently depending on how the vehicle drives, different episodes can have different lengths.
3. Experience buffer
An experience buffer consists of a number of ordered data points collected over a fixed number of episodes of varying lengths during training. For Amazon DeepRacer, it corresponds to the images captured by the camera mounted on your Amazon DeepRacer vehicle and the actions taken by the vehicle, and it serves as the source from which input is drawn for updating the underlying (policy and value) neural networks.

4. Batch
A batch is an ordered list of experiences, representing a portion of the simulation over a period of time, used to update the policy-network weights. It is a subset of the experience buffer.

5. Training data
Training data is a set of batches sampled at random from an experience buffer and used for training the policy-network weights. The sketch below shows how these five concepts fit together.
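Reusing the Experience type from the earlier sketch, the relationships can be summarized as follows; the function names are hypothetical, for illustration only:

    import random
    from typing import List

    Episode = List[Experience]  # one run: start point to finish or off-track

    def build_experience_buffer(episodes: List[Episode]) -> List[Experience]:
        """Flatten a fixed number of episodes into one ordered buffer."""
        return [exp for episode in episodes for exp in episode]

    def sample_training_data(buffer: List[Experience], batch_size: int,
                             num_batches: int) -> List[List[Experience]]:
        """Training data: batches drawn at random from the buffer."""
        return [random.sample(buffer, batch_size) for _ in range(num_batches)]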
Hyperparameters used in the Amazon DeepRacer model:
1. Gradient descent batch size
The number of recent vehicle experiences sampled at random from the experience buffer and used for updating the underlying deep-learning neural-network weights; in other words, the sample size used for a single update. Random sampling helps reduce the correlations inherent in the input data. Use a larger batch size to promote more stable and smooth updates to the neural-network weights, but be aware that training may take longer.

Required: Yes
Valid values: Positive integer in (32, 64, 128, 256, 512)
Default value: 64

2. Number of epochs
The number of passes through the training data to update the neural-network weights during gradient descent; one epoch is one complete pass over all of the training data, which consists of random samples from the experience buffer. Use a larger number of epochs to promote more stable updates, but expect slower training. When the batch size is small, you can generally use a smaller number of epochs. A loop sketch follows the metadata below.

Required: No
Valid values: Positive integer between 3 and 10
Default value: 3
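A hedged sketch of how the number of epochs and the batch size typically interact in a gradient-descent loop; update_weights is a placeholder here, not a DeepRacer API:

    import random

    def update_weights(batch):
        """Placeholder for one gradient-descent update on a batch."""
        pass

    def train(training_data, batch_size: int = 64, num_epochs: int = 3):
        """One epoch = one complete pass over the training data."""
        for _ in range(num_epochs):
            random.shuffle(training_data)  # fresh order on every pass
            for i in range(0, len(training_data) - batch_size + 1, batch_size):
                update_weights(training_data[i:i + batch_size])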
3. Learning rate
During each update, a portion of the new weight can come from the gradient-descent (or ascent) contribution, and the rest from the existing weight value. The learning rate controls how much a gradient-descent (or ascent) update contributes to the network weights; in effect, it controls how fast the model learns. Use a higher learning rate to include more gradient-descent contribution for faster training, but be aware that the expected reward may fail to converge if the learning rate is too large.

Required: No
Valid values: Real number between 0.00000001 (10^-8) and 0.001 (10^-3)
Default value: 0.0003
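A minimal sketch of the role the learning rate plays in a single gradient-descent step (plain NumPy, not DeepRacer's actual optimizer):

    import numpy as np

    def sgd_step(weights: np.ndarray, gradient: np.ndarray,
                 learning_rate: float = 0.0003) -> np.ndarray:
        """New weight = existing weight minus the lr-scaled gradient."""
        return weights - learning_rate * gradient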

4. Entropy
A degree of uncertainty used to determine when to add randomness to the policy distribution. The added uncertainty helps the Amazon DeepRacer vehicle explore the action space more broadly; a larger entropy value encourages the vehicle to explore the action space more thoroughly.

Required: No
Valid values: Real number between 0 and 1
Default value: 0.01
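In PPO-style policy-gradient training, entropy typically enters the objective as a weighted bonus term. A hedged sketch of that standard decomposition (not DeepRacer's actual code):

    import numpy as np

    def policy_entropy(action_probs: np.ndarray) -> float:
        """Shannon entropy of the policy's action distribution."""
        p = action_probs[action_probs > 0]  # avoid log(0)
        return float(-np.sum(p * np.log(p)))

    def total_loss(policy_loss: float, action_probs: np.ndarray,
                   entropy_weight: float = 0.01) -> float:
        """Subtracting an entropy bonus favors more exploratory policies."""
        return policy_loss - entropy_weight * policy_entropy(action_probs)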
5. Discount factor
A factor that specifies how much future rewards contribute to the expected reward. To perform well in the long run, the vehicle must weigh not only the immediate reward but also future returns; the discount factor is the rate at which future rewards are discounted back into the current expected reward. The larger the discount factor, the farther out the contributions the vehicle considers when making a move, and the slower the training; if the value is too large, the vehicle weighs rewards far beyond what its current action can reasonably influence. With a discount factor of 0.9, the vehicle includes rewards from on the order of 10 future steps to make a move; with a discount factor of 0.999, it considers rewards from on the order of 1,000 future steps. The recommended values are 0.99, 0.999 and 0.9999. A worked sketch follows the metadata below.

Required: No
Valid values: Real number between 0 and 1
Default value: 0.999
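The "order of N future steps" rule of thumb corresponds to the effective horizon 1/(1 - gamma). A small illustrative helper (my own, not DeepRacer code):

    def discounted_return(rewards, gamma: float = 0.999) -> float:
        """Sum of gamma**t * r_t over a sequence of future rewards."""
        return sum(gamma ** t * r for t, r in enumerate(rewards))

    # Effective horizon ~ 1 / (1 - gamma):
    # gamma = 0.9   -> roughly 10 future steps matter
    # gamma = 0.999 -> roughly 1000 future steps matter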

6. Loss type
The type of objective function used to update the network weights. A good training algorithm should make incremental changes to the agent's strategy, so that it gradually transitions from taking random actions to taking strategic actions that increase reward; if it makes too big a change, training becomes unstable and the agent ends up not learning. The Huber loss and the mean squared error (MSE) loss behave similarly for small updates, but as the updates become larger, Huber loss takes smaller increments than MSE loss and is therefore more stable. When you have convergence problems, use the Huber loss; when convergence is good and you want to train faster, use the MSE loss.

Required: No
Valid values: (Huber loss, Mean squared error loss)
Default value: Huber loss

Supplementary material:

Mean squared error (MSE) loss, basic formula:

$L_{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

Mean absolute error (MAE) loss, basic formula:

$L_{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$

Huber loss combines MSE and MAE to take the best of both, and is also known as the smooth mean absolute error loss. The idea is simple: use MSE when the error is close to 0 and MAE when the error is large:

$L_{\delta}(y_i, \hat{y}_i) = \begin{cases} \frac{1}{2}(y_i - \hat{y}_i)^2, & \left|y_i - \hat{y}_i\right| \le \delta \\ \delta\left|y_i - \hat{y}_i\right| - \frac{1}{2}\delta^2, & \text{otherwise} \end{cases}$
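A small numerical sketch of the comparison above (my own helper functions): both losses are quadratic for small errors, while Huber becomes linear, and therefore grows more slowly, for large ones:

    def mse_loss(error: float) -> float:
        return 0.5 * error ** 2  # quadratic everywhere

    def huber_loss(error: float, delta: float = 1.0) -> float:
        if abs(error) <= delta:
            return 0.5 * error ** 2  # near zero: identical to MSE
        return delta * abs(error) - 0.5 * delta ** 2  # large errors: linear

    for e in (0.1, 1.0, 10.0):
        print(f"error={e}: mse={mse_loss(e):.3f}, huber={huber_loss(e):.3f}")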
7. Number of experience episodes between each policy updating iteration
The size of the experience buffer used to draw training data from for learning the policy-network weights, expressed as a number of episodes. An experience episode is a period in which the agent starts from a given starting point and ends up completing the track or going off the track; it consists of a sequence of experiences, and different episodes can have different lengths. For simple reinforcement-learning problems, such as a simple reward function, a small experience buffer may be sufficient and learning is fast. For more complex problems that have more local maxima, a larger experience buffer is necessary to provide more uncorrelated data points; in this case, training is slower but more stable. The recommended values are 10, 20 and 40. A sketch follows the metadata below.

Required: No
Valid values: Integer between 5 and 100
Default value: 20
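As a final sketch, this hyperparameter simply sets how many episodes are gathered into the buffer before each policy update; run_episode is a hypothetical placeholder that returns one episode's experiences:

    NUM_EPISODES_PER_UPDATE = 20  # this hyperparameter (default: 20)

    def collect_buffer(run_episode, n_episodes: int = NUM_EPISODES_PER_UPDATE):
        """Gather a fixed number of episodes into one experience buffer."""
        buffer = []
        for _ in range(n_episodes):
            buffer.extend(run_episode())
        return buffer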
