Copyright of this document belongs to 北京绿洲星辰教育科技有限公司. This content may not be reproduced or used without the original author's permission.
Hyperparameter Dictionary
The variables that govern the training process are called training hyperparameters. Hyperparameters are typically set before learning begins, rather than learned during training. They matter in machine-learning practice because they define higher-level properties of a model, such as its complexity or learning capacity. There is, however, no universal standard for "tuning" them; suitable hyperparameter values for different models can only be found through repeated practice and experimentation.
1. Data point
A data point, also known as an experience, is a tuple (s, a, r, s'), where s
stands for an observation (or state) captured by the camera, a for an action
taken by the vehicle, r for the expected reward incurred by that action,
and s' for the new observation after the action is taken.
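The tuple above can be sketched as a small Python structure. The field names here are illustrative, not part of any DeepRacer API:

```python
from collections import namedtuple

# One experience: observation s, action a, reward r, next observation s'.
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state"])

# A hypothetical step: the camera image id, the steering action taken,
# the reward for that action, and the image id captured afterwards.
step = Experience(state="img_0042", action="steer_left",
                  reward=1.5, next_state="img_0043")
```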
2. Episode
An episode is a period in which the vehicle starts from a given starting
point and ends up completing the track or going off the track. It embodies
a sequence of experiences. Different episodes can have different lengths.
3. Experience buffer
An experience buffer consists of a number of ordered data points collected
over a fixed number of episodes of varying lengths during training. For
Amazon DeepRacer, it corresponds to the images captured by the camera
mounted on your Amazon DeepRacer vehicle and the actions taken by the
vehicle, and it serves as the source from which input is drawn for updating
the underlying (policy and value) neural networks.
4. Batch
A batch is an ordered list of experiences, representing a portion of
simulation over a period of time, used to update the policy network
weights. It is a subset of the experience buffer.
5. Training Data
Training data is a set of batches sampled at random from an experience
buffer and used for training the policy network weights.
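The relationship between the experience buffer, batches, and training data can be sketched as follows. The buffer contents and sizes here are illustrative placeholders, not DeepRacer's actual data layout:

```python
import random

# Hypothetical experience buffer: an ordered list of (s, a, r, s') tuples
# collected over several episodes.
buffer = [(f"s{i}", f"a{i}", float(i), f"s{i + 1}") for i in range(512)]

batch_size = 64
# A batch is an ordered slice of the buffer, i.e. a portion of the
# simulation over a period of time.
batches = [buffer[i:i + batch_size] for i in range(0, len(buffer), batch_size)]

# Training data: a set of batches sampled at random from the buffer,
# used for one round of policy-network updates.
random.seed(0)
training_data = random.sample(batches, k=4)
```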
1. Batch size
Required: Yes
Valid values: 32, 64, 128, 256, or 512
Default value: 64
2. Number of epochs
The number of passes through the training data to update the neural
network weights during gradient descent. The training data corresponds to
random samples from the experience buffer. Use a larger number of
epochs to promote more stable updates, but expect slower training.
When the batch size is small, you can use a smaller number of epochs.
Required: No
Valid values: Integer between 3 and 10
Default value: 3
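How the epochs hyperparameter drives the update loop can be sketched like this, with the actual gradient step stubbed out as a counter (a simplification, not DeepRacer's training code):

```python
# Each epoch is one full pass over the sampled training data,
# applying one weight update per batch.
def train(training_data, num_epochs=3):
    updates = 0
    for _ in range(num_epochs):        # more epochs -> more stable, but slower
        for _batch in training_data:   # one gradient-descent step per batch
            updates += 1               # stand-in for the actual weight update
    return updates

updates = train([[0]] * 4, num_epochs=3)  # 4 batches x 3 epochs
```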
3. Learning rate
During each update, a portion of the new weight can be from the gradient-
descent (or ascent) contribution and the rest from the existing weight
value. The learning rate controls how much a gradient-descent (or ascent)
update contributes to the network weights. Use a higher learning rate to
include more gradient-descent contributions for faster training, but be
aware of the possibility that the expected reward may not converge if the
learning rate is too large.
Required: No
Valid values: Real number between 0.00000001 (10⁻⁸) and 0.001 (10⁻³)
Default value: 0.0003
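The role the learning rate plays in a single update can be sketched in one line: the new weight mixes the existing value with a gradient contribution scaled by the learning rate. This is a generic gradient-descent step, not DeepRacer's internal optimizer:

```python
# Larger learning_rate -> a larger share of the new weight comes from the
# gradient contribution; too large and the expected reward may not converge.
def sgd_step(weight, gradient, learning_rate=0.0003):
    return weight - learning_rate * gradient

w = 1.0
w = sgd_step(w, gradient=2.0, learning_rate=0.0003)  # 1.0 - 0.0003 * 2.0
```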
4. Entropy
A degree of uncertainty used to determine when to add randomness to the
policy distribution. The added uncertainty helps the Amazon DeepRacer
vehicle explore the action space more broadly. A larger entropy value
encourages the vehicle to explore the action space more thoroughly.
Required: No
Valid values: Real number between 0 and 1
Default value: 0.01
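The uncertainty being measured is the Shannon entropy of the policy's action distribution: a near-uniform policy (broad exploration) has high entropy, a peaked, nearly deterministic one has low entropy. A minimal sketch with a hypothetical four-action distribution:

```python
import math

# Shannon entropy of a discrete action distribution.
def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

uniform = entropy([0.25, 0.25, 0.25, 0.25])  # maximal for 4 actions: ln(4)
peaked = entropy([0.97, 0.01, 0.01, 0.01])   # nearly deterministic: low
```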
5. Discount factor
A factor that specifies how much future rewards contribute to the
expected reward. It can be understood as a discount rate, converting future
rewards into their present expected value. The larger the discount factor,
the farther into the future the vehicle looks when choosing a move, and
the slower the training. With a discount factor of 0.9, the vehicle includes
rewards from on the order of 10 future steps when making a move; with a
discount factor of 0.999, it considers rewards from on the order of 1000
future steps. If the value is too large, the vehicle weighs rewards so far
ahead that they fall outside what the current action can plausibly influence,
which is unreasonable and also slows training considerably. The
recommended discount factor values are 0.99, 0.999 and 0.9999.
Required: No
Valid values: Real number between 0 and 1
Default value: 0.999
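The "order of 10 / order of 1000 future steps" claim follows from the discounted-return formula G = r₀ + γr₁ + γ²r₂ + …, whose effective horizon is roughly 1/(1 − γ). A minimal sketch (the reward sequence is illustrative):

```python
# Discounted sum of future rewards for discount factor gamma.
def discounted_return(rewards, gamma):
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# Effective horizon is roughly 1 / (1 - gamma):
horizon_09 = 1 / (1 - 0.9)      # ~10 steps for gamma = 0.9
horizon_0999 = 1 / (1 - 0.999)  # ~1000 steps for gamma = 0.999
```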
6. Loss type
Type of the objective function used to update the network weights. A good
training algorithm should make incremental changes to the agent's
strategy so that it gradually transitions from taking random actions to
taking strategic actions to increase reward. But if it makes too big a change
then the training becomes unstable and the agent ends up not learning.
The Huber loss and Mean squared error loss types behave similarly for
small updates. But as the updates become larger, Huber loss takes smaller
increments compared to Mean squared error loss. When you have
convergence problems, use the Huber loss type. When convergence is
good and you want to train faster, use the Mean squared error loss type.
Required: No
Valid values: Huber loss, Mean squared error loss
Default value: Huber loss
Supplementary material:
Basic formula of the mean squared error (MSE) loss:
MSE = (1/n) Σ_{i=1..n} (y_i − ŷ_i)²
where n is the number of samples, y_i the target value, and ŷ_i the model's prediction.
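The contrast between the two loss types can be seen on a single error term. Near zero the Huber loss is quadratic, matching the squared error up to the conventional ½ factor, while for large errors it grows only linearly and so produces smaller updates. The threshold delta = 1 below is an illustrative choice:

```python
# Squared-error loss on one error term.
def mse_loss(error):
    return error ** 2

# Huber loss: quadratic for |error| <= delta, linear beyond it.
def huber_loss(error, delta=1.0):
    if abs(error) <= delta:
        return 0.5 * error ** 2
    return delta * (abs(error) - 0.5 * delta)
```

For a large error of 10, the Huber loss yields 9.5 while the squared error yields 100, which is why Huber updates stay small when training is struggling to converge.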
7. Number of experience episodes between each policy-updating iteration
Required: No
Valid values: Integer between 5 and 100
Default value: 20