
PPO – ON CAR-RACING ENV

IMAGE-BASED PATH PLANNING METHOD

-MAYUR BHISE

PPO

• Advantage Function
• Gradient Ascent (Trust Regions)
• Actor-Critic
• Clipped Objective Function
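
A minimal sketch of how the advantage and the clipped objective fit together, in plain Python. The function names, the default clip range of 0.2, and the one-step advantage (return minus the critic's value estimate) are assumptions for illustration and are not taken from the slides; practical implementations often use a more elaborate advantage estimator.

```python
import math

def simple_advantages(returns, values):
    """Assumed simplest estimator: A = observed return - critic's value estimate."""
    return [g - v for g, v in zip(returns, values)]

def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the objective stops growing once the ratio exceeds 1 + clip_eps, so there is no incentive to move the new policy far from the old one in a single update.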
ALGORITHM

REINFORCE
• Unstable update
• Large Step size – Bad Policy
• Bad Samples – Worse Policy

• Data Inefficiency
• On-Policy Method
• Data Thrown away
• Training Slow

• Importance Sampling
• Variance is Different (Sample more Data)
• KL Divergence
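
The importance sampling and KL divergence items above can be illustrated with a small sketch for discrete action distributions. The helper names and the toy numbers in the usage note are hypothetical; the point is only that data sampled under an old policy can be reweighted to evaluate a new one, and that the KL divergence measures how far the two policies have drifted apart.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def importance_weighted_mean(samples, p_old, p_new, values):
    """Estimate E_{a ~ p_new}[values[a]] from actions sampled under p_old."""
    # Each sample is reweighted by the ratio p_new(a) / p_old(a).
    weighted = [p_new[a] / p_old[a] * values[a] for a in samples]
    return sum(weighted) / len(weighted)
```

For example, with `p_old = [0.5, 0.5]` and `p_new = [0.9, 0.1]` the ratios are 1.8 and 0.2. The further the new policy moves from the old one, the more uneven these weights become and the more samples are needed for a reliable estimate, which is why the update is kept within a small KL distance.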
CAR-RACING ENVIRONMENT
