-MAYUR BHISE
PPO
• Advantage Function
• Gradient Ascent (Trust Regions)
• Actor-Critic
• Clipped Objective Function
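A minimal sketch of the clipped objective function listed above, assuming per-sample log-probabilities and advantage estimates are already available as NumPy arrays. The function name `clipped_surrogate`, its parameters, and the default `clip_eps=0.2` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate objective (to be maximized).

    new_logp, old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    """
    # Probability ratio between the new and old policy.
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    # Restrict the ratio to [1 - eps, 1 + eps] before weighting the advantage.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) of the two terms, then average.
    return np.mean(np.minimum(unclipped, clipped))
```

When the ratio moves outside the clip range in the direction that would increase the objective, the clipped term wins and the gradient through the ratio vanishes, which limits how far a single update can move the policy.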
ALGORITHM
REINFORCE
• Importance Sampling
• Unstable update
• Variance is Different (Sample more Data)
• KL Divergence
• Large Step size – Bad Policy
• Bad Samples – Worse Policy
• Data Inefficiency
• On-Policy Method
• Data Thrown away
• Training Slow
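The importance sampling and KL divergence points above can be sketched for a discrete action space. This is an illustration under stated assumptions, not the deck's implementation: the function names and the array layout (one row of action probabilities per state) are hypothetical.

```python
import numpy as np

def importance_weights(new_probs, old_probs, actions):
    """Ratio pi_new(a|s) / pi_old(a|s) for each sampled action.

    new_probs, old_probs: arrays of shape (n_samples, n_actions).
    actions: integer array of shape (n_samples,) with the taken actions.
    """
    idx = np.arange(len(actions))
    return new_probs[idx, actions] / old_probs[idx, actions]

def mean_kl(old_probs, new_probs):
    """Average KL(pi_old || pi_new) over the sampled states."""
    per_state = np.sum(old_probs * (np.log(old_probs) - np.log(new_probs)), axis=1)
    return np.mean(per_state)
```

The weights let data collected under the old policy be reused to evaluate the new one, which addresses the data inefficiency of a purely on-policy method. The KL value gives a measure of how far the new policy has drifted; the further it drifts, the more the weights vary and the less reliable the reweighted estimate becomes.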
CAR-RACING ENVIRONMENT