How To Design A Reinforcement Learning Reward Function For A Lunar Lander ? by Alina Zhang Towards Data Science

How to Design a Reinforcement Learning Reward Function for a Lunar Lander
Alina Zhang
Published in Towards Data Science
3 min read · Aug 18, 2021
Photo credit: NASA; code by author
Imagine aliens attacked and you were trying to land a lander on the Moon. What factors would you consider to complete the mission successfully?
Here are some considerations:
• Touch down on the landing pad vs. move away from the landing pad
• Land at a low velocity vs. crash at a high velocity
• Use as little fuel as possible vs. use lots of fuel
• Approach the target as fast as possible vs. hang in the air
What to punish? What to reward? How to balance multiple constraints? And how to
represent those ideas in our reward function?
Reward Function in Reinforcement Learning

Reinforcement Learning (RL) is a branch of machine learning that trains an agent
by trial and error. In our example, the agent will try to land the lunar lander,
say, 10,000 times to learn how to choose better actions in different states.
The reward function is an incentive mechanism that tells the agent what is correct
and what is wrong using reward and punishment. The goal of an RL agent is to
maximize its total reward. Sometimes the agent has to sacrifice immediate reward in
order to maximize the total reward.
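To make that trade-off concrete, here is a small sketch with made-up per-step rewards. The numbers and the names `brake_then_land` and `coast_then_crash` are illustrative assumptions, not values from a real environment. The first episode pays a small fuel penalty at every step but lands softly; the second pays nothing until it crashes.

```python
# Hypothetical per-step rewards for two episodes (illustrative numbers only).

# Episode A fires the engine every step: a small fuel penalty now,
# followed by a large reward for a soft landing on the final step.
brake_then_land = [-1, -1, -1, -1, 100]

# Episode B saves fuel: no penalty now, but a large crash penalty at the end.
coast_then_crash = [0, 0, 0, 0, -100]


def total_reward(rewards):
    """Sum of all rewards collected in one episode (undiscounted)."""
    return sum(rewards)


print(total_reward(brake_then_land))   # 96
print(total_reward(coast_then_crash))  # -100
```

Step by step, the second episode looks better until the very end, yet its total is far worse. An agent that maximizes total reward should learn to accept the early fuel penalties.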
The rules in the reward function of the lunar lander

Some ideas for reward and punishment rules in the lunar lander reward function:
• Give a high reward for landing in the right place at a low enough velocity
• Give a penalty if the lander lands outside the landing pad
• Give a reward based on the percentage of fuel remaining
• Give a big penalty if the velocity is above a threshold (a crash) when the lander
touches the surface
• Give a distance reward to encourage the lander to approach the target
How to represent the rules in Python code
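The rules listed above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the `LanderState` fields, the constants `PAD_HALF_WIDTH` and `MAX_SAFE_SPEED`, and every reward magnitude are hypothetical and would need tuning against a real simulator.

```python
import math
from dataclasses import dataclass


@dataclass
class LanderState:
    # All fields are hypothetical; a real simulator will expose
    # different names and units.
    x: float              # horizontal position; pad centre assumed at x = 0
    y: float              # height above the surface
    speed: float          # magnitude of the velocity
    fuel_fraction: float  # remaining fuel, from 0.0 (empty) to 1.0 (full)
    landed: bool          # True once the lander touches the surface


PAD_HALF_WIDTH = 1.0  # assumed half-width of the landing pad
MAX_SAFE_SPEED = 0.5  # assumed touchdown speed above which it is a crash


def reward(state: LanderState) -> float:
    """Per-step reward following the rules above (assumed weights)."""
    if not state.landed:
        # Distance reward: less negative the closer the lander is to the
        # pad, which also discourages hanging in the air.
        return -0.1 * math.hypot(state.x, state.y)

    if state.speed > MAX_SAFE_SPEED:
        # Big penalty for touching the surface too fast (a crash).
        return -100.0

    if abs(state.x) <= PAD_HALF_WIDTH:
        r = 100.0  # high reward: on the pad at a low enough velocity
    else:
        r = -50.0  # penalty: landed outside the pad

    # Bonus proportional to the fraction of fuel remaining.
    return r + 10.0 * state.fuel_fraction


print(reward(LanderState(x=0.2, y=0.0, speed=0.3, fuel_fraction=0.5, landed=True)))
print(reward(LanderState(x=3.0, y=0.0, speed=0.3, fuel_fraction=0.5, landed=True)))
print(reward(LanderState(x=0.0, y=0.0, speed=2.0, fuel_fraction=0.9, landed=True)))
```

The relative sizes of the terms are where the balancing happens. For example, if the fuel bonus were larger than the crash penalty, the agent could learn to save fuel and crash.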
