How to Design a Reinforcement Learning Reward Function for a Lunar... https://towardsdatascience.com/how-to-design-reinforcement-learning-...

How to Design a Reinforcement Learning Reward Function for a Lunar Lander
Alina Zhang
Published in Towards Data Science
3 min read · Aug 18, 2021

Photo credit: NASA; code by the author

Imagine aliens had attacked and you were trying to land a lander on the Moon.
What factors would you consider to complete the mission successfully?

1 of 11 11/28/2023, 9:58 AM

Here are some considerations:

• Touch down on the landing pad vs. move away from the landing pad

• Land with a low velocity vs. crash at a high velocity

• Use as little fuel as possible vs. use lots of fuel

• Approach the target as fast as possible vs. hang in the air

What to punish? What to reward? How to balance multiple constraints? And how to
represent those ideas in our reward function?

Reward Function in Reinforcement Learning


Reinforcement Learning (RL) is a branch of machine learning in which an agent
learns by trial and error. In our example, the agent will try to land the lunar
lander many times (say, 10,000 episodes) to learn which actions work best in
different states.

The reward function is an incentive mechanism that tells the agent what is right
and what is wrong through rewards and penalties. The agent's goal is to maximize
the total reward collected over an episode, not the reward of any single step.
Sometimes this means sacrificing an immediate reward, for example spending fuel on
a braking burn, in order to maximize the total reward.
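To see why giving up an immediate reward can pay off, compare the total reward of two episodes. This sketch assumes the common discounted form of the total reward, G = r_0 + γ·r_1 + γ²·r_2 + …, with a discount factor γ = 0.99 (an assumption; the article only says "total rewards"). The two reward sequences are made-up numbers for illustration, not data from a real lander.

```python
def discounted_return(rewards, gamma=0.99):
    """Return G = r_0 + gamma*r_1 + gamma**2*r_2 + ... for one episode."""
    g = 0.0
    # Accumulate backwards using G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical episodes (made-up numbers):
# "greedy" grabs a quick reward by diving, then crashes;
# "patient" pays a small fuel cost each step to brake, then lands safely.
greedy = [10.0, 0.0, 0.0, -100.0]
patient = [-1.0, -1.0, -1.0, 100.0]

print(round(discounted_return(greedy), 4))   # -87.0299
print(round(discounted_return(patient), 4))  # 94.0598
```

Even though the greedy episode starts with the larger reward, its return is far lower, so an agent that maximizes G learns to accept the early fuel cost.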

The rules in the lunar lander reward function


Some reward and penalty rules for the lunar lander could be:

• Give a high reward for landing in the right place with a low enough velocity

• Give a penalty if the lander lands outside the landing pad

• Give a reward based on the percentage of fuel remaining

• Give a big penalty if the velocity is above a threshold when the lander touches
the surface (a crash)

• Give a distance-based reward to encourage the lander to approach the target

How to represent the rules in Python code
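Below is a minimal Python sketch of one way to turn the rules listed above into a reward function. It is an illustration, not the author's code: the `LanderState` fields, the pad geometry (pad centred at the origin) and every constant (`SAFE_SPEED`, `LANDING_BONUS`, `FUEL_BONUS`, and so on) are hypothetical choices made for this example. It also assumes that the distance rule rewards the change in distance between consecutive steps and that the fuel rule is paid only on a successful landing.

```python
import math
from dataclasses import dataclass


@dataclass
class LanderState:
    """Hypothetical lander state; the pad is centred at x = 0, y = 0."""
    x: float       # horizontal offset from the pad centre
    y: float       # altitude above the pad
    vx: float      # horizontal velocity
    vy: float      # vertical velocity (negative = descending)
    fuel: float    # remaining fuel as a fraction in [0, 1]
    landed: bool   # True once the lander touches the surface


# Hypothetical tuning constants (illustrative values, not from the article)
PAD_HALF_WIDTH = 0.2
SAFE_SPEED = 0.5
LANDING_BONUS = 100.0
OFF_PAD_PENALTY = -50.0
CRASH_PENALTY = -100.0
FUEL_BONUS = 20.0
DISTANCE_WEIGHT = 10.0


def distance_to_pad(s: LanderState) -> float:
    return math.hypot(s.x, s.y)


def reward(prev: LanderState, curr: LanderState) -> float:
    """Reward for the transition prev -> curr."""
    # Distance rule: reward progress toward the pad; hovering earns nothing.
    r = DISTANCE_WEIGHT * (distance_to_pad(prev) - distance_to_pad(curr))

    if curr.landed:
        speed = math.hypot(curr.vx, curr.vy)
        if speed > SAFE_SPEED:
            # Crash rule: touching down above the speed threshold
            r += CRASH_PENALTY
        elif abs(curr.x) > PAD_HALF_WIDTH:
            # Off-pad rule: a soft landing in the wrong place
            r += OFF_PAD_PENALTY
        else:
            # Landing rule plus fuel rule: bonus scaled by the fuel left
            r += LANDING_BONUS + FUEL_BONUS * curr.fuel
    return r


if __name__ == "__main__":
    start = LanderState(x=0.0, y=1.0, vx=0.0, vy=-0.3, fuel=0.6, landed=False)
    soft = LanderState(x=0.0, y=0.0, vx=0.0, vy=-0.3, fuel=0.6, landed=True)
    crash = LanderState(x=0.0, y=0.0, vx=0.0, vy=-2.0, fuel=0.6, landed=True)
    print(reward(start, soft))   # 122.0 (soft landing on the pad)
    print(reward(start, crash))  # -90.0 (crash on the pad)
    print(reward(start, start))  # 0.0 (hovering: no progress, no reward)
```

Rewarding the change in distance rather than the raw distance means that hanging in the air earns nothing, which addresses the "approach vs. hover" trade-off from the list of considerations. The relative sizes of the constants are where the multiple constraints get balanced against each other; in practice they would need tuning against actual training runs.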
