
Balochistan University of Information Technology, Engineering and Management

Sciences, Quetta
BUITEMS Quality & Excellence in Education
Final Year Approval Form, FICT
Project Title: EXPLORATION OF AN UNKNOWN ENVIRONMENT USING DEEP REINFORCEMENT
LEARNING AND INTRINSIC REWARDS
*
All information in this form should be typed. Date: 23 / 06 / 2023

1. Project Title: Exploration of an Unknown Environment Using Deep Reinforcement Learning and Intrinsic Rewards
   (Hardware Based / Software Based / Both Hardware and Software Based)
2. Supervisor: Dr. Anayat Ullah Baloch
3. Group Members
Sr. No. | Name in Full (Use Block Letters)  | Department (e.g., BSCS - Fall 2008) | CMS ID
1.      | DANISH ALI                        | BSEE – FALL 2020                    | 52764
2.      | SHAHRUKH HUSSAIN                  | BSEE – FALL 2020                    | 52765
3.      | SYED MUHAMMAD HASSAN MUJTABA      | BSEE – FALL 2020                    | 54960

4(A). List of the software &/ programming languages to be used
1. ROS (Robot Operating System)
2. Gazebo Sim
3. Python
4. Reinforcement Learning Framework

4(B). List of the hardware to be used
1. Depth Camera
2. Chassis
3. Motors x 2
4. Wheel x 2
5. Caster Wheel
6. Arduino
7. SBC (Single-Board Computer)

4(C). Deliverables upon completion of the project
FYP-I
1. Literature Review
2. Creation of simulation environment
3. Interfacing of the depth camera
4. Optimization and deployment of algorithms – I

FYP-II
1. Optimization and deployment of algorithms – II
2. Testing the performance of algorithms against a test environment
3. Hardware implementation

5. Project Introduction (brief / single paragraph with 50-100 words)


This project develops a mapless navigation system for unknown environments, using a depth camera and deep reinforcement learning. The existing literature mostly relies on pre-existing maps and value-based algorithms (Q-Learning, DQN, DDQN, etc.), along with LiDAR for sensing the environment. However, value-based algorithms are restricted to a discrete action space, and LiDAR is inefficient in size and cost for small mobile robots. To overcome these limitations, the project uses a depth camera, which provides comparatively detailed information about the texture and shapes in the environment, while actor-critic algorithms (A3C, A2C, PPO, etc.) enable the agent to take continuous actions, such as moving with varying velocities and changing direction smoothly. By integrating depth perception, actor-critic reinforcement learning, and curiosity-inspired rewards, we aim for the robot to learn and adapt navigation strategies in real time.
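The discrete- vs continuous-action distinction can be sketched as follows. This is an illustration only: the Q-values, the policy parameters, and the two-dimensional (linear velocity, angular velocity) action are assumed stand-ins, not the project's trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Value-based (e.g. DQN): the agent picks one of a few fixed actions.
DISCRETE_ACTIONS = ["forward", "left", "right"]
q_values = np.array([1.2, 0.4, 0.7])       # hypothetical Q-values
discrete_action = DISCRETE_ACTIONS[int(np.argmax(q_values))]
print(discrete_action)                     # "forward"

# Actor-critic (e.g. PPO): the policy outputs a Gaussian over the
# continuous (linear_velocity, angular_velocity) pair and samples it.
mean = np.array([0.35, -0.10])             # illustrative policy mean
std = np.array([0.05, 0.05])               # illustrative std deviation
continuous_action = rng.normal(mean, std)  # smoothly varying velocities
print(continuous_action)
```

The continuous sample lets the robot adjust its speed and turning rate by arbitrary amounts, rather than switching between a handful of fixed motions.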
6. Problem Statement (brief / single paragraph with up to 50 words)
Recent research explores autonomous navigation in unfamiliar terrain using LiDAR and value-based algorithms. However, these algorithms struggle in dynamic scenarios because they rely on predefined values to guide decision-making, leading to inefficiency. Moreover, while LiDAR estimates distances accurately, it falls short in assessing the environment comprehensively. To address these limitations, our approach incorporates a depth camera, which provides valuable information about both the texture and the distance of the surroundings. Furthermore, in dynamic environments, actor-critic algorithms outperform value-based algorithms because they continuously update their policies to drive ongoing actions. The depth perception and policy-based learning of the actor-critic method enable more precise and efficient navigation in an unknown environment.
7. Objectives to be achieved (at least three)
1. Testing and Evaluation of various actor-critic based Reinforcement Learning (RL) algorithms for autonomous
navigation in different environments.
2. Defining a reward structure using an intrinsic curiosity module.
3. Evaluating the performance of the trained model on hardware.
8. Methodology (may contain description, flowcharts, figures, diagrams etc.)
The project starts with an in-depth review of relevant literature on reinforcement learning algorithms for mapless navigation. These algorithms include value-based and policy-based learning, weighed by their performance for our application.
For implementation, Gazebo is used to create a simulation environment in which to deploy the agent, interfaced with the depth camera, which outputs RGB image data and depth (distance) information. This data forms the current state of the agent. The agent chooses an action and tries to predict the consequence of that action in the form of the next state. The agent then performs the action and receives the actual next state. The difference between the predicted and the actual next state is the curiosity reward: the higher the difference, the higher the reward, which pushes the agent to explore the unknown environment further. Unlike value-based algorithms, which only allow the agent to take discrete actions such as move forward, move left, or move right, here the agent can move and rotate by continuous values. Moreover, extrinsic rewards can further assist the agent's learning. The following illustration elaborates the process.

Figure 1: Illustration of the agent getting reward based on action.
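The curiosity reward in this loop can be sketched as the prediction error of a forward model, in the spirit of an intrinsic curiosity module. The scaling factor and the toy state features below are illustrative assumptions, not the project's actual network.

```python
import numpy as np

def curiosity_reward(predicted_next_state, actual_next_state, eta=0.5):
    """Intrinsic reward = scaled prediction error of a forward model.

    The larger the mismatch between what the agent expected and what
    actually happened, the larger the reward, so surprising transitions
    drive the agent to explore further.
    """
    error = np.sum((predicted_next_state - actual_next_state) ** 2)
    return eta * error

# Toy 4-dimensional state features (stand-ins for encoded
# depth-camera observations).
predicted = np.array([0.2, 0.5, 0.1, 0.9])
actual = np.array([0.2, 0.5, 0.1, 0.9])   # perfectly predicted
novel = np.array([0.8, 0.1, 0.6, 0.3])    # surprising outcome

print(curiosity_reward(predicted, actual))  # 0.0 (familiar, no reward)
print(curiosity_reward(predicted, novel))   # ~0.565 (novelty rewarded)
```

In practice the total reward would be this intrinsic term plus any extrinsic (task) reward, such as a penalty for collisions.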

In this process the agent learns from the actions it takes, based on the generated reward, to achieve navigation or mapping of the unknown environment. For this purpose, the different algorithms can be fine-tuned to enhance the performance of the trained models in terms of navigation efficiency, adaptability, and robustness. The algorithms will be run in a test environment and the efficiency of each observed. Finally, the most efficient algorithm will be deployed on a physical differential-drive robot to evaluate accurate navigation of an unknown environment without any predefined maps.
The results will discuss the strengths and weaknesses of the algorithms relative to one another, the performance of the most feasible algorithm in different test environments, and its performance in real time.
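Before it can drive a policy, the raw depth image has to be reduced to a state the agent can consume. One illustrative reduction (an assumption for this sketch, not the project's actual encoding) keeps only the nearest reading in each vertical sector of the image:

```python
import numpy as np

def depth_to_state(depth_image, num_sectors=5, max_range=5.0):
    """Collapse a depth image into a compact state vector.

    The image is split into vertical sectors and the nearest reading
    in each sector is kept, normalised to [0, 1] by the camera range.
    Sector count and range are illustrative choices.
    """
    sectors = np.array_split(depth_image, num_sectors, axis=1)
    nearest = np.array([s.min() for s in sectors])
    return np.clip(nearest / max_range, 0.0, 1.0)

# Toy 4x10 depth image in metres; an obstacle 0.5 m away sits in the
# left-most sector, everything else is at the 5 m range limit.
depth = np.full((4, 10), 5.0)
depth[2, 1] = 0.5
state = depth_to_state(depth)
print(state)   # [0.1, 1.0, 1.0, 1.0, 1.0]
```

A low value in the state vector then signals a nearby obstacle in that direction, which the policy can learn to steer around.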


9. Project Applications (at least two)


1. Exploration of Hazardous and Unreachable Environments: Areas that exhibit harmful radiation, dynamic environments, and regions that are out of reach, like deep oceans, remote mountain ranges, and dense jungles, can be explored by drones and mobile robots equipped with mapless navigation.
2. Mapping of Unexplored Territories: Many undiscovered terrains may contain valuable resources; small aerial and ground robots can be deployed in such areas to explore on their own.
3. Search and Rescue Missions: In emergency situations such as disasters, this technique can reduce rescue time by providing a complete explored map of the region.
4. Indoor Mapping: Mapless navigation can provide a complete, detailed map of an indoor environment for service robots that guide humans in large shopping malls, airports, offices, and educational institutions.
10. SDG Goal Mapping (for 7th and 8th Semester)
1. Industry, Innovation and Infrastructure (SDG - 9): These robots can be used in industry for moving and placing objects accurately while planning their maps by themselves.
11. Gantt Chart (for 7th and 8th Semester)

TASKS (planned across Sep-23 to Jul-24):
1. Literature related courses
2. Literature Review
3. Studying the algorithms
4. Creation of Simulation Environment in Gazebo
5. Study and Interface Depth Camera in Gazebo
6. Coding of the algorithms
7. Extraction and manipulation of data from Depth Camera
8. Training of the algorithms
9. Testing the algorithms in different test environments
10. Deploying the trained model on hardware
11. Generating comparative results of their performance
12. Evaluation of Results of the algorithms
13. Documentation of thesis
14. Comparing the simulated results with result of hardware…
15. Printing and Submitting the Documentation

12. Members Details


Sr. No. | CMS ID | Name in Full (Use Block Letters) | Email                    | Contact No. | Signatures
1.      | 52764  | DANISH ALI                       | ali.danish3401@gmail.com | 03169833149 |
2.      | 52765  | SHAHRUKH HUSSAIN                 | shahrukhh442@gmail.com   | 03483987300 |
3.      | 54960  | SYED M. HASSAN MUJTABA           | smujtaba861@gmail.com    | 03482628134 |


13. Supervisor’s Consent


I, Dr. Anayat Ullah Baloch, am willing to guide these students in all phases of the above-mentioned project as their advisor. I have carefully reviewed the title and description of the project and believe that it is of an appropriate difficulty level for the number of students named above.
Supervisor E-mail Address: anayat.ullah@buitms.edu.pk

Supervisor Signatures and Date: ________________________________________

14. Co-Supervisor’s Consent (If Any)

Co-Supervisor’s Name and Signature: __________________________________________________


Note: Advisor can’t be changed without prior permission of the Supervisor and FYP coordinator


For Official Use Only


15. Approval Committee Remarks                                  Date: ____ / ____ / ____
Committee Member Name | Remarks | Approved / Rejected / Changes | Signatures

16. FYP Coordinator Remarks

17. FYP Coordinator - Final Decision

Approved

Rejected
Approved subject to modification

18. FYP Coordinator’s Signature and Date
