
Reinforcement Learning

Sarbani Mishra

PGP/24/111

Multi-armed bandits:

We learnt the exploration vs. exploitation dilemma through a classic case. It is about creating a game of chance.

It is the classic case behind the principal mechanism on which casino slot machines operate.

By creating some virtual slot machines, we will try to maximize the total reward by identifying the luckiest machine.

>import numpy as np

Then we define a class for a single slot machine that gives a reward drawn from a normal (Gaussian) distribution.

>class GaussianBanditGame(object):

Then we shuffle the bandits so that their order is random:

>np.random.shuffle(self.bandits)
A particular reward can be wildly different from the average reward we expect from that machine. This depends on the variance of the reward distribution.
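
A minimal sketch of how these pieces could fit together, assuming one class for a single machine and one for the game that shuffles several machines; the method names, default values, and machine means below are assumptions for illustration, not the original code:

import numpy as np

class GaussianBandit(object):
    # A single slot machine whose reward is drawn from a normal distribution
    def __init__(self, mean=0, stdev=1):
        self.mean = mean      # true average payout, unknown to the player
        self.stdev = stdev    # spread of the payouts around that average

    def pull_lever(self):
        # One play of the machine: sample a reward around the (unknown) mean
        return np.round(np.random.normal(self.mean, self.stdev), 1)

class GaussianBanditGame(object):
    # A game made of several machines, presented in a random order
    def __init__(self, bandits):
        self.bandits = bandits
        np.random.shuffle(self.bandits)   # hide which machine is which
        self.total_reward = 0
        self.n_played = 0

    def play(self, choice):
        # Play the chosen machine and keep a running total of the reward
        reward = self.bandits[choice].pull_lever()
        self.total_reward += reward
        self.n_played += 1
        return reward

game = GaussianBanditGame([GaussianBandit(m) for m in (5, 6, 1)])  # assumed means
print(game.play(0), game.play(0), game.play(0))   # single pulls can differ a lot
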

Online Ads:

This is one more application of the MAB model. Here we want to use the MAB model to identify the ad with the best CTR (click-through rate). The reward for each ad comes from a different Bernoulli distribution.

Adding a bandit based on the Bernoulli distribution:

>class BernoulliBandit(object):
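
A minimal sketch of such a Bernoulli bandit; p is the ad's true but unknown click probability, and the method name is an assumption:

class BernoulliBandit(object):
    # An ad whose reward is 1 (a click) with probability p and 0 otherwise
    def __init__(self, p):
        self.p = p   # the ad's true, unknown click-through rate

    def display_ad(self):
        # One impression: the user clicks with probability p
        return np.random.binomial(n=1, p=self.p)
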

- Starting A/B/n testing with 5 ads, each encoded with a different Bernoulli success probability, which creates the randomness

We are looking at the average reward, which is calculated by the algorithm and shown by the print command; in our case the best-performing ad was "Ad D".
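
A rough sketch of the A/B/n test, assuming each ad is shown uniformly at random while a running average reward is kept per ad; the CTRs below are made up purely for illustration:

# Five ads with made-up true CTRs; the best ad is unknown to the test
ads = {name: BernoulliBandit(p)
       for name, p in zip("ABCDE", [0.004, 0.016, 0.02, 0.027, 0.031])}

names = list(ads.keys())
n_ads = len(names)
n_test = 10000                 # impressions spent on the test (assumed)
Q = np.zeros(n_ads)            # running average reward per ad
N = np.zeros(n_ads)            # impressions shown per ad
total_reward = 0
avg_rewards = []               # overall running average, used for plotting later

for i in range(n_test):
    a = np.random.randint(n_ads)          # A/B/n: every ad shown uniformly at random
    r = ads[names[a]].display_ad()        # 1 if the user clicked, else 0
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]             # incremental average update
    total_reward += r
    avg_rewards.append(total_reward / (i + 1))

print("Best performing ad:", names[int(np.argmax(Q))])
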
Plotting the curve using Cufflinks 0.17.3 (pip install cufflinks).

We can see that the average reward went to 2.7%, which is the expected CTR for ad D.
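
A plotting sketch, assuming the running averages from the loop above were collected in avg_rewards; cufflinks attaches an iplot method to pandas objects:

import pandas as pd
import cufflinks as cf

cf.go_offline()   # render plotly charts offline in the notebook
df_reward = pd.DataFrame({"A/B/n": avg_rewards})
df_reward.iplot(title="Average reward over impressions",
                xTitle="Impression", yTitle="Avg. reward (CTR)")
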

Use of the Epsilon-Greedy algorithm to understand the exploration-exploitation problem:

The Epsilon-Greedy algorithm helps to improve the CTR.

>greedy_list = ['e-greedy: 0.1']

This algorithm runs the model through the 10K impressions and estimates the rate. But it can still be improved, because the model does nothing to write off actions that are clearly failing. The CTR here is 2.98%.
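
A sketch of epsilon-greedy selection with epsilon = 0.1, reusing the ads, names and n_ads from the A/B/n sketch above; with probability epsilon a random ad is explored, otherwise the current best ad is exploited:

eps = 0.1                      # exploration probability, as in 'e-greedy: 0.1'
n_prod = 10000                 # the 10K impressions mentioned above
Q = np.zeros(n_ads)            # running average reward per ad
N = np.zeros(n_ads)
total_reward = 0

for i in range(n_prod):
    if np.random.uniform() <= eps:
        a = np.random.randint(n_ads)   # explore: show a random ad
    else:
        a = int(np.argmax(Q))          # exploit: show the current best ad
    r = ads[names[a]].display_ad()
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]
    total_reward += r

print("e-greedy average CTR:", total_reward / n_prod)
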
Upper Confidence Bounds:

With this algorithm the CTR improved to 3.8%. This algorithm knows when to stop exploring and start exploiting: it systematically and dynamically allocates the budget to alternatives that need exploration.
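
A sketch of the UCB idea, again reusing the ads from above: each ad's estimated reward gets a confidence bonus that shrinks as the ad is shown more, so exploration tapers off by itself. The constant c is an assumed value:

c = 0.1                        # exploration constant (assumed)
n_prod = 10000
Q = np.zeros(n_ads)
N = np.zeros(n_ads)
total_reward = 0

for t in range(1, n_prod + 1):
    if N.min() == 0:
        a = int(np.argmin(N))                     # show every ad at least once
    else:
        ucb = Q + c * np.sqrt(np.log(t) / N)      # average reward + confidence bonus
        a = int(np.argmax(ucb))                   # bonus shrinks as an ad is shown more
    r = ads[names[a]].display_ad()
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]
    total_reward += r

print("UCB average CTR:", total_reward / n_prod)
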

Thompson sampling:

Thompson sampling is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem.
After creating a dataset, it picks the best slot machine by drawing from a Beta distribution for each machine and updating its counts of wins and losses.
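
A sketch of that Beta-distribution bookkeeping, reusing the ads from above; alpha counts wins, beta counts losses, and the arm with the highest posterior draw is played:

alphas = np.ones(n_ads)        # 1 + number of wins (clicks) per ad
betas = np.ones(n_ads)         # 1 + number of losses (no clicks) per ad
total_reward = 0

for i in range(10000):
    theta = np.random.beta(alphas, betas)   # one draw per ad from its Beta posterior
    a = int(np.argmax(theta))               # play the arm whose draw is highest
    r = ads[names[a]].display_ad()
    alphas[a] += r                          # a win makes the posterior more optimistic
    betas[a] += 1 - r                       # a loss makes it more pessimistic
    total_reward += r

print("Thompson sampling average CTR:", total_reward / 10000)
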

Thompson sampling for customer conversion via advertisements:

We use the model to figure out which strategy has the highest conversion rate, as quickly as possible and by spending the minimum amount.

Customers get a pop-up ad; as a suggestion, we can provide the same content that the ad describes.

Building the environment for the simulation.


Computing the relative return of each strategy and plotting a histogram of the relative return against the respective strategies.
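
A rough sketch of the simulation, assuming a matrix X where X[i, s] is 1 if customer i would convert under strategy s; the number of customers and strategies, the conversion rates, and the random-selection baseline are all assumptions for illustration:

np.random.seed(42)
n_customers, n_strategies = 10000, 9
conv_rates = np.random.uniform(0.01, 0.20, n_strategies)   # assumed true rates
# X[i, s] = 1 if customer i would convert when shown strategy s
X = (np.random.uniform(size=(n_customers, n_strategies)) < conv_rates).astype(int)

alphas = np.ones(n_strategies)
betas = np.ones(n_strategies)
reward_ts, reward_rand = 0, 0

for i in range(n_customers):
    s_ts = int(np.argmax(np.random.beta(alphas, betas)))   # Thompson sampling choice
    s_rand = np.random.randint(n_strategies)                # random-selection baseline
    r = X[i, s_ts]
    alphas[s_ts] += r
    betas[s_ts] += 1 - r
    reward_ts += r
    reward_rand += X[i, s_rand]

relative_return = (reward_ts - reward_rand) / reward_rand * 100
print("Relative return over random selection: %.0f%%" % relative_return)
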

(Q-learning) Finding a path through a warehouse using a positive reward system

First, we need to create an environment by defining the rewards:
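
A sketch of how the rewards can be defined, assuming twelve warehouse locations A to L; the connectivity below is illustrative, and only the idea of a reward matrix with a large reward on the goal matters:

# Map each warehouse location to a state index
location_to_state = {ch: i for i, ch in enumerate("ABCDEFGHIJKL")}

# R[s, s'] = 1 if the robot is allowed to move from s to s', 0 otherwise.
R = np.zeros((12, 12))
allowed_moves = [("A", "B"), ("B", "C"), ("B", "F"), ("C", "G"), ("D", "H"),
                 ("E", "I"), ("F", "J"), ("G", "H"), ("H", "L"),
                 ("I", "J"), ("J", "K"), ("K", "L")]
for u, v in allowed_moves:
    i, j = location_to_state[u], location_to_state[v]
    R[i, j] = R[j, i] = 1      # moves are possible in both directions

# A large reward on the goal cell makes the goal attractive, e.g. location G
goal = location_to_state["G"]
R[goal, goal] = 1000
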


Building the AI solution with Q-learning and implementing it:

The same Q-learning algorithm can be used to teach an AI to follow a route or find the goal in a labyrinth. This is done by giving incentives to the algorithm in order to teach it how to navigate.
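
A sketch of the Q-learning loop over that reward matrix, with assumed values for the learning rate and discount factor:

gamma = 0.75    # discount factor (assumed value)
alpha = 0.9     # learning rate (assumed value)
Q = np.zeros((12, 12))

for _ in range(50000):                        # many random training transitions
    s = np.random.randint(12)                 # pick a random current state
    playable = np.where(R[s] > 0)[0]          # actions allowed from that state
    if len(playable) == 0:
        continue
    s_next = int(np.random.choice(playable))  # take a random allowed action
    # Temporal-difference update: move Q towards reward + discounted best future value
    td = R[s, s_next] + gamma * np.max(Q[s_next]) - Q[s, s_next]
    Q[s, s_next] += alpha * td
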

After the algorithm has learned and been executed, it can be sent to production. In real life, the robot will navigate through the points in order to cover the route. We can define the path with incremental rewards to plot the course of the robots in the warehouse.
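
Once Q is trained, the route can be read off greedily: from each location, move to the neighbour with the highest Q-value until the goal is reached. A sketch:

state_to_location = {i: ch for ch, i in location_to_state.items()}

def route(start, goal):
    # Greedily follow the highest-Q move from each location until the goal
    path = [start]
    current = start
    while current != goal and len(path) < 20:   # cap the length to avoid loops
        s = location_to_state[current]
        current = state_to_location[int(np.argmax(Q[s]))]
        path.append(current)
    return path

print(route("E", "G"))
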

Most sensible path: ['E', 'I', 'J', 'F', 'B', 'C', 'G']
