Deep Deterministic Policy Gradient
DDPG

History
ML methods

Supervised
+ We have a training set for which the right answer is given for every training example
+ Each training example contains all the right answers
+ Job: to replicate the right answers

Unsupervised
+ We have a set of unlabeled data and a learning algorithm
+ Job: to find structure in the data with algorithms

Reinforcement
+ We do not have a target variable
+ We have reward signals
+ The agent needs to plan the path on its own to reach the goal where the reward exists

Reinforcement Learning
robotics
playing games
self-driving cars

Supervised vs Unsupervised
Supervised

Unsupervised
+ Clustering
  - K-means
+ Dimensionality reduction
  - Principal Component Analysis
  - SVD

Supervised process
X1, X2, Y -> Build Model -> f(X1, X2) = Y
New Data (X1, X2) -> Use Model -> Predict Y

Supervised uses
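The build-then-use flow of the supervised process (learn f from labeled X1, X2 and Y, then predict Y for new data) can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not name a model for this step, so a nearest-centroid classifier is swapped in for brevity, and the function names, data points, and labels are all hypothetical.

```python
from math import dist

def build_model(rows, labels):
    """Build step: average the (X1, X2) points of each label into one centroid."""
    sums = {}
    for (x1, x2), y in zip(rows, labels):
        s = sums.setdefault(y, [0.0, 0.0, 0])
        s[0] += x1
        s[1] += x2
        s[2] += 1
    return {y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items()}

def predict(model, row):
    """Use step: return the label whose centroid is closest to the new point."""
    return min(model, key=lambda y: dist(model[y], row))

# Hypothetical labeled training data: (X1, X2) pairs with known answers Y.
train_X = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
train_Y = ["legit", "legit", "fraud", "fraud"]

model = build_model(train_X, train_Y)          # Build Model
new_data = [(1.2, 1.4), (8.5, 8.2)]            # New Data (X1, X2)
predictions = [predict(model, row) for row in new_data]  # Use Model -> Predict Y
print(predictions)
```

The point of the sketch is the split between the two steps: the model is built once from examples that carry the right answers, and is then applied to rows for which no answer is known.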
Credit Card Fraud: Logistic Regression Classification Example
Car Insurance Fraud: Regression Example
AmtFraud = intercept + coef * ClaimedAmt

Unsupervised
Contains patterns -> Finds patterns -> Recognizes patterns
Customer purchase data -> Train Algorithm -> Build Model -> Customer Groups
New Customer Purchase Data -> Use Model -> Similar Customer Group

Unsupervised
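The customer-grouping flow (train on unlabeled purchase data, build a model of customer groups, then place a new customer in the most similar group) can be sketched with K-means, the clustering method named earlier in the deck. This is a minimal sketch, not a production implementation: the purchase features, the data values, and the choice to seed the centroids with the first k points are all assumptions made for illustration.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Plain K-means; the first k points seed the centroids (a simplification)."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: put every point in the group of its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group;
        # keep the old centroid if the group is empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def assign_group(centroids, point):
    """Use step: index of the customer group closest to the given point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

# Hypothetical purchase data: (orders per month, average basket value).
purchases = [(1, 20), (9, 200), (2, 25), (10, 210), (1, 30), (8, 190)]

centroids = kmeans(purchases, k=2)             # Train Algorithm -> Build Model
new_customer = (9, 205)                        # New Customer Purchase Data
group = assign_group(centroids, new_customer)  # Use Model -> Similar Customer Group
print(group)
```

No labels are supplied anywhere: the groups come only from the structure of the data. In practice the features would usually be scaled first, since the basket value dominates the distance here.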