
Reinforcement Learning

Reinforcement learning is an area of machine learning. It is about taking suitable actions to
maximize reward in a particular situation. It is employed by various software and machines
to find the best possible behaviour or path to take in a specific situation.
Reinforcement learning differs from supervised learning in that, in supervised learning, the
training data comes with the answer key, so the model is trained on the correct answers,
whereas in reinforcement learning there is no answer key and the reinforcement agent decides
what to do to perform the given task. In the absence of a training dataset, it is bound to
learn from its experience.

Main points in reinforcement learning:

• Input: The input should be an initial state from which the model will start.
• Output: There are many possible outputs, as there are a variety of solutions to a
  particular problem.
• Training: The training is based upon the input; the model will return a state and
  the user will decide to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
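
The points above (start state, output, reward, continued learning) can be sketched as a small training loop. This is only an illustrative sketch: the five-cell corridor environment, the reward of 1.0 at the goal, and the choice of tabular Q-learning with epsilon-greedy exploration are assumptions made for the example, not something these notes prescribe. All function names are hypothetical.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal (assumed toy environment)
LEFT, RIGHT = 0, 1  # action indices

def step(state, action):
    """Move one cell; the reward is 1.0 only when the goal cell is reached."""
    move = 1 if action == RIGHT else -1
    next_state = min(max(state + move, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore sometimes, otherwise pick a best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    best = max(q_row)
    return rng.choice([a for a in (LEFT, RIGHT) if q_row[a] == best])

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; returns a table q[state][action] of learned values."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0  # input: the initial state
        while state != N_STATES - 1:
            action = choose_action(q[state], epsilon, rng)
            next_state, reward = step(state, action)  # reward, or no reward
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

def greedy_policy(q):
    """Best action for each non-goal state according to the learned values."""
    return [RIGHT if row[RIGHT] > row[LEFT] else LEFT for row in q[:-1]]

q_table = train()
policy = greedy_policy(q_table)  # the "best solution": maximum expected reward
```

No labelled answers are supplied anywhere; the agent only sees the reward returned by `step`, and the policy is read off the value table after training.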

REINFORCEMENT LEARNING                         | SUPERVISED LEARNING
-----------------------------------------------+---------------------------------------------
Reinforcement learning is all about making     | In supervised learning the decision is made
decisions sequentially. In simple words, the   | on the initial input, or the input given at
output depends on the state of the current     | the start.
input, and the next input depends on the       |
output of the previous input.                  |
                                               |
In reinforcement learning the decisions are    | In supervised learning the decisions are
dependent, so labels are given to sequences    | independent of each other, so a label is
of dependent decisions.                        | given to each decision.
                                               |
Example: chess game                            | Example: object recognition

P v/s NP
P: polynomial-time problems, relatively easy to solve.
NP: non-deterministic polynomial-time problems, whose answers are easy to check but which may
be really hard to solve (e.g. factoring).
The nice thing about NP problems is that if I give you the answer, you can check it very
quickly. You might not be able to factor a 100-digit number, but if you knew the two factors,
you could check them by multiplying and seeing whether they give you the original number
(hard to solve, easy to check).
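
The "hard to solve, easy to check" contrast for factoring can be illustrated with a short sketch. The function names are hypothetical, and trial division stands in for "solving" only because it is the simplest method to show; real factoring algorithms are faster, though none known runs in polynomial time on a classical computer.

```python
def verify_factors(n, p, q):
    """Checking a claimed answer: a single multiplication."""
    return p > 1 and q > 1 and p * q == n

def factor_by_trial_division(n):
    """Solving from scratch: try every candidate divisor up to sqrt(n)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None  # no non-trivial factor found

n = 1009 * 1013                        # a small product of two primes
found = factor_by_trial_division(n)    # about a thousand trial divisions
ok = verify_factors(n, 1009, 1013)     # one multiplication
```

The number of trial divisions grows with the square root of n, which is exponential in the number of digits, while the check stays a single multiplication.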
Are NP problems fundamentally different from P problems? Or could some of the NP problems
turn out to be P problems?
Note that "NP" does not stand for "non-polynomial"; it stands for non-deterministic polynomial
time. NP problems have the property that if you claim to know the answer, the claim can be
checked easily. NP-hard problems are at least as hard as every problem in NP, and an NP-hard
problem need not be in NP itself, so its solutions may not be easy for a computer to check.
Factoring is truly in NP because you can always check an answer by multiplying the two numbers.
What does "hard to solve" mean? The time a certain process takes is incredibly important to
computer efficiency. A problem is difficult if it takes a long time to solve, while easy ones
can be solved quickly.
"Polynomial" describes how the solving time grows with the input size n: something like n^3 or
n^4, where the exponent is a constant fixed for that problem. That is P. By contrast, a running
time like 2^n, where n sits in the exponent, is exponential and quickly becomes impractical.
"Non-deterministic polynomial" does not mean a variable exponent. It means the problem can be
solved in polynomial time by a non-deterministic machine, one that is allowed to guess, or
equivalently that a proposed answer can be checked in polynomial time.
NP-complete (NPC) problems are those for which, if we can find a way to solve one in polynomial
time, whether through a trick or new math, then ALL NP problems can be solved in polynomial
time. The whole punching-out-shapes problem is considered NPC. The reason NPC problems are the
core of this question is that every NP problem can be reduced to any NPC problem. A well-known
example is the traveling salesman problem, where one is trying to find the most efficient set
of decisions. Every NPC problem can be converted into any other, including the decision version
of the traveling salesman problem. In layman's terms, it is the question: given that a salesman
is traveling around a state and wants to hit every city only once, what is the most efficient
path? P versus NP is one of those famous million-dollar questions: the Clay Mathematics
Institute is offering one million dollars for a proof that settles it either way. Most of the
scientific community expects that P = NP is false: "We're pretty sure this isn't true; we can't
prove it, but given the hints and evidence we have, that is where we would bet."
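
To make the traveling salesman question concrete, here is a minimal brute-force sketch. It assumes the salesman returns to the starting city and uses a made-up 4-city distance table; the function names are hypothetical. It tries every ordering of the cities, which is exactly the approach that stops scaling, and it also shows that checking a proposed tour against a length budget is cheap.

```python
from itertools import permutations

def tour_length(dist, tour):
    """Total length of a round trip visiting the cities in the given order."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def brute_force_tsp(dist):
    """Solving: try all (n-1)! orderings that start at city 0."""
    n = len(dist)
    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    best = min(candidates, key=lambda t: tour_length(dist, t))
    return best, tour_length(dist, best)

def verify_tour(dist, tour, budget):
    """Checking: every city exactly once, and total length within the budget."""
    visits_all_once = sorted(tour) == list(range(len(dist)))
    return visits_all_once and tour_length(dist, tour) <= budget

# Made-up symmetric distances between 4 cities (illustrative data only).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
best_tour, best_len = brute_force_tsp(dist)
```

With 4 cities there are only 6 orderings to try; with 20 cities there are 19! of them, more than 10^17, while `verify_tour` still needs just one pass over the tour.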

If P ≠ NP, then P is a proper subset of NP (P ⊂ NP), since all P problems are (obviously) in
NP. We know perfectly well how to solve "some" NP problems in polynomial time; we don't know
if we can do that for "all" of them. The problems usually meant in this discussion are the
NP-complete problems: the hardest problems in NP, which we don't know how to solve in
polynomial time.
When we are unable to find polynomial-time algorithms for exponential-time problems, we try to
show similarities between them, so that if we are able to solve one problem, we can say we can
solve the rest of the similar problems. We want to do research work on only one of the problems.
1. So, either solve it or at least relate it.
2. If you are not able to write polynomial-time deterministic algorithms for problems, why
not write polynomial-time non-deterministic algorithms for them?
For example, in a non-deterministic search for a key in an array, choice(), success() and
failure() are non-deterministic statements that are assumed to take O(1) time, so the algorithm
overall takes O(1) time. The choice() function directly gives me the index at which the key is
present, but I don't know how that is happening.
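
A real computer has no choice() statement, so a non-deterministic algorithm has to be simulated. The sketch below is one hedged way to read the search described above: the non-deterministic version "accepts" if any value that choice() could return leads to success(), so a deterministic simulation simply tries every possible choice. The function names are hypothetical.

```python
def nd_search_simulated(a, key):
    """Deterministic simulation of the non-deterministic search:

        j = choice(0 .. n-1)
        if a[j] == key: success() else failure()

    Trying every value choice() could return costs O(n) instead of O(1).
    """
    for j in range(len(a)):   # each iteration is one possible choice()
        if a[j] == key:       # the O(1) check that follows the guess
            return j          # some branch reaches success()
    return None               # every branch ends in failure()

def check_guess(a, key, j):
    """Verifying a single guessed index is O(1); this is the 'easy to check' part."""
    return 0 <= j < len(a) and a[j] == key

data = [7, 3, 9, 4]
index = nd_search_simulated(data, 9)
```

The gap between the O(1) guess-and-check and the O(n) simulation is small for search, but for NP problems the number of possible guesses is exponential, which is where the difficulty comes from.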
So,
P – deterministic polynomial time, whereas NP – non-deterministic polynomial time (for problems
whose best known deterministic algorithms take exponential time). P is a subset of NP. The hope
is that non-deterministic algorithms can later be made deterministic, once the unknown steps
are filled in.
P is a class of problems that basically includes all the problems that can be solved by a
reasonably fast program, like multiplication or sorting. Then, around and including P, we
discovered a class called NP: problems for which, if you are given a correct solution, you can
at least check it in a reasonable amount of time. People started wondering whether all problems
in NP would turn out to be in P (P = NP), or whether there are some problems in NP that are
truly harder than the ones in P. That's P v/s NP.

Outside of NP are believed to lie problems for which even checking an answer is hard, like
chess (generalized to larger boards), unlike Sudoku, which is an NP problem.
Sudoku is hard but not that tough, so what's the big deal? We are really talking about how
the difficulty scales up as you make the problem bigger and bigger. As you make the problem
bigger, it gets really hard, rapidly getting out of reach for even the most powerful computers.
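
What puts Sudoku in NP is that a filled-in grid can be checked quickly. The sketch below checks a completed 9x9 grid; the function names are hypothetical, and the sample grid comes from a simple shifting pattern chosen only so the example is self-contained.

```python
def make_solved_grid():
    """Build one valid completed grid from a shifting pattern (illustrative only)."""
    return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

def is_valid_solution(grid):
    """Check every row, column and 3x3 box contains the digits 1..9 exactly once."""
    digits = set(range(1, 10))
    rows_ok = all(set(row) == digits for row in grid)
    cols_ok = all({grid[r][c] for r in range(9)} == digits for c in range(9))
    boxes_ok = all(
        {grid[br + i][bc + j] for i in range(3) for j in range(3)} == digits
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    )
    return rows_ok and cols_ok and boxes_ok

solved = make_solved_grid()

# Swap two cells in the first row: the row is still fine, but two columns break.
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]
```

The check looks at each cell a fixed number of times, so for an n x n grid it takes time polynomial in n, even though finding a solution for large grids is believed to be much harder.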
The whole area of problems that are at least as hard as NP-complete problems is called
NP-hard.

• The Frobenius norm is the Euclidean norm for a matrix.
• The 2-norm is the Euclidean norm for a vector.

There are three important types of matrix norms. For some matrix A:
• Induced norm, which measures the maximum of ‖Ax‖/‖x‖ over all x ≠ 0 (or,
  equivalently, the maximum of ‖Ax‖ for ‖x‖ = 1).
• Element-wise norm, which is like unwrapping A into a long vector, then calculating its
  vector norm.
• Schatten norm, which measures the vector norm of the singular values of A.
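
Two of these definitions can be tried out in plain Python. The sketch below is illustrative and uses hypothetical function names: it computes the Frobenius norm as an element-wise norm (unwrap the matrix, then take the vector 2-norm), and it computes the induced norm for the vector infinity-norm, which equals the largest absolute row sum. As a sanity check, it also maximizes ‖Ax‖ directly over the sign vectors x with ‖x‖ = 1, where that maximum is attained. Schatten norms are left out because they need a singular value decomposition.

```python
import math
from itertools import product

def vector_2_norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in x))

def frobenius_norm(a):
    """Element-wise norm: unwrap the matrix into one long vector, take its 2-norm."""
    return vector_2_norm([v for row in a for v in row])

def induced_inf_norm(a):
    """Induced norm for the vector infinity-norm: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)

def induced_inf_norm_brute_force(a):
    """Maximize ||Ax||_inf over sign vectors x, which all have ||x||_inf = 1."""
    n_cols = len(a[0])
    best = 0
    for x in product((-1, 1), repeat=n_cols):
        ax = [sum(row[j] * x[j] for j in range(n_cols)) for row in a]
        best = max(best, max(abs(v) for v in ax))
    return best

A = [[1, -2], [3, 4]]
fro = frobenius_norm(A)                     # sqrt(1 + 4 + 9 + 16)
inf_formula = induced_inf_norm(A)           # max(1 + 2, 3 + 4)
inf_brute = induced_inf_norm_brute_force(A)
```

The formula and the brute-force maximization agree, which matches the definition of an induced norm as the largest stretch ‖Ax‖/‖x‖ the matrix can produce.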
